Art. 07 – Vol. 21 – No. 4 – 2011

Integrated System for Developing Semantically-enhanced Archive eContent

Mihaela Dinşoreanu
Ioan Salomie
Cristina Bianca Pop
Technical University of Cluj-Napoca, Department of Computer Science, Cluj-Napoca, România

Abstract: This paper addresses knowledge processing for historical documents held in archives. We propose an integrated solution that performs information extraction and knowledge acquisition on the one hand, and information and knowledge retrieval on the other. We present a method that adapts the Text2Onto framework to semi-automatically extract relevant information from document content through lexical and semantic text annotation. The resulting semantic annotations populate a domain ontology, which is then used for information and knowledge retrieval. We also present a method for querying the digital knowledge base of historical documents in natural-language Romanian; the method is augmented with suggestions and word-sense disambiguation. We tested and validated the integrated solution on a set of documents addressing the history of Transylvania.

Keywords: knowledge acquisition, semantic annotation, knowledge retrieval, natural language query.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.