Art. 07 – Vol. 21 – No. 4 – 2011

Integrated System for Developing Semantically-enhanced Archive eContent

Mihaela Dinşoreanu
Ioan Salomie
Cristina Bianca Pop
Technical University of Cluj-Napoca, Department of Computer Science, Cluj-Napoca, România

Abstract: This paper addresses the problem of processing knowledge from historical documents held in archives. We propose an integrated solution that performs information extraction and knowledge acquisition on the one hand, and information and knowledge retrieval on the other. We present a method that adapts the Text2Onto framework to semi-automatically extract relevant information from the documents' content through lexical and semantic text annotation. The semantic annotations are then used to populate a domain ontology, which supports information and knowledge retrieval. We also present a method for querying the digital knowledge base of historical documents in natural-language Romanian. The method is augmented with suggestions and word-sense disambiguation. We tested and validated the integrated solution on a set of documents addressing the history of Transylvania.

Keywords: knowledge acquisition, semantic annotation, knowledge retrieval, natural language query.

