Concept-Based Semantic Annotation, Indexing and Retrieval of Office-Like Document Units

by Sasa Nesic, Mehdi Jazayeri, Fabio Crestani, Dragan Gasevic

We present an ontology-driven approach to semantic annotation, indexing and retrieval of document units. The approach is based on a novel semantic document model (SDM) that we developed to allow office-like document units to be uniquely identified, semantically annotated with concepts from annotation ontologies, and linked across document boundaries. In the semantic annotation model that we propose, we first lexically expand the descriptions of ontological concepts to enhance syntactic matching. Next, we expand the set of syntactic matches with semantically related concepts (i.e., semantic matches) discovered by exploring the annotation ontology. We then calculate the annotation weight of both the syntactic and the semantic matches, taking into account the effects of the lexical expansion and the semantic distance between ontological concepts. The retrieval model for document units uses an inverted concept index, which we generate from the concepts used in the annotation and their weights for the document units they annotate. The results of a preliminary evaluation conducted with a prototype implementation are promising, and we present an analysis of these results.
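The pipeline described in the abstract (lexical expansion, syntactic matching, ontology-based semantic expansion, weighting, an inverted concept index, and retrieval) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology, the synonym labels, the constants `LEXICAL_PENALTY` and `SEMANTIC_DECAY`, the use of ontology hop count as the semantic distance, and the summed-weight ranking are all hypothetical, and the functions `syntactic_matches`, `semantic_expansion`, `build_index` and `retrieve` are not the SDM or the authors' weighting formulas, which the abstract does not give.

```python
import re
from collections import defaultdict

# Hypothetical toy annotation ontology: concept -> directly related concepts.
ONTOLOGY = {
    "Invoice": {"Payment", "Customer"},
    "Payment": {"Invoice"},
    "Customer": {"Invoice"},
}

# Hypothetical lexical expansion: each concept's primary label plus assumed synonyms.
LABELS = {
    "Invoice": {"invoice", "bill"},
    "Payment": {"payment"},
    "Customer": {"customer", "client"},
}

LEXICAL_PENALTY = 0.8  # assumed weight for a match on an expanded (non-primary) label
SEMANTIC_DECAY = 0.5   # assumed discount per ontology hop (hop count as semantic distance)


def syntactic_matches(text):
    """Map each concept with a label in `text` to its annotation weight."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    matches = {}
    for concept, labels in LABELS.items():
        for label in labels & tokens:
            # Primary-label hits get full weight; synonym hits are discounted.
            weight = 1.0 if label == concept.lower() else LEXICAL_PENALTY
            matches[concept] = max(matches.get(concept, 0.0), weight)
    return matches


def semantic_expansion(matches, max_hops=1):
    """Add ontology neighbours of syntactic matches, decayed by hop distance."""
    weights = dict(matches)
    for concept, base in matches.items():
        frontier, seen = [concept], {concept}
        for hop in range(1, max_hops + 1):
            next_frontier = []
            for current in frontier:
                for neighbour in ONTOLOGY.get(current, ()):
                    if neighbour in seen:
                        continue
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
                    decayed = base * SEMANTIC_DECAY ** hop
                    # Keep the strongest evidence if a concept is reached twice.
                    weights[neighbour] = max(weights.get(neighbour, 0.0), decayed)
            frontier = next_frontier
    return weights


def build_index(units):
    """Inverted concept index: concept -> {unit_id: annotation weight}."""
    index = defaultdict(dict)
    for unit_id, text in units.items():
        for concept, weight in semantic_expansion(syntactic_matches(text)).items():
            index[concept][unit_id] = weight
    return index


def retrieve(index, query_concepts):
    """Rank units by the summed weights of the query concepts they carry."""
    scores = defaultdict(float)
    for concept in query_concepts:
        for unit_id, weight in index.get(concept, {}).items():
            scores[unit_id] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical document units used only to exercise the sketch.
UNITS = {
    "u1": "Please pay this bill by Friday.",
    "u2": "Payment received from the client.",
    "u3": "Meeting notes.",
}
INDEX = build_index(UNITS)

print(retrieve(INDEX, ["Invoice"]))              # [('u1', 0.8), ('u2', 0.5)]
print(retrieve(INDEX, ["Payment", "Customer"]))  # [('u2', 1.8), ('u1', 0.8)]
```

In this sketch, unit `u1` is annotated with `Invoice` only through the synonym "bill", so it receives the discounted weight 0.8, while `Payment` and `Customer` reach it as semantic matches at half that weight. A fuller treatment would replace the hop count with a proper semantic-distance measure over the annotation ontology.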

Technical report 2010/01, January 2010

BibTeX entry

@techreport{10conceptbased,
  author      = {Sasa Nesic and Mehdi Jazayeri and Fabio Crestani and Dragan Gasevic},
  title       = {Concept-Based Semantic Annotation, Indexing and Retrieval of Office-Like Document Units},
  institution = {University of Lugano},
  number      = {2010/01},
  year        = 2010,
  month       = jan
}