Semantic-enhanced search: Finding meaning in large-scale scanned text collections

Kolloquium Angewandte Informatik

This talk presents Capisco, a system for semantic-enhanced search in a digital library of full texts. Document search in digital libraries typically uses purely lexical analysis, which cannot address the inherent ambiguity of natural language. A semantic search approach offers the potential to overcome this shortcoming of lexical search, but even if an appropriate network of ontologies could be decided upon, it would require a full semantic markup of each document. Capisco instead analyzes documents by the semantics and context of their content. The disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. Our method achieves a form of semantic-enhanced search that simultaneously exploits the proven scale benefits provided by lexical indexing. For established systems, completely replacing, or even significantly changing, the document retrieval mechanism would require major technological effort and would most likely be disruptive. We explored ways to use the results of semantic analysis and disambiguation while retaining an existing keyword-based search and lexicographic index. We engineer this so that the output of semantic analysis (performed off-line) is suitable for direct import into existing digital library metadata and index structures, and can thus be incorporated without the need for architecture modifications.

More information at http://www.cms.waikato.ac.nz/people/hinze

(Dr. Annika Hinze)

