Published: 2013 September
Buchtitel: Conference and Labs of the Evaluation Forum (CLEF 2013)
In this paper, we study the problem of extracting knowledge from textual documents written in different languages by annotating the text on the basis of a cross-lingual knowledge base, namely Wikipedia. Our contribution is twofold. First, we propose a novel framework for evaluating cross-lingual text annotation techniques, based on annotation of a parallel corpus to a hub-language in a cross-lingual knowledge base. Second, we investigate the performance of different cross-lingual text annotation techniques according to our proposed evaluation framework. We perform experiments for an empirical comparison of three approaches: (i) Cross-lingual Named Entity Annotation (CL-NEA), (ii) Cross-lingual Wikifier Annotation (CL-WIFI), and (iii) Cross-lingual Explicit Semantic Analysis (CL-ESA). Besides establishing an evaluation framework, our results show the differences between the three investigated approaches and demonstrate their advantages and disadvantages.