Knowledge Based Short Text Categorization Using Entity and Category Embeddings
Published: June 2019
Editors: Hitzler P. et al. (eds)
Book title: The Semantic Web. ESWC 2019. Lecture Notes in Computer Science
Short text categorization is an important task due to the rapid growth of short texts available online in various domains, such as web search snippets. Most traditional methods suffer from the sparsity and shortness of the text. Moreover, supervised learning methods require a significant amount of training data, and manually labelling such data can be very time-consuming and costly. In this study, we propose a novel probabilistic model for Knowledge-Based Short Text Categorization (KBSTC), which does not require any labeled training data to classify a short text. This is achieved by leveraging entities and categories from large knowledge bases, which are embedded into a common vector space using a new entity and category embedding model that we propose. Given a short text, its category (e.g. Business or Sports) can then be derived from the entities mentioned in the text by exploiting the semantic similarity between entities and categories. To validate the effectiveness of the proposed method, we conducted experiments on two real-world datasets, AG News and Google Snippets. The experimental results show that our approach significantly outperforms classification approaches that do not require any labeled data, while coming close to the results of supervised approaches.
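The idea of deriving a category from the entities mentioned in a text can be illustrated with a minimal sketch. It is not the probabilistic KBSTC model itself: it assumes that entity linking has already been done, replaces the model with a plain average of cosine similarities, and uses made-up three-dimensional vectors. The names `ENTITY_VECS`, `CATEGORY_VECS`, `cosine` and `categorize` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy embeddings. In the paper, entities and categories come from
# a large knowledge base and share a learned vector space; these values are
# invented purely for illustration.
ENTITY_VECS = {
    "Apple Inc.": [0.9, 0.1, 0.0],
    "Stock market": [0.8, 0.2, 0.1],
    "FC Barcelona": [0.1, 0.9, 0.0],
}
CATEGORY_VECS = {
    "Business": [1.0, 0.0, 0.0],
    "Sports": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def categorize(entities):
    """Pick the category whose vector is, on average, closest to the entities.

    Assumption: `entities` is the output of an entity-linking step that is not
    shown here. Unknown entities are ignored. Returns (None, {}) when no
    entity has an embedding.
    """
    known = [ENTITY_VECS[e] for e in entities if e in ENTITY_VECS]
    if not known:
        return None, {}
    scores = {
        category: sum(cosine(vec, cat_vec) for vec in known) / len(known)
        for category, cat_vec in CATEGORY_VECS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    label, scores = categorize(["Apple Inc.", "Stock market"])
    print(label, scores)
```

No labeled training texts appear anywhere in this sketch; the only inputs are the embeddings and the entities detected in the text, which mirrors the label-free setting described above.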
DOI Link: https://doi.org/10.1007/978-3-030-21348-0_23