HistorEx: Exploring Historical Text Corpora Using Word and Document Embeddings

Published: June 2019
Editors: Hitzler P. et al. (eds)
Book title: The Semantic Web: ESWC 2019 Satellite Events. ESWC 2019. Lecture Notes in Computer Science
Volume: 11762
Pages: 302
Publisher: Springer
Place of publication: Cham

Written text can be understood as a means to acquire insights into the nature of past and present cultures and societies. Numerous projects have been devoted to digitizing and publishing historical textual documents in digital libraries which scientists can utilize as valuable resources for research. However, the extent of textual data available exceeds humans’ abilities to explore the data efficiently. In this paper, a framework is presented which combines unsupervised machine learning techniques and natural language processing on the example of historical text documents on the 19th century of the USA. Named entities are extracted from semi-structured text, which is enriched with complementary information from Wikidata. Word embeddings are leveraged to enable further analysis of the text corpus, which is visualized in a web-based application.

ISBN: 978-3-030-32326-4


