Inproceedings3834: Difference between revisions
Cx0800 (Talk | Contributions) (Page created: "{{Publikation Erster Autor |ErsterAutorNachname=Hoppe |ErsterAutorVorname=Fabian }} {{Publikation Author |Rank=2 |Author=Tabea Tietz }} {{Publikation Author |R…") |
Xi5455 (Talk | Contributions) m |
||
Line 33: | Line 33: | ||
|Month=Juni | |Month=Juni | ||
|Booktitle=Proceedings of Workshop on Humanities in the Semantic Web co-located with ESWC 2020 | |Booktitle=Proceedings of Workshop on Humanities in the Semantic Web co-located with ESWC 2020 | ||
+ | |Pages=15-20 | ||
|Organization=WHiSe Workshop | |Organization=WHiSe Workshop | ||
|Publisher=CEUR | |Publisher=CEUR | ||
+ | |Series=CEUR Workshop Proceedings | ||
+ | |Volume=2695 | ||
}} | }} | ||
{{Publikation Details | {{Publikation Details | ||
+ | |Abstract=Document exploration in archives is often challenging due to the lack of organization in topic-based categories. Moreover, archival records only provide short text which is often insufficient for capturing the semantics. This paper proposes and explores a dataless categorization approach that utilizes word embeddings and TF-IDF to categorize archival documents. Additionally, it introduces a visual approach built on top of the word embeddings to enhance the exploration of data. Preliminary results suggest that current vector representations alone do not provide enough external knowledge to solve this task. | ||
+ | |Download=paper2.pdf | ||
+ | |Link=https://ceur-ws.org/Vol-2695/paper2.pdf | ||
|Forschungsgruppe=Information Service Engineering | |Forschungsgruppe=Information Service Engineering | ||
}} | }} |
Latest revision as of 17 November 2022, 08:49
The Challenges of German Archival Document Categorization on Insufficient Labeled Data
Published: June 2020
Book title: Proceedings of Workshop on Humanities in the Semantic Web co-located with ESWC 2020
Volume: 2695
Series: CEUR Workshop Proceedings
Pages: 15-20
Publisher: CEUR
Organization: WHiSe Workshop
Peer-reviewed publication
Abstract
Document exploration in archives is often challenging due to the lack of organization in topic-based categories. Moreover, archival records only provide short text which is often insufficient for capturing the semantics. This paper proposes and explores a dataless categorization approach that utilizes word embeddings and TF-IDF to categorize archival documents. Additionally, it introduces a visual approach built on top of the word embeddings to enhance the exploration of data. Preliminary results suggest that current vector representations alone do not provide enough external knowledge to solve this task.
Download: Media:paper2.pdf
Further information: https://ceur-ws.org/Vol-2695/paper2.pdf
Information Service Engineering