Get Your Hands Dirty: Evaluating Word2Vec Models for Patent Data

Hidir Aras, Rima Türker, Daniela Geiss, Max Milbradt, Harald Sack

Published: 2018 September

Book title: Proc. of the 14th Int. Conf. on Semantic Systems (SEMANTICS 2018), P&D Track
Volume: 2198
Publisher: CEUR Workshop Proceedings

Refereed publication

Abstract
Patent search systems allow complex queries to be formulated by combining search terms with Boolean and other operators, such as proximity and wildcard operators, in order to find relevant patents. This widely adopted approach is based on exact matching, which makes it difficult to identify and analyze relevant patents efficiently, because the search terms often do not match the terminology used by the inventors. Another problem is the large number of relevant hits that result from weekly and monthly updates of patent applications and grants. Although some semantic search systems for patents based on latent semantic analysis have been implemented as black-box systems in the past, word embeddings, which have been applied successfully to generate semantic representations of text, have rarely been employed and evaluated for a (large) patent corpus. The work described here evaluates semantic representations for patent data by comparing a pre-trained general model with an adapted word embedding model created from a patent corpus, in order to contribute to a multitude of semantic analysis tasks for patents, such as similarity search, content analysis, and entity linking.

Download: Media:paper_123.pdf

Research group

Information Service Engineering
