Get Your Hands Dirty: Evaluating Word2Vec Models for Patent Data
Published: September 2018
Book title: Proc. of the 14th Int. Conf. on Semantic Systems (SEMANTICS 2018), P&D Track
Publisher: CEUR Workshop Proceedings
Patent search systems allow complex queries to be formulated by combining search terms with Boolean and other operators, such as proximity and wildcard operators, in order to find relevant patents. This widely adopted approach is based on exact matching, which makes it difficult to identify and analyze relevant patents efficiently, because the search terms often do not match the terminology used by the inventors. Another problem is the large number of relevant hits that results from weekly and monthly updates of patent applications and grants. Although some semantic search systems for patents based on latent semantic analysis have been implemented as black-box systems in the past, word embeddings, which have been successfully applied to generate semantic representations of text, have rarely been employed and evaluated for a (large) patent corpus. The work described here aims to evaluate semantic representations for patent data by comparing a pre-trained general model with an adapted word embedding model created from a patent corpus, in order to contribute to a multitude of semantic analysis tasks for patents, such as similarity search, content analysis, and entity linking.