Topic 4958

Patent Document Summarization with Contextual Embeddings

Thesis Information

Thesis type: Master
Supervisors: Harald Sack, Rima Türker
Research group: Information Service Engineering
Partner: FIZ Karlsruhe (https://www.fiz-karlsruhe.de/de/forschung/information-service-engineering)
Archive number: 4958
Thesis status: Open
Start: 13 October 2022
Submission: unknown

Further Information

Are you interested in making a big impact with your thesis? Work with us on an innovative approach to patent summarization.

Patents drive innovation by enabling international organizations to protect their inventions legally. Consequently, patent documents are important resources that describe inventions. Because the number of available patent documents is growing rapidly, analyzing them manually is no longer feasible. Text summarization has therefore become a necessary first step toward processing these documents more efficiently.

Existing summarization models are either abstractive or extractive [1,2]. Extractive models select the most descriptive sentences from a given document, whereas abstractive approaches generate phrases or sentences that may not appear in the original document. Several summarization models exist and appear to perform well; however, most of them focus on conventional documents, e.g., news articles.

This thesis aims to design a patent summarization model that generates summaries similar to human-written abstracts. Patent abstracts are written manually by the inventors after the patent application is accepted [3]. They help users understand the inventions and their key technical details without reading the entire documents [3]. To avoid this costly manual task, the goal of the thesis is to propose a fully automated patent summarization approach that uses contextual embedding models (e.g., BERT) and applies state-of-the-art techniques to generate abstractive summaries. The model will be trained on the descriptions of patent documents for which abstracts are available. The generated summaries can serve as a starting point for different tasks, e.g., patent classification.

This thesis will be supervised by Prof. Dr. Harald Sack, Information Service Engineering at Institute AIFB, KIT, in collaboration with FIZ Karlsruhe.

[1] https://arxiv.org/pdf/1909.03186.pdf
[2] https://arxiv.org/pdf/2004.08795.pdf
[3] https://aclanthology.org/P19-1212.pdf
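
The difference between extractive and abstractive summarization described above can be made concrete with a small sketch. The following Python snippet is only an illustrative extractive baseline, not the approach proposed for the thesis: it scores each sentence by the average frequency of its words and returns the top-scoring sentences unchanged. The function names, the stop-word list, and the sample text are assumptions made for this example. An abstractive model would instead generate new sentences, which typically requires a trained sequence-to-sequence model.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "are",
              "for", "that", "with", "by", "on", "it", "this"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Document-level word frequencies serve as a crude importance signal.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    sample = ("The valve controls fluid flow. "
              "The valve housing contains a spring and the valve seat. "
              "Weather was nice.")
    print(extractive_summary(sample, num_sentences=1))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of extractive summarization. A frequency heuristic like this is at most a simple baseline against which a learned model could be compared.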

Which prerequisites should you have?
• Good programming skills in Python
• Interest in Natural Language Processing
• Interest in Deep Learning technologies
