Identifying Used Methods and Datasets in Scientific Publications
Buchtitel: Proceedings of the AAAI-21 Workshop on Scientific Document Understanding (SDU'21)
Although it has become common to assess publications and researchers by means of their citation count (e.g., using the h-index), measuring the impact of scientific methods and datasets (e.g., using an h-index for datasets) has been performed only to a limited extent. This is not surprising because the usage information of methods and datasets is typically not explicitly provided by the authors, but hidden in a publication's text. In this paper, we propose an approach to identifying methods and datasets in texts that have actually been used by the authors. Our approach first recognizes datasets and methods in the text by means of a domain-specific named entity recognition method with minimal human interaction. It then classifies these mentions into used vs. non-used based on the textual contexts. The obtained labels are aggregated on the document level and integrated into the Microsoft Academic Knowledge Graph modeling publications' metadata. In experiments based on the Microsoft Academic Graph, we show that both method and dataset mentions can be identified and correctly classified with respect to their usage to a high degree. Overall, our approach facilitates method and dataset recommendation, enhanced paper recommendation, and scientific impact quantification. It can be extended in such a way that it can identify mentions of any entity type (e.g., task).