Analyzing the GitHub Repositories of Research Papers
Book title: Proceedings of the 20th ACM/IEEE Joint Conference on Digital Libraries (JCDL'20)
Place of publication: Xi'an, China
Linking to code repositories, such as those on GitHub, is becoming increasingly common in scientific papers in the field of computer science. The actual quality and usage of these repositories are, however, still largely unknown. In this paper, we present the first thorough analysis of all GitHub code repositories linked in scientific papers, using the Microsoft Academic Graph as the data source. We analyze the repositories and their associated papers along several dimensions. We observe that both the number of stars and the number of forks across all repositories follow a power-law distribution. In the majority of cases, only one of the paper's authors contributes to the repository. The GitHub manuals are mostly short, consisting of only a few sentences. The source code is mostly written in Python. The papers containing the repository URLs, as well as their authors, are typically from the AI field.
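To make the power-law observation concrete, the following is a minimal sketch of how a power-law exponent could be estimated from a list of star or fork counts. It is not the procedure used in the paper, which the abstract does not describe. The function name `estimate_power_law_exponent`, the cutoff `x_min`, and the use of the continuous maximum-likelihood estimator (an approximation, since star counts are integers) are all assumptions made for illustration.

```python
import math


def estimate_power_law_exponent(counts, x_min=1.0):
    """Estimate alpha for p(x) ~ x^(-alpha), x >= x_min.

    Hypothetical helper: uses the continuous maximum-likelihood
    estimator alpha = 1 + n / sum(ln(x_i / x_min)). The cutoff
    x_min is an assumption chosen by the caller, not a value
    taken from the paper.
    """
    # Only values at or above the cutoff belong to the power-law tail.
    tail = [c for c in counts if c >= x_min]
    log_sum = sum(math.log(c / x_min) for c in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Illustrative star counts only; these are not data from the paper.
    example_stars = [1, 1, 2, 2, 3, 5, 8, 13, 40, 250, 1200]
    print(estimate_power_law_exponent(example_stars, x_min=1.0))
```

A single exponent does not by itself show that a power law fits. In practice the estimate would be paired with a goodness-of-fit check and a comparison against alternatives such as a log-normal distribution.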