Result: An empirical study on the accuracy of GitHub's dependency graph and the nature of its inaccuracy.

Title:
An empirical study on the accuracy of GitHub's dependency graph and the nature of its inaccuracy.
Authors:
Bifolco, Daniele (AUTHOR) d.bifolco@studenti.unisannio.it; Romano, Simone (AUTHOR); Nocera, Sabato (AUTHOR); Francese, Rita (AUTHOR); Scanniello, Giuseppe (AUTHOR); Di Penta, Massimiliano (AUTHOR)
Source:
Information & Software Technology. Nov 2025, Vol. 187, pN.PAG-N.PAG. 1p.
Database:
Business Source Premier

Further Information

GitHub's dependency graph is a tool that eases Software Composition Analysis (SCA). It is leveraged not only by other tools and by practitioners in their analyses, but also by researchers conducting studies on open-source projects. However, its potential inaccuracy may seriously harm its applicability and usefulness. This paper quantitatively and qualitatively analyzes the accuracy of GitHub's dependency graph for Java and Python projects, how that accuracy has changed over time, and what the likely pitfalls and limitations of the dependency graph are. After creating statistically significant samples of Java and Python projects, we analyzed their dependency graphs in two directions, forward (by looking at dependencies) and backward (by looking at dependents), and inspected their manifest/lock files. Results indicate that, in our sample, the inaccuracy rate exceeds 27% for dependencies and reaches up to 10% for dependents. Errors stem from several causes, among them an oversimplified processing of manifest/lock files by the dependency graph generator. Our results provide (i) guidelines for researchers to understand the threats arising in studies based on the dependency graph and (ii) insights for practitioners and tool builders to enhance their SCA, given the current limitations of the dependency graph.

• We show that GitHub's dependency graph is inaccurate (≃ 20% errors in dependencies and ≃ 10% errors in dependents).
• We report a qualitative categorization of the root causes of dependency graph inaccuracies.
• Our findings warn about the accuracy of tools and studies leveraging the dependency graph.

[ABSTRACT FROM AUTHOR]
