Finding Synergy between Software Repositories using README Files
Software version control platforms, such as GitHub, host millions of open-source software repositories. Due to their diversity, these repositories are an appealing realm for discovering software trends and fostering collaboration among programmers and researchers. Users can already explore repositories via keyword queries, which lets them fetch repositories that "do something" relevant to the query. Furthermore, collaboration on platforms such as GitHub has previously been tackled by developing recommendation systems that focus on similarities between software features. In our work, we seek to discover 'synergies' between software projects by exploiting their similar as well as their complementary features. To this end, we quantify the synergy score between two repositories with a novel approach inspired by Literature-Based Discovery (LBD). LBD was originally developed to uncover 'implicit' knowledge in scientific literature databases by linking different information sources. We apply it here to uncover synergy in software projects by linking software features. More precisely, our approach consists of three sequential steps: (1) extracting each repository's software features from its 'readme' file using existing natural language processing techniques, (2) modeling the features of the readme file sections extracted in step one using topic modeling algorithms, and (3) detecting pairs of software repositories that bear 'synergy' based on the combination of their 'features' resulting from step two. We conduct our experiments on 13,264 open-source Python projects from GitHub using several pair ranking approaches. Our human rating evaluation, on a subset of 90 repository pairs, shows that our developed models significantly outperform the random baseline at p < 0.001 with a medium effect size, r.
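To make the three steps above concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes a plain bag-of-words in place of the NLP feature extraction and topic modeling of steps one and two, and it uses a hypothetical score for step three: the fraction of shared features multiplied by the fraction of unshared features. The names `readme_features`, `synergy_score`, and `rank_pairs`, as well as the stop-word list, are illustrative assumptions.

```python
from __future__ import annotations

import re
from itertools import combinations

# Assumption: a tiny stop-word list; a real pipeline would use an NLP toolkit.
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "the", "to", "with"}


def readme_features(readme: str) -> set[str]:
    """Stand-in for steps 1 and 2: lowercase word tokens minus stop words."""
    return {w for w in re.findall(r"[a-z]+", readme.lower()) if w not in STOPWORDS}


def synergy_score(a: set[str], b: set[str]) -> float:
    """Hypothetical step 3 score, not the paper's formula.

    Shared features link the two repositories; unshared features are what
    each could contribute to the other. The product is zero for identical
    or fully disjoint feature sets and peaks at 50% overlap.
    """
    union = a | b
    if not union:
        return 0.0
    shared = len(a & b) / len(union)
    complementary = len(a ^ b) / len(union)
    return shared * complementary


def rank_pairs(readmes: dict[str, str]) -> list[tuple[float, str, str]]:
    """Score every unordered repository pair and sort by descending score."""
    feats = {name: readme_features(text) for name, text in readmes.items()}
    scored = [
        (synergy_score(feats[x], feats[y]), x, y)
        for x, y in combinations(sorted(feats), 2)
    ]
    return sorted(scored, reverse=True)
```

Under this assumed score, two repositories with identical feature sets score zero, as do two with nothing in common; only pairs that overlap partially rank highly, which mirrors the idea of combining similar and complementary features.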