Result: Beyond One-to-One: A Benchmark Dataset for One-to-Many Issue-Commit Links

Title:
Beyond One-to-One: A Benchmark Dataset for One-to-Many Issue-Commit Links
Publisher Information:
Zenodo
Publication Year:
2025
Collection:
Zenodo
Document Type:
dataset
Language:
English
DOI:
10.5281/zenodo.15524784
Rights:
Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession Number:
edsbas.6870ADE3
Database:
BASE

Further Information

Recovering missing links between issues and commits is crucial for effective software traceability, maintenance, and understanding how code evolves over time. While existing research has predominantly focused on recovering one-to-one issue-commit links, real-world software development often follows a one-to-many pattern, where a single issue is resolved across multiple commits. This one-to-many pattern has largely been overlooked, making it harder for existing automated methods to work effectively in real development scenarios. In this study, we introduce a large-scale dataset specifically built for one-to-many issue-commit link recovery, spanning projects written in four widely used programming languages: Java, C++, Python, and JavaScript. To accurately assess model performance in the one-to-many setting, we propose an issue-wise evaluation strategy that moves beyond traditional link-level metrics and better reflects the need to recover complete sets of relevant commits. We benchmark a diverse set of state-of-the-art models, including rule-based, machine learning, deep learning, and transformer-based approaches. Our results reveal that, on a balanced dataset, existing models achieve only moderate performance, with the highest F1-score reaching 73%, underscoring the need for more effective models tailored to one-to-many link recovery. We publicly release our dataset and evaluation framework to support future research in building more robust and effective models for one-to-many link recovery.
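
The difference between link-level and issue-wise evaluation can be illustrated with a small sketch. This record does not define the dataset's actual metric, so the code below is only one plausible reading, under the assumption that issue-wise evaluation means scoring each issue's predicted commit set against its true commit set and then averaging over issues. The function names, the `Links` type, and the issue and commit identifiers are all hypothetical and are not taken from the released evaluation framework.

```python
from typing import Dict, Set

# Hypothetical representation: issue id -> set of linked commit ids.
Links = Dict[str, Set[str]]


def link_level_f1(gold: Links, pred: Links) -> float:
    """F1 over individual (issue, commit) pairs, pooled across all issues."""
    gold_pairs = {(i, c) for i, commits in gold.items() for c in commits}
    pred_pairs = {(i, c) for i, commits in pred.items() for c in commits}
    tp = len(gold_pairs & pred_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


def issue_wise_f1(gold: Links, pred: Links) -> float:
    """Mean of per-issue F1 scores (assumed definition, not the authors')."""
    scores = []
    for issue, gold_commits in gold.items():
        pred_commits = pred.get(issue, set())
        tp = len(gold_commits & pred_commits)
        if tp == 0:
            # No correct commit recovered for this issue.
            scores.append(0.0)
            continue
        precision = tp / len(pred_commits)
        recall = tp / len(gold_commits)
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: one issue resolved by four commits, one resolved by a single commit.
gold = {"ISSUE-1": {"a", "b", "c", "d"}, "ISSUE-2": {"e"}}
pred = {"ISSUE-1": {"a", "b", "c", "d"}, "ISSUE-2": {"x"}}

print(round(link_level_f1(gold, pred), 3))  # pooled over 5 gold links
print(round(issue_wise_f1(gold, pred), 3))  # averaged over 2 issues
```

In this toy case the pooled link-level score is dominated by the issue with many commits, while the issue-wise average gives the completely missed issue equal weight, which is the kind of gap an issue-wise strategy is presumably meant to expose.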