Result: Identifying multi-parameter constraint errors in Python data science library API documentation

Title:

Identifying multi-parameter constraint errors in Python data science library API documentation

Authors:

Xu, Xiufeng, Xie, Fuman, Zhu, Chenguang, Bai, Guangdong, Khurshid, Sarfraz, Li, Yi

Contributors:

College of Computing and Data Science, Alibaba-NTU Global e-Sustainability CorpLab (ANGEL)

Publisher Information:

Association for Computing Machinery (ACM)

Publication Year:

2025

Collection:

DR-NTU (Digital Repository at Nanyang Technological University, Singapore)

Subject Terms:

Computer and Information Science, API Documentation, Symbolic Execution

Document Type:

Academic journal: article in journal/newspaper

File Description:

application/pdf

Language:

English

Relation:

RG12/23; I2301E0026; Proceedings of the ACM on Software Engineering; https://hdl.handle.net/10356/201462; ISSTA; 1536; 1558

DOI:

10.1145/3728945

Availability:

https://hdl.handle.net/10356/201462
https://doi.org/10.1145/3728945

Rights:

Accession Number:

edsbas.512B9938

Database:

BASE

Further Information

Modern AI- and data-intensive software systems rely heavily on data science and machine learning libraries that provide essential algorithmic implementations and computational frameworks. These libraries expose complex APIs whose correct usage must follow constraints among multiple interdependent parameters. Developers using these APIs are expected to learn about the constraints from the provided documentation, and any discrepancy between documentation and code may lead to unexpected behavior. However, maintaining correct and consistent multi-parameter constraints in API documentation remains a significant challenge for API compatibility and reliability. To address this challenge, we propose MPChecker, a tool for detecting inconsistencies between code and documentation that focuses specifically on multi-parameter constraints. MPChecker identifies these constraints at the code level by exploring execution paths through symbolic execution, and extracts the corresponding constraints from documentation using large language models (LLMs). We propose a customized fuzzy constraint logic to reconcile the unpredictability of LLM outputs and to detect logical inconsistencies between the code and documentation constraints. We collected and constructed two datasets from four popular data science libraries and evaluated MPChecker on them. Our tool identified 117 of 126 inconsistent constraints, achieving a recall of 92.8% and demonstrating its effectiveness at detecting inconsistency issues. We further reported 14 detected inconsistency issues to the library developers, who had confirmed 11 of them at the time of writing.
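To make the idea of a multi-parameter constraint inconsistency concrete, the following is a minimal sketch and not MPChecker's implementation. It assumes a hypothetical API with two parameters, `solver` and `penalty`, and two hand-written predicates: one standing in for the constraint enforced by the code, the other for the constraint stated in the documentation. Brute-force enumeration over small finite domains is swapped in for symbolic execution and constraint solving, and neither the LLM extraction step nor the fuzzy constraint logic is modeled. All names and domains are assumptions made for illustration.

```python
from itertools import product

# Hypothetical parameter domains, kept small and finite for illustration only.
DOMAINS = {
    "solver": ["fast", "exact"],
    "penalty": ["l1", "l2", None],
}


def code_constraint(solver, penalty):
    """Stand-in for what the code enforces: 'fast' rejects only 'l1'."""
    return not (solver == "fast" and penalty == "l1")


def doc_constraint(solver, penalty):
    """Stand-in for what the documentation states: 'fast' supports only 'l2'."""
    return not (solver == "fast" and penalty != "l2")


def find_inconsistencies(code_c, doc_c, domains):
    """Return every parameter assignment on which the two constraints disagree."""
    names = list(domains)
    mismatches = []
    for values in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, values))
        if code_c(**assignment) != doc_c(**assignment):
            mismatches.append(assignment)
    return mismatches


mismatches = find_inconsistencies(code_constraint, doc_constraint, DOMAINS)
for assignment in mismatches:
    print("code and documentation disagree on:", assignment)
```

In this toy setting the only disagreement is the combination of `solver="fast"` with `penalty=None`, which the code predicate accepts and the documentation predicate forbids. Exhaustive enumeration only works for tiny finite domains; a solver-based comparison would be needed for numeric or unbounded parameters.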
Ministry of Education (MOE); Agency for Science, Technology and Research (A*STAR); Nanyang Technological University; Published version; This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 1 (RG12/23) and the RIE2025 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) (Award I2301E0026), administered by A*STAR, as well as supported by Alibaba Group and NTU Singapore through ...
