Result: Static test flakiness prediction: How Far Can We Go?

Title:
Static test flakiness prediction: How Far Can We Go?
Authors:
Pontillo V; Software Engineering (SeSa) Lab, Department of Computer Science, University of Salerno, Fisciano, Italy.
Palomba F; Software Engineering (SeSa) Lab, Department of Computer Science, University of Salerno, Fisciano, Italy.
Ferrucci F; Software Engineering (SeSa) Lab, Department of Computer Science, University of Salerno, Fisciano, Italy.
Source:
Empirical software engineering [Empir Softw Eng] 2022; Vol. 27 (7), pp. 187. Date of Electronic Publication: 2022 Oct 01.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Springer US. Country of Publication: United States. NLM ID: 101769304. Publication Model: Print-Electronic. Cited Medium: Internet. ISSN: 1573-7616 (Electronic). Linking ISSN: 1382-3256. NLM ISO Abbreviation: Empir Softw Eng. Subsets: PubMed not MEDLINE.
Imprint Name(s):
Publication: <2005-> : [New York] : Springer US
Original Publication: [Dordrecht] : Kluwer Academic Publishers
Contributed Indexing:
Keywords: Flaky tests; Machine learning; Software testing
Entry Date(s):
Date Created: 20221006 Latest Revision: 20221122
Update Code:
20250114
PubMed Central ID:
PMC9526694
DOI:
10.1007/s10664-022-10227-1
PMID:
36199835
Database:
MEDLINE

Further Information

Test flakiness is a phenomenon occurring when a test case is non-deterministic and exhibits both passing and failing behavior when run against the same code. In recent years, the problem has been closely investigated by researchers and practitioners, who have shown its relevance in practice. The software engineering research community has been working toward defining approaches for detecting and addressing test flakiness. Despite being quite accurate, most of these approaches rely on expensive dynamic steps, e.g., the computation of code coverage information. Consequently, they may suffer from scalability issues that preclude their practical use. This limitation has recently been targeted through machine learning solutions that predict the flakiness of tests using various features, like source code vocabulary or a mixture of static and dynamic metrics computed on individual snapshots of the system. In this paper, we take a step forward and predict test flakiness using only static metrics. We conduct a large-scale experiment on 70 Java projects drawn from the iDFlakies and FlakeFlagger datasets. First, we statistically assess the differences between flaky and non-flaky tests in terms of 25 test and production code metrics and smells, analyzing both their individual and combined effects. Based on these results, we experiment with a machine learning approach that predicts test flakiness solely based on static features, comparing it with two state-of-the-art approaches. The key result of the study is that the static approach achieves performance comparable to that of the baselines. In addition, we found that the characteristics of the production code can impact the performance of flaky test prediction models.
(© The Author(s) 2022.)
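To illustrate the phenomenon the paper studies, the following is a minimal, hypothetical JUnit 5 example of an "async wait" flaky test, a category commonly discussed in the flakiness literature; it is not taken from the studied projects. The test's outcome depends on thread scheduling, so the same code can yield both passing and failing runs.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.concurrent.atomic.AtomicInteger;
import org.junit.jupiter.api.Test;

class AsyncWaitFlakyTest {

    @Test
    void counterIsIncremented() throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);

        // The worker thread increments the counter at some point after start().
        new Thread(counter::incrementAndGet).start();

        // Race condition: a fixed sleep is not a synchronization mechanism.
        // Depending on scheduling, the worker may or may not have run yet,
        // so this assertion is non-deterministic on the same code.
        Thread.sleep(10);

        assertEquals(1, counter.get());
    }
}
```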
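The prediction step the abstract describes could look like the sketch below. This is an assumption-laden illustration built on the Weka library, not the authors' actual pipeline: the file name static-test-metrics.arff, its attribute layout (one row per test case, static test and production code metrics as features, a {flaky, non-flaky} class attribute), and the choice of a Random Forest with 10-fold cross-validation are all hypothetical stand-ins for the study's setup.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StaticFlakinessPredictionSketch {

    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF dataset: one row per test case, static code
        // metrics as attributes, and a nominal {flaky, non-flaky} class
        // as the last attribute.
        Instances data = new DataSource("static-test-metrics.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Train and evaluate a Random Forest with 10-fold cross-validation.
        RandomForest model = new RandomForest();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(model, data, 10, new Random(42));

        System.out.println(eval.toSummaryString());
        int flakyIdx = data.classAttribute().indexOfValue("flaky");
        System.out.printf("Precision (flaky): %.3f%n", eval.precision(flakyIdx));
        System.out.printf("Recall    (flaky): %.3f%n", eval.recall(flakyIdx));
        System.out.printf("F1        (flaky): %.3f%n", eval.fMeasure(flakyIdx));
    }
}
```

Because all features here are static, the sketch needs no test executions or coverage collection at prediction time, which is the scalability argument the abstract makes against dynamic approaches.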

Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.