Rensselaer Polytechnic Institute (RPI); Parallélisme de Kahn Synchrone (Parkas), Département d'informatique - ENS-PSL (DI-ENS), École normale supérieure - Paris (ENS-PSL), Université Paris Sciences et Lettres (PSL), Institut National de Recherche en Informatique et en Automatique (Inria), Centre National de la Recherche Scientifique (CNRS), Centre Inria de Paris; IBM T. J. Watson Research Centre
Source:
ISSTA 2022 - 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Jul 2022, Virtual, South Korea. ⟨10.1145/3533767.3534400⟩
Machine-learning operators often have correctness constraints that cut across multiple hyperparameters and/or data. Violating these constraints causes the operator to raise runtime exceptions, yet the constraints themselves are usually documented only informally or not at all. This paper presents the first interprocedural weakest-precondition analysis for Python to extract hyperparameter constraints. The analysis is mostly static, but to keep it tractable for typical Python idioms in machine-learning libraries, it selectively switches to the concrete domain in some cases. This paper demonstrates the analysis by extracting hyperparameter constraints for 181 operators from 8 ML libraries, where it achieved high precision and recall and found real bugs. Our technique advances static analysis for Python and is a step towards safer and more robust machine learning.
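The core idea of extracting a constraint by weakest precondition can be shown at toy scale. The sketch below is not the paper's analysis. It assumes a tiny statement language of `skip`, `raise`, `if`/`else`, and sequencing, encoded as tuples, and a hypothetical operator whose `fit` guards two cross-hyperparameter checks loosely modeled on the solver/penalty restrictions found in libraries such as scikit-learn. Propagating the postcondition `True` backwards, with `wp(raise, Q) = False`, yields the condition under which no exception is raised. The names `wp`, `mk_and`, `mk_or`, `mk_not`, `holds`, `show`, `PENALTY_L1`, `L1_SOLVER`, `DUAL`, `LIBLINEAR`, and `fit_checks` are all illustrative.

```python
# Toy weakest-precondition extraction (hypothetical sketch, not the paper's tool).
#
# Formulas:   True / False, ("atom", text, test), ("not", f), ("and", a, b), ("or", a, b)
#             where `test` evaluates the atom on a dict of concrete hyperparameters.
# Statements: ("skip",), ("raise", message), ("if", cond, then, orelse), ("seq", s1, s2)

def mk_not(f):
    if isinstance(f, bool):
        return not f
    return f[1] if f[0] == "not" else ("not", f)

def mk_and(a, b):
    if a is False or b is False:
        return False
    if a is True:
        return b
    return a if b is True else ("and", a, b)

def mk_or(a, b):
    if a is True or b is True:
        return True
    if a is False:
        return b
    return a if b is False else ("or", a, b)

def wp(stmt, post):
    """Weakest precondition under which `stmt` finishes normally with `post` true."""
    kind = stmt[0]
    if kind == "skip":
        return post
    if kind == "raise":
        return False  # a raise never reaches the normal exit
    if kind == "if":
        _, cond, then, orelse = stmt
        return mk_or(mk_and(cond, wp(then, post)),
                     mk_and(mk_not(cond), wp(orelse, post)))
    if kind == "seq":
        _, first, second = stmt
        return wp(first, wp(second, post))
    raise ValueError(f"unknown statement kind: {kind!r}")

def holds(f, hp):
    """Evaluate a formula on concrete hyperparameter values `hp`."""
    if isinstance(f, bool):
        return f
    kind = f[0]
    if kind == "atom":
        return f[2](hp)
    if kind == "not":
        return not holds(f[1], hp)
    if kind == "and":
        return holds(f[1], hp) and holds(f[2], hp)
    return holds(f[1], hp) or holds(f[2], hp)  # "or"

def show(f):
    """Render a formula as a readable Python-like boolean expression."""
    if isinstance(f, bool):
        return str(f)
    kind = f[0]
    if kind == "atom":
        return f[1]
    if kind == "not":
        inner = show(f[1])
        return f"not ({inner})" if f[1][0] == "atom" else f"not {inner}"
    return f"({show(f[1])} {kind} {show(f[2])})"

# Hypothetical operator whose fit() guards two cross-hyperparameter constraints.
PENALTY_L1 = ("atom", "penalty == 'l1'", lambda hp: hp["penalty"] == "l1")
L1_SOLVER = ("atom", "solver in ('liblinear', 'saga')",
             lambda hp: hp["solver"] in ("liblinear", "saga"))
DUAL = ("atom", "dual", lambda hp: hp["dual"])
LIBLINEAR = ("atom", "solver == 'liblinear'", lambda hp: hp["solver"] == "liblinear")

fit_checks = ("seq",
              ("if", mk_and(PENALTY_L1, mk_not(L1_SOLVER)),
               ("raise", "l1 penalty requires solver 'liblinear' or 'saga'"), ("skip",)),
              ("if", mk_and(DUAL, mk_not(LIBLINEAR)),
               ("raise", "dual=True requires solver 'liblinear'"), ("skip",)))

constraint = wp(fit_checks, True)
print(show(constraint))
```

Running it prints the extracted constraint as a boolean expression over `penalty`, `solver`, and `dual`, and `holds(constraint, hp)` checks a concrete configuration against it. The constructors `mk_and`, `mk_or`, and `mk_not` fold the constants, so branches that end in `raise` drop out of the result instead of leaving `False and ...` terms. This toy handles only straight-line checks; it has no loops, calls, or aliasing, which an interprocedural analysis of real library code must handle.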