Result: Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control

Title:
Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control
Source:
Proceedings of the ACM on Management of Data. 1:1-26
Publisher Information:
Association for Computing Machinery (ACM), 2023.
Publication Year:
2023
Document Type:
Journal Article
Language:
English
ISSN:
2836-6573
DOI:
10.1145/3626712
Rights:
CC BY
Accession Number:
edsair.doi...........16f7457732bd656226a4a7d378b3e15d
Database:
OpenAIRE

Further Information

Many big data systems are written in languages such as C, C++, Java, and Scala to process large amounts of data efficiently, while data analysts often use Python to conduct data wrangling, statistical analysis, and machine learning. User-defined functions (UDFs) are commonly used in these systems to bridge the gap between the two ecosystems. In this paper, we propose Udon, a novel debugger to support fine-grained debugging of UDFs. Udon encapsulates the modern line-by-line debugging primitives, such as the ability to set breakpoints, perform code inspections, and make code modifications while executing a UDF on a single tuple. It includes a novel debug-aware UDF execution model to ensure the responsiveness of the operator during debugging. It utilizes advanced state-transfer techniques to satisfy breakpoint conditions that span across multiple UDFs. It incorporates various optimization techniques to reduce the runtime overhead. We conduct experiments with multiple UDF workloads on various datasets and show its high efficiency and scalability.