An Empirical Study on the Effects of Jayvee, a Domain‐Specific Language for Data Engineering, on Understanding Data Pipeline Architectures.
A large part of the effort in data science projects is spent on data engineering. Especially in open data contexts, data quality issues are prevalent and are often tackled by non‐professional programmers. We introduce and evaluate Jayvee, a domain‐specific language for data engineering aimed at reducing barriers to building data pipelines. We show that a structured DSL can have positive effects on speed, ease of use, and quality for data engineering by non‐professional developers. For this, we present an empirical quantitative study in which we compare the performance of students, as proxies for non‐professional programmers, when using Jayvee versus Python with Pandas. We search for reasons for the empirical findings in a follow‐up interview study on how using a DSL changes the way non‐professional programmers build data pipelines. Participants solve a subset of tasks faster, more easily, and with higher quality when using Jayvee compared to Python. Interviewees describe tradeoffs regarding the DSL's more limited features, stricter code structure, and explicit descriptions. Jayvee is found to be more approachable, which leads to a more guided development flow. To find adoption, new data engineering languages should provide good tooling and documentation, plan how to visualize intermediate data, and consider new development workflows involving tools like ChatGPT. [ABSTRACT FROM AUTHOR]
Copyright of Software: Practice & Experience is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)