Result: Mapping Python Programs to Vectors Using Recursive Neural Encodings

Title:
Mapping Python Programs to Vectors Using Recursive Neural Encodings
Language:
English
Authors:
Paassen, Benjamin (ORCID 0000-0002-3899-2450), McBroom, Jessica, Jeffries, Bryn (ORCID 0000-0002-5981-4426), Koprinska, Irena, Yacef, Kalina (ORCID 0000-0001-7521-6429)
Source:
Journal of Educational Data Mining. 2021 13(3):1-35.
Availability:
International Educational Data Mining. e-mail: jedm.editor@gmail.com; Web site: https://jedm.educationaldatamining.org/index.php/JEDM
Peer Reviewed:
Y
Page Count:
35
Publication Date:
2021
Document Type:
Journal Articles
Reports - Research
Education Level:
Higher Education
Postsecondary Education
ISSN:
2157-2100
Entry Date:
2022
Accession Number:
EJ1320641
Database:
ERIC

Further Information

Educational data mining involves the application of data mining techniques to student activity. However, in the context of computer programming, many data mining techniques cannot be applied because they require vector-shaped input, whereas computer programs have the form of syntax trees. In this paper, we present ast2vec, a neural network that maps Python syntax trees to vectors and back, thereby enabling about a hundred data mining techniques that were previously not applicable. Ast2vec has been trained on almost half a million programs of novice programmers and is designed to be applied across learning tasks "without re-training," meaning that users can apply it without any need for deep learning. We demonstrate the generality of ast2vec in three settings. First, we provide example analyses using ast2vec on a classroom-sized dataset, involving two novel techniques, namely progress-variance projection for visualization and a dynamical systems analysis for prediction. In these examples, we also explain how ast2vec can be utilized for educational decisions. Second, we consider the ability of ast2vec to recover the original syntax tree from its vector representation on the training data and two other large-scale programming datasets. Finally, we evaluate the predictive capability of a linear dynamical system on top of ast2vec, obtaining similar results to techniques that work directly on syntax trees while being much faster (constant- instead of linear-time processing). We hope ast2vec can augment the educational data mining toolkit by making analyses of computer programs easier, richer, and more efficient.

As Provided
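
Illustrative sketch (not part of the record or of ast2vec): the abstract's starting point is that a Python program is a syntax tree rather than a vector. The snippet below uses only Python's standard `ast` module to show that tree shape for a short novice-style program. The helper name `tree_signature` and the sample program are assumptions made for illustration; the actual encoding into vectors is done by the ast2vec network and is not reproduced here.

```python
import ast


def tree_signature(source: str) -> list:
    """Return the node type names of the program's syntax tree in pre-order.

    Hypothetical helper for illustration only; it is not an ast2vec function.
    """
    def visit(node):
        # Record this node's type, then recurse into its children in order.
        names = [type(node).__name__]
        for child in ast.iter_child_nodes(node):
            names.extend(visit(child))
        return names

    return visit(ast.parse(source))


# A short program of the kind a novice might write (made-up sample).
program = "x = int(input())\nprint(x + 1)\n"
signature = tree_signature(program)
print(signature)
```

The resulting list varies in length with the program and flattens away the nesting, which is why tree-structured programs do not fit methods that expect fixed-length vector input without an encoder such as the one the abstract describes.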