Title:
OppropBERT: An Extensible Graph Neural Network and BERT-style Reinforcement Learning-based Type Inference System
Authors:
Publisher Information:
University of Waterloo
Publication Year:
2022
Collection:
University of Waterloo, Canada: Institutional Repository
Document Type:
Master's thesis
Language:
English
Accession Number:
edsbas.8544ECB2
Database:
BASE

More Information

Built-in type systems for statically typed programming languages (e.g., Java) can only prevent rudimentary, domain-specific errors at compile time. They do not check for type errors in other domains, e.g., to prevent null pointer exceptions or to enforce the owner-as-modifier encapsulation discipline. Optional Properties (Opprop), or pluggable type systems, e.g., the Checker Framework, provide a framework in which users can specify type rules and guarantee that a particular property holds with the help of a type checker. However, manually inserting these user-defined type annotations into new and existing large projects requires substantial human effort. Inference systems such as Checker Framework Inference provide a constraint-based whole-program inference framework. However, to develop such a system, the developer must thoroughly understand the underlying framework (the Checker Framework) and its accompanying helper methods, which is time-consuming. Furthermore, these frameworks make expensive calls to SAT and SMT solvers, which increases the runtime overhead during inference. Developers write test cases to ensure that their framework covers all possible type rules and works as expected. Our core idea is to leverage only these manually written test cases to train a deep learning model that learns the type rules implicitly using a data-driven approach. We present a novel model, OppropBERT, which takes raw code and its control flow graphs as input to predict an error heatmap or the type annotations. A pre-trained BERT-style Transformer model encodes the code tokens without requiring a specification of the programming language's grammar, including its type rules, while a Graph Convolutional Network with a custom masked loss function better captures the control flow graphs. If a sound type checker is already provided and the developer wants to create an inference framework, the model described above can be refined further using a Proximal Policy Optimization (PPO)-based reinforcement learning (RL) ...
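
The abstract only names the ingredients (a BERT-style token encoder, a graph convolution over the control flow graph, and a masked loss). The sketch below is a minimal, self-contained PyTorch illustration of how such pieces could fit together; it is not the thesis's actual implementation. The class name, tensor shapes, the use of nn.TransformerEncoder as a stand-in for a pre-trained BERT encoder, the single normalized-adjacency GCN step, and the interpretation of the mask as "positions where an annotation is expected" are all assumptions made for the example.

```python
# Illustrative sketch only: combines a Transformer encoder over code tokens
# with one graph-convolution step over a CFG adjacency matrix, trained with
# a masked per-token loss. Names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CodeGraphTypePredictor(nn.Module):
    def __init__(self, vocab_size=50000, hidden=256, num_heads=4,
                 num_layers=4, num_type_labels=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=num_heads, batch_first=True)
        # Stand-in for a pre-trained BERT-style encoder over raw code tokens.
        self.token_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        # One graph-convolution step over the control-flow-graph adjacency.
        self.gcn_weight = nn.Linear(hidden, hidden)
        self.classifier = nn.Linear(hidden, num_type_labels)

    def forward(self, token_ids, cfg_adj):
        # token_ids: (batch, seq_len) integer code tokens
        # cfg_adj:   (batch, seq_len, seq_len) CFG adjacency (1 = edge)
        h = self.token_encoder(self.embed(token_ids))
        # Add self-loops, row-normalize, and mix each token's representation
        # with its CFG neighbours.
        adj = cfg_adj + torch.eye(cfg_adj.size(-1), device=cfg_adj.device)
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        h = F.relu(self.gcn_weight(torch.bmm(adj / deg, h)))
        return self.classifier(h)  # per-token logits over type annotations


def masked_loss(logits, labels, mask):
    """Cross-entropy computed only at positions flagged by `mask`,
    e.g. declaration sites where a type annotation is expected."""
    per_token = F.cross_entropy(
        logits.transpose(1, 2), labels, reduction="none")
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)


if __name__ == "__main__":
    model = CodeGraphTypePredictor()
    tokens = torch.randint(0, 50000, (2, 32))
    adj = torch.zeros(2, 32, 32)
    labels = torch.randint(0, 8, (2, 32))
    mask = (torch.rand(2, 32) > 0.8).float()
    loss = masked_loss(model(tokens, adj), labels, mask)
    loss.backward()
    print(loss.item())
```

Masking the loss to annotation-relevant positions, rather than averaging over every token, mirrors the abstract's point that only a small subset of tokens (declaration sites, error locations) carries the supervision signal from the manually written test cases.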