WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis
Further information
This repository houses the official replication package for the paper "WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis". The package contains the following components:
Dataset: The dataset used in this study consists of WebAssembly (Wasm) binaries compiled by SnowWhite, from which we generated two path-based code representations. We processed these binaries with our pipeline to produce a new dataset, which we used to train our models. This replication package includes both our new dataset and SnowWhite's dataset.
Pipeline: Our pipeline extracts path sequences from Wasm binaries. It is implemented in Rust and Python.
Data cleaning: These scripts split the dataset into different variants and create the different input sequences.
Training notebooks: Two Jupyter notebooks are included. One trains a feed-forward neural network that creates code embeddings for method names; the other trains seq2seq models on five different variants of input sequences.
Models: The weights of the seq2seq models trained with OpenNMT and of the feed-forward neural network used to generate the code embeddings.
Results: Log files containing the evaluation results of our models, including prediction accuracy scores, BLEU scores, and other evaluation metrics.
For more information, see README.md.
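To give a rough idea of what a path-based code representation looks like, the sketch below extracts leaf-to-leaf paths from a small instruction tree. It is an illustration only and not the package's Rust/Python pipeline: the tree format, the example tree, and the function names `leaf_paths` and `leaf_to_leaf_paths` are all assumptions made for this example, and the actual path definitions used in the paper may differ.

```python
from itertools import combinations

# Hypothetical toy tree in the form (label, [children]).
# This is NOT the format produced by the actual pipeline.
TREE = ("i32.add", [
    ("local.get", []),
    ("i32.mul", [("local.get", []), ("i32.const", [])]),
])


def leaf_paths(node, index=0, prefix=()):
    """Yield the root-to-leaf path of every leaf as (child_index, label) pairs.

    The child index keeps sibling nodes with identical labels distinct.
    """
    label, children = node
    path = prefix + ((index, label),)
    if not children:
        yield path
    for i, child in enumerate(children):
        yield from leaf_paths(child, i, path)


def leaf_to_leaf_paths(tree):
    """Connect every pair of leaves through their lowest common ancestor."""
    results = []
    for a, b in combinations(list(leaf_paths(tree)), 2):
        # Length of the prefix shared by both root-to-leaf paths.
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        up = [label for _, label in reversed(a[k:])]  # first leaf, moving upward
        lca = a[k - 1][1]                             # lowest common ancestor
        down = [label for _, label in b[k:]]          # downward to second leaf
        results.append(up + [lca] + down)
    return results


for path in leaf_to_leaf_paths(TREE):
    print(" -> ".join(path))
```

Each resulting path is a sequence of node labels that could be fed to a sequence model as one input token sequence; how the real pipeline tokenizes and combines paths is documented in README.md.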