Title:
Unsupervised Translation of Programming Languages ; Traduction Non Supervisée de Langages de Programmation
Contributors:
Tristan Cazenave; Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision (LAMSADE), Université Paris Dauphine-PSL, Université Paris Sciences et Lettres (PSL), Centre National de la Recherche Scientifique (CNRS)
Source:
https://theses.hal.science/tel-03852612 ; Neural and Evolutionary Computing [cs.NE]. Université Paris sciences et lettres, 2022. English. ⟨NNT : 2022UPSLD015⟩.
Publisher Information:
CCSD
Publication Year:
2022
Collection:
Université Paris-Dauphine: HAL
Document Type:
Dissertation (doctoral or postdoctoral thesis)
Language:
English
Relation:
NNT: 2022UPSLD015
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edsbas.BA64F75B
Database:
BASE

Further Information

A transcompiler, also known as a source-to-source translator, is a system that converts source code from one high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules applied to the abstract syntax tree of the source code. Unfortunately, the resulting translations often lack readability, fail to respect the target language's conventions, and require manual modifications to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in natural language translation, their application to transcompilation has been limited by the scarcity of parallel data in this domain. In this thesis, we propose methods to train effective, fully unsupervised neural transcompilers.

Natural language translators are evaluated with metrics based on token co-occurrences between the translation and the reference. We show that such metrics do not capture the semantics of programming languages. Hence, we build and release a test set composed of 852 parallel functions, along with unit tests to check the semantic correctness of translations.

We first leverage objectives designed for natural languages to learn multilingual representations of source code, and train a model to translate using source code from open-source GitHub projects. This model outperforms rule-based methods for translating functions between C++, Java, and Python. We then develop an improved pre-training method that leads the model to learn deeper semantic representations of source code, resulting in improved performance on several tasks, including unsupervised code translation.
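The handcrafted rewrite rules mentioned above can be illustrated with a minimal sketch using Python's ast module. The rule below (replacing `a ** b` with `pow(a, b)`) is purely illustrative and not drawn from the thesis; a real transcompiler would apply many such rules across two languages:

```python
import ast

class PowToCall(ast.NodeTransformer):
    """Illustrative rewrite rule: replace `a ** b` with `pow(a, b)`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if isinstance(node.op, ast.Pow):
            call = ast.Call(func=ast.Name(id="pow", ctx=ast.Load()),
                            args=[node.left, node.right], keywords=[])
            return ast.copy_location(call, node)
        return node

# Parse source into an AST, apply the rule, and emit source again
# (ast.unparse requires Python 3.9+).
tree = ast.parse("y = x ** 2 + 1")
new_tree = ast.fix_missing_locations(PowToCall().visit(tree))
print(ast.unparse(new_tree))  # y = pow(x, 2) + 1
```

Because the rule operates on the syntax tree rather than on raw text, it handles nesting and precedence correctly, but as the abstract notes, the output of such rules often remains unidiomatic in the target language.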
Finally, we use automated unit tests to create examples ...
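The unit-test-based notion of semantic correctness used throughout the thesis can be sketched as follows: a translation counts as correct only if it produces the same outputs as the reference on every test input. The functions and inputs below are illustrative stand-ins, not items from the released test set:

```python
def reference_max_of_three(a, b, c):
    # Stand-in for the behavior of an original (e.g. C++) function.
    return max(a, b, c)

def translated_max_of_three(a, b, c):
    # Stand-in for a transcompiler's candidate translation.
    m = a if a > b else b
    return m if m > c else c

def semantically_equivalent(ref, hyp, test_inputs):
    """True iff the candidate matches the reference on all test inputs."""
    return all(ref(*args) == hyp(*args) for args in test_inputs)

tests = [(1, 2, 3), (3, 2, 1), (-5, -5, 0), (7, 7, 7)]
print(semantically_equivalent(reference_max_of_three,
                              translated_max_of_three, tests))  # True
```

Unlike token-overlap metrics, this check accepts any translation that is functionally equivalent to the reference, however differently it is written.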