Title:
Optimizing Pre-Trained Code Embeddings With Triplet Loss for Code Smell Detection
Source:
IEEE Access, Vol. 13, pp. 31335–31350 (2025)
Publisher Information:
IEEE, 2025.
Publication Year:
2025
Collection:
LCC:Electrical engineering. Electronics. Nuclear engineering
Document Type:
Academic journal article
File Description:
electronic resource
Language:
English
ISSN:
2169-3536
DOI:
10.1109/ACCESS.2025.3542566
Accession Number:
edsdoj.9fe6b8a447d3493090bfb60feea8039c
Database:
Directory of Open Access Journals

Abstract:

Code embedding represents code semantics in vector form. Although code embedding-based systems have been successfully applied to various source code analysis tasks, further research is needed to enhance code embeddings so that embedding-based analysis can surpass the performance and functionality of static code analysis tools. In addition, standard methods for improving code embeddings, analogous to data augmentation techniques in image processing, are essential for building more effective embedding-based systems. This study develops a contrastive learning-based system to explore a generic method for enhancing code embeddings for code classification tasks. A triplet loss-based deep learning network is designed to increase in-class similarity and widen the distance between classes. An experimental dataset covering Java, Python, and PHP and four code smells is created by collecting code from open-source repositories on GitHub. We evaluate the proposed system with the widely used BERT, CodeBERT, and GraphCodeBERT pretrained models, which generate code embeddings for the code classification task of code smell detection. Our findings indicate that the proposed system improves accuracy by an average of 8%, and by up to 13%, across the evaluated models. These results suggest that applying contrastive learning to code representations as a preprocessing step can enhance performance in code analysis.
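The core idea described in the abstract, triplet loss over embedding vectors, can be illustrated with a minimal sketch. This is not the paper's implementation; the toy 4-dimensional vectors, the Euclidean distance metric, and the margin value are illustrative assumptions standing in for real BERT/CodeBERT/GraphCodeBERT embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: encourage the anchor to sit closer to a
    same-class (positive) embedding than to a different-class (negative)
    embedding by at least `margin`. Loss is zero once that holds."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)

# Hypothetical 4-d "code embeddings" (real ones would be 768-d model outputs)
anchor   = np.array([1.0, 0.0, 0.0, 0.0])  # a snippet with a given code smell
positive = np.array([0.9, 0.1, 0.0, 0.0])  # another snippet, same smell class
negative = np.array([0.0, 1.0, 0.0, 0.0])  # a snippet from a different class

loss = triplet_loss(anchor, positive, negative)
```

During training, this loss would be minimized over many (anchor, positive, negative) triplets, pulling same-class embeddings together and pushing different classes apart, which is the in-class similarity/between-class distance optimization the abstract describes.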