Result: Non-technical loss detection in power distribution networks using machine learning.
Sci Rep. 2024 Oct 18;14(1):24489. (PMID: 39424849)
Sci Rep. 2025 Jan 8;15(1):1277. (PMID: 39779779)
Further information
Non-technical losses (NTL) in power distribution, such as illegal meter tapping, cost utilities billions annually. This study evaluates machine learning methods for NTL detection, addressing the challenge of imbalanced electricity consumption data. Seven data balancing techniques were employed: Adaptive Synthetic Sampling (ADASYN), Random Over Sampling, Random Under Sampling, Near Miss Under Sampling, and three variants of the Synthetic Minority Over-sampling Technique (SMOTE), namely Borderline-SMOTE, SMOTE-ENN, and SMOTE-Tomek links. The model comprises two stages. In the first, seven classification algorithms (Decision Tree, Logistic Regression, XGBoost, Random Forest, SVM, Naïve Bayes, and KNN) were tested across different training-testing ratios to identify the best-performing configuration. In the second, the comprehensive consumption dataset was used together with the data balancing techniques to improve classifier performance. Five performance metrics were used for evaluation: accuracy, precision, recall, F1 score, and Matthews Correlation Coefficient (MCC). The Random Forest algorithm, paired with Random Over Sampling at a 70:30 training-testing ratio, yielded the highest metrics, 98.03% accuracy and 99.02% precision, surpassing results reported in the existing literature. Statistical testing confirmed that all improvements were significant at the 95% confidence level.
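The two building blocks named in the abstract, Random Over Sampling and the evaluation metrics, can be illustrated with a minimal sketch. This is not the authors' code: the function names `random_oversample` and `binary_metrics`, the toy labels, and the convention that label 1 marks an NTL case are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter
from math import sqrt


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and MCC from the confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def safe_div(a, b):
        # Return 0.0 when a ratio is undefined (empty denominator).
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(
            tp * tn - fp * fn,
            sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        ),
    }


# Toy, made-up data: 8 normal customers (0) and 2 NTL cases (1).
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2
bal_rows, bal_labels = random_oversample(rows, labels, seed=42)
balanced_counts = Counter(bal_labels)

# Toy predictions, only to exercise the metric formulas.
scores = binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

MCC is reported alongside accuracy because accuracy alone can look high on imbalanced data even when the minority class is poorly detected.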
(© 2025. The Author(s).)
Declarations. Competing interests: The authors declare no competing interests. Consent for publication: The authors give full consent for publication. Materials availability: Available from the authors on request. Software and packages used: The SciPy package in Python provides a range of statistical techniques, such as hypothesis testing, probability distributions, and correlation analysis. In addition, statistical modelling, data analysis and manipulation, and date and time handling are performed using the statsmodels, NumPy, and pandas packages and Python's datetime module.