Result: Non-technical loss detection in power distribution networks using machine learning.

Title:
Non-technical loss detection in power distribution networks using machine learning.
Authors:
Abro SA; Department of Electrical Engineering Technology, Benazir Bhutto Shaheed University of Technology and Skill Development, Khairpur Mirs, 66020, Pakistan; Department of Electrical Engineering, Quaid-e-Awam University of Engineering, Science and Technology, Nawabshah, 67450, Pakistan.
Laghari JA; Department of Electrical Engineering, Quaid-e-Awam University of Engineering, Science and Technology, Nawabshah, 67450, Pakistan.
Memon SA; Department of Defense Systems Engineering, Sejong University, Gwangjin-gu, Seoul, 05006, Republic of Korea. sufyanahmedali@sejong.ac.kr.
Khan TA; Cybersecurity and Technological Convergence, Malaysian Institute of Information and Technology (MIIT), Universiti Kuala Lumpur, Kuala Lumpur, 50250, Malaysia. talha@unikl.edu.my.
Memon I; Department of Computer Science, Shah Abdul Latif University, Shahdadkot campus, Shahdadkot, 77300, Imran, Pakistan.
Nasir H; Computer Engineering Technology Section, Malaysian Institute of Information and Technology, Universiti Kuala Lumpur, Kuala Lumpur, 50250, Malaysia.
Fatima K; Department of Electronic Engineering, Mehran University of Engineering and Technology, SZAB Campus, Khairpur Mirs 66020, Mehran, Pakistan.
Source:
Scientific reports [Sci Rep] 2025 Oct 16; Vol. 15 (1), pp. 36189. Date of Electronic Publication: 2025 Oct 16.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Nature Publishing Group Country of Publication: England NLM ID: 101563288 Publication Model: Electronic Cited Medium: Internet ISSN: 2045-2322 (Electronic) Linking ISSN: 20452322 NLM ISO Abbreviation: Sci Rep Subsets: PubMed not MEDLINE; MEDLINE
Imprint Name(s):
Original Publication: London : Nature Publishing Group, copyright 2011-
References:
Sci Rep. 2024 Oct 8;14(1):23368. (PMID: 39375370)
Sci Rep. 2024 Oct 18;14(1):24489. (PMID: 39424849)
Sci Rep. 2025 Jan 8;15(1):1277. (PMID: 39779779)
Contributed Indexing:
Keywords: Adaptive synthetic sampling (ADASYN); Decision tree; Extreme gradient boosting (XGBoost); Machine learning; Random forest; Random sampler
Entry Date(s):
Date Created: 20251016 Latest Revision: 20251019
Update Code:
20251019
PubMed Central ID:
PMC12533172
DOI:
10.1038/s41598-025-20048-z
PMID:
41102276
Database:
MEDLINE

Further information

Non-technical losses (NTL) in power distribution, such as illegal meter tapping, cost utilities billions annually. This study evaluates machine learning methods for NTL detection and addresses the challenge of imbalanced electricity consumption data. Seven data balancing techniques were employed: Adaptive Synthetic Sampling (ADASYN), Random Over Sampling, Random Under Sampling, Near Miss Under Sampling, and three variations of the Synthetic Minority Over-sampling Technique (SMOTE), namely Borderline-SMOTE, SMOTE-ENN, and SMOTE-Tomek links. The model comprises two stages. In the first stage, seven classification algorithms (Decision Tree, Logistic Regression, XGBoost, Random Forest, SVM, Naïve Bayes, and KNN) were tested across several training-testing ratios to identify the best-performing configuration. In the second stage, the data balancing techniques were applied to the comprehensive consumption dataset to improve classifier performance. Five metrics were used for evaluation: accuracy, precision, recall, F1 score, and Matthews Correlation Coefficient (MCC). The Random Forest algorithm paired with Random Over Sampling at a 70/30 training-testing ratio yielded the highest metrics, 98.03% accuracy and 99.02% precision, which the authors report as surpassing existing literature. Statistical testing confirmed that all improvements were significant at the 95% confidence level.
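To make the balancing and evaluation steps named in the abstract concrete, the following is a minimal, dependency-free Python sketch of random over-sampling and of the five reported metrics for a binary problem. It is not the authors' implementation: the function names `random_over_sample` and `binary_metrics`, the toy labels, and the choice to balance only the training data are assumptions made for illustration. In practice a library such as imbalanced-learn or scikit-learn would normally be used instead.

```python
import math
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class is as large as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1, and MCC from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC is defined as 0 here when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy, made-up data: 8 normal consumers (0) and 2 NTL cases (1).
    X_train = [[i] for i in range(10)]
    y_train = [0] * 8 + [1] * 2
    X_bal, y_bal = random_over_sample(X_train, y_train, seed=1)
    print(Counter(y_bal))

    # Made-up predictions, used only to exercise the metric formulas.
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1]))
```

A common design choice, assumed here rather than taken from the abstract, is to over-sample only the training portion of a split so that duplicated minority rows cannot leak into the test set. MCC is included alongside accuracy because accuracy alone can look high on imbalanced data even when the minority class is poorly detected.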
(© 2025. The Author(s).)

Declarations. Competing interests: The authors declare no competing interests. Consent for publication: All authors consent to publication. Materials availability: Available from the authors on request. Software and packages used: The SciPy package in Python provides a range of statistical techniques, such as hypothesis testing, probability distributions, and correlation analysis. Statistical modelling, data analysis and manipulation, and date and time handling were performed using the Statsmodels, NumPy, Pandas, and DateTime packages in Python.