Is machine learning really effective in detecting corporate fraud?
ISSN: 0737-4607
Purpose
This study evaluates the effectiveness of machine learning (ML) in detecting accounting fraud among Chinese listed firms from 2007 to 2022. It asks whether advanced ML techniques outperform traditional logistic regression models in accuracy, precision and practical applicability for fraud detection.

Design/methodology/approach
The research analyzes a dataset of 48,746 firm-year observations, including 6,790 instances of fraud. The study employs nine ML models (e.g. Random Forest, RUSBoost and LightGBM) alongside traditional logistic regression, using SAS Visualization and Python for variable selection and model construction. Model performance is evaluated with metrics such as AUC, precision, recall, F1 score and net benefit under different data processing scenarios.

Findings
Results are mixed. Random Forest, LightGBM and RUSBoost models achieve superior AUC and F1 scores, yet none achieves a precision above 0.10, indicating high false-positive rates. This low precision significantly limits their practical value for regulators, investors and professionals such as analysts and auditors. Logistic regression and support vector machine models often achieve higher recall, suggesting that traditional approaches remain competitive in identifying fraudulent firms.

Originality/value
This study highlights the limited practical utility of ML for corporate fraud detection owing to low precision and frequent false positives. It contributes a nuanced understanding of ML’s role in accounting research, emphasizing the need to integrate qualitative data and to improve model precision for real-world application. It also offers new insights into the use of SAS Visualization and random data-splitting methods in fraud prediction.
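To make concrete how a model can score well on AUC and recall while still having precision below 0.10, here is a minimal Python sketch (standard library only) of the metrics named above, computed from a model's fraud scores. It is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the data are synthetic and not drawn from the study, and "net benefit" is assumed to follow the decision-curve definition TP/N minus (FP/N) times p_t/(1 - p_t), which may differ from the measure the authors used.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float
    net_benefit: float


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) when firm-years scoring >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1  # non-fraud firm-year wrongly flagged
        elif label == 1:
            fn += 1  # fraud firm-year missed
        else:
            tn += 1
    return tp, fp, fn, tn


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Metrics:
    """Precision, recall, F1 and (assumed decision-curve) net benefit at one threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, fn, tn = confusion_counts(y_true, y_score, threshold)
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ASSUMPTION: decision-curve net benefit, weighting each false positive
    # by the odds of the threshold probability, p_t / (1 - p_t).
    net_benefit = tp / n - (fp / n) * threshold / (1 - threshold)
    return Metrics(precision, recall, f1, net_benefit)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random fraud case outscores a random
    non-fraud case, counting ties as one half (O(n^2), for illustration only)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic, illustrative data (NOT from the study): 10 frauds among 1,000 firm-years.
    y_true = [1] * 10 + [0] * 990
    y_score = [0.8] * 8 + [0.3] * 2 + [0.6] * 92 + [0.1] * 898
    m = threshold_metrics(y_true, y_score, threshold=0.5)
    print(f"AUC={auc(y_true, y_score):.3f} precision={m.precision:.3f} "
          f"recall={m.recall:.3f} F1={m.f1:.3f} net_benefit={m.net_benefit:.3f}")
```

On this toy data the scores rank frauds well (AUC of about 0.98) and recall is 0.80, but only 8 of the 100 flagged firm-years are actually fraudulent, so precision is 0.08 and the net benefit at p_t = 0.5 is negative. When fraud is rare, even a small false-positive rate among the many non-fraud firms can swamp the true positives. The quadratic AUC loop is chosen for transparency; a rank-based or library implementation would be used on a dataset of the study's size.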