Title:
Improving GBDT performance on imbalanced datasets: An empirical study of class-balanced loss functions.
Authors:
Luo, Jiaqi (jqluo@suda.edu.cn); Yuan, Yuan (y.yuan@dukekunshan.edu.cn); Xu, Shixin (shixin.xu@dukekunshan.edu.cn)
Source:
Neurocomputing, Jun 2025, Vol. 634.
Database:
Academic Search Index


Class imbalance poses a persistent challenge in machine learning, particularly for tabular data classification tasks. While Gradient Boosting Decision Tree (GBDT) models are widely regarded as state-of-the-art for these tasks, their effectiveness diminishes on imbalanced datasets. This paper is the first to comprehensively explore the integration of class-balanced loss functions into three popular GBDT algorithms, addressing binary, multi-class, and multi-label classification. We present a novel benchmark, derived from extensive experiments across diverse datasets, to evaluate the performance gains from class-balanced losses in GBDT models. Our findings establish the efficacy of these loss functions in enhancing model performance under class imbalance, providing actionable insights for practitioners tackling real-world imbalanced data challenges. To bridge the gap between research and practice, we introduce an open-source Python package that simplifies the application of class-balanced loss functions within GBDT workflows, democratizing access to these advanced methodologies. The code is available at https://github.com/Luojiaqimath/ClassbalancedLoss4GBDT. [ABSTRACT FROM AUTHOR]
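The abstract does not show the package's API, but the general mechanism it describes, supplying a class-balanced loss to a GBDT library, can be sketched with standard tools. The example below is a hypothetical illustration, not the paper's implementation: it computes per-class weights via the effective-number-of-samples scheme (Cui et al., 2019, a common class-balanced loss) and wraps them in a weighted binary logistic objective of the gradient/Hessian form that GBDT libraries such as LightGBM and XGBoost accept as a custom objective. The function names `class_balanced_weights` and `cb_logloss_objective` are invented for this sketch.

```python
import numpy as np

def class_balanced_weights(labels, beta=0.999):
    """Per-class weights from the effective number of samples:
    w_c proportional to (1 - beta) / (1 - beta**n_c), normalized so
    the weights sum to the number of classes. Rare classes get
    larger weights as beta approaches 1."""
    counts = np.bincount(labels)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(counts) / weights.sum()

def cb_logloss_objective(preds, labels, class_weights):
    """Class-balanced binary logistic objective in the (grad, hess)
    form expected by GBDT custom-objective hooks. `preds` are raw
    scores (log-odds), `labels` are 0/1 integers."""
    p = 1.0 / (1.0 + np.exp(-preds))   # sigmoid of raw scores
    w = class_weights[labels]          # per-sample weight by class
    grad = w * (p - labels)            # d(loss)/d(score)
    hess = w * p * (1.0 - p)           # d^2(loss)/d(score)^2
    return grad, hess
```

A callable returning `(grad, hess)` like this can be passed to, e.g., LightGBM's or XGBoost's custom-objective parameter; the released package presumably automates exactly this kind of wiring for the losses studied in the paper.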