Title:
Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction
Source:
Expert Systems with Applications, 42(3):1074-1082
Publisher Information:
Amsterdam: Elsevier, 2015.
Publication Year:
2015
Physical Description:
print, 3/4 p
Original Material:
INIST-CNRS
Subject Terms:
Computer science, Exact sciences and technology, Applied sciences, Operational research. Management science, Operational research and scientific management, Firm modelling, Computer science; control theory; systems, Software, Memory organisation. Data processing, Data processing. List processing. Character string processing, Artificial intelligence, Cost analysis, Data analysis, Supervised learning, Learning (artificial intelligence), Classification, Data distribution, Sampling, Mean error, Bankruptcy, Computational geometry, Aggregate model, Geometric mean, Forecasting, Unbalanced conditions, Small signal, Error rate, AdaBoost, Bankruptcy prediction, Cost-sensitive boosting, Data imbalance, GMBoost, Over-sampling, SMOTE
Document Type:
Academic journal Article
File Description:
text
Language:
English
Author Affiliations:
School of Business, Pusan National University, 63 Beon-gil 2, Busandaehag-ro, Geumjeong-gu, Busan 609-735, Korea, Republic of
Department of Computer and Information Engineering, Dongseo University, 47, Churye-Ro, Sasang-Gu, Busan 617-716, Korea, Republic of
Division of Business, Dongseo University, 47, Churye-Ro, Sasang-Gu, Busan 617-716, Korea, Republic of
ISSN:
0957-4174
Rights:
Copyright 2015 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS.
Notes:
Computer science; theoretical automation; systems

Operational research. Management
Accession Number:
edscal.28928438
Database:
PASCAL Archive
Abstract:
In classification and prediction tasks, the data imbalance problem arises when most instances belong to a single majority class. It has received considerable attention in the machine learning community because it is one of the main causes of degraded classifier and predictor performance. In this paper, we propose a geometric mean based boosting algorithm (GMBoost) to resolve the data imbalance problem. GMBoost takes both the majority and the minority class into account during learning because it uses the geometric mean of the two classes when calculating the error rate and accuracy. To evaluate its performance, we applied GMBoost to a bankruptcy prediction task. The results, compared against AdaBoost and cost-sensitive boosting, indicate that GMBoost offers high predictive power and robust learning on imbalanced as well as balanced data distributions.
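The key idea in the abstract, replacing overall accuracy with the geometric mean of the per-class accuracies, can be illustrated with a small sketch. It assumes a binary encoding of +1 for the minority class and -1 for the majority class, and optional per-instance weights such as boosting weights. The helper names are hypothetical. The `learner_weight` function plugs an error of 1 minus the G-mean into the standard AdaBoost coefficient 0.5 * ln((1 - e) / e). That is an illustrative assumption, not the exact GMBoost update rule from the paper.

```python
import math


def class_accuracies(y_true, y_pred, weights=None):
    """Return (majority-class accuracy, minority-class accuracy).

    Hypothetical encoding for this sketch: +1 = minority, -1 = majority.
    Optional per-instance weights (e.g. boosting weights) make each
    accuracy a weighted accuracy within its own class.
    """
    if weights is None:
        weights = [1.0] * len(y_true)
    correct = {1: 0.0, -1: 0.0}
    total = {1: 0.0, -1: 0.0}
    for t, p, w in zip(y_true, y_pred, weights):
        total[t] += w
        if t == p:
            correct[t] += w
    if total[1] == 0 or total[-1] == 0:
        raise ValueError("both classes must be present")
    return correct[-1] / total[-1], correct[1] / total[1]


def geometric_mean_accuracy(y_true, y_pred, weights=None):
    """Geometric mean of the two per-class accuracies (G-mean)."""
    acc_major, acc_minor = class_accuracies(y_true, y_pred, weights)
    return math.sqrt(acc_major * acc_minor)


def plain_accuracy(y_true, y_pred):
    """Ordinary accuracy, dominated by the majority class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def learner_weight(gm_error, eps=1e-10):
    """AdaBoost-style coefficient 0.5 * ln((1 - e) / e).

    Assumption for illustration: e = 1 - G-mean, clipped away from 0 and 1.
    """
    e = min(max(gm_error, eps), 1 - eps)
    return 0.5 * math.log((1 - e) / e)


if __name__ == "__main__":
    # 95 majority (-1) and 5 minority (+1) instances.
    y_true = [-1] * 95 + [1] * 5
    always_majority = [-1] * 100
    print(plain_accuracy(y_true, always_majority))           # 0.95
    print(geometric_mean_accuracy(y_true, always_majority))  # 0.0
```

A classifier that always predicts the majority class scores 95% plain accuracy here but a G-mean of 0, because it never recognizes a minority instance. That is why an error measure built on the geometric mean keeps the minority class from being ignored during boosting.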