Result: The Role of Customer Segmentation in Churn Prediction

Title:
The Role of Customer Segmentation in Churn Prediction
Authors:
Publisher Information:
Stockholms universitet, Institutionen för data- och systemvetenskap 2025
Document Type:
Electronic Resource
Availability:
Open access content.
info:eu-repo/semantics/openAccess
Note:
application/pdf
English
Other Numbers:
UPE oai:DiVA.org:su-243745
1525885519
Contributing Source:
UPPSALA UNIV LIBR
From OAIster®, provided by the OCLC Cooperative.
Accession Number:
edsoai.on1525885519
Database:
OAIster

Further Information

Introduction: This thesis explores how clustering-based customer segmentation can enhance both the predictive performance and the interpretability of churn prediction models in the online gaming industry. Because acquiring new players is costlier than retaining existing ones, understanding and anticipating customer churn is vital for long-term profitability in this highly competitive, non-contractual environment.

Research Question: How can different customer segmentation methods, combined with daily behavioral and financial data, be used to develop and evaluate churn prediction models in the online gaming industry, and how do these segmentation approaches affect both model performance and interpretability?

Method: Using a design science research approach, I built a modular churn prediction pipeline that incorporated five segmentation methods: threshold-based segmentation, k-means, agglomerative clustering, Gaussian Mixture Models and HDBSCAN. Each segmentation was applied in two ways, either as an additional feature in a single XGBoost model or through separate models per segment, and compared to a baseline with no segmentation. The pipeline included behavioral, financial and trend-based features, SMOTE for class imbalance and SHAP for segment-level interpretability. All modeling was performed in Python using libraries such as scikit-learn, XGBoost and SHAP.

Results: Segmentation led to only minor improvements in predictive performance, with AUC and F1-score differences typically under 0.01 and no statistically significant gains in the single-model setup. However, segment-specific models revealed meaningful differences in churn behavior. HDBSCAN reached the highest accuracy (0.8856) and AUC (0.9366), although its interpretability was reduced by a large noise cluster. K-means and agglomerative clustering provided a more balanced trade-off between interpretability and performance, while threshold-based segmentation delivered intuitive, business-aligned clusters. Feature importance analysis confi