Title:
Data-driven machine learning algorithm model for pneumonia prediction and determinant factor stratification among children aged 6–23 months in Ethiopia.
Authors:
Demsash, Addisalem Workie1 (AUTHOR) addisalemworkie599@gmail.com, Abebe, Rediet1 (AUTHOR), Gezimu, Wubishet2 (AUTHOR), Kitil, Gemeda Wakgari2 (AUTHOR), Tizazu, Michael Amera1 (AUTHOR), Lambebo, Abera1 (AUTHOR), Bekele, Firomsa3 (AUTHOR), Alemu, Solomon Seyife4 (AUTHOR), Jarso, Mohammedamin Hajure4 (AUTHOR), Dube, Geleta Nenko2 (AUTHOR), Wedajo, Lema Fikadu3 (AUTHOR), Purohit, Sanju5,6 (AUTHOR), Kalayou, Mulugeta Hayelom7 (AUTHOR)
Source:
BMC Infectious Diseases. 6/26/2025, Vol. 25 Issue 1, p1-19. 19p.
Database:
Academic Search Index

Further Information

Introduction: Pneumonia is the leading cause of child morbidity and mortality and accounts for 5.6 million under-five child deaths. Pneumonia has a significant impact on quality of life, the country's economy, and child survival. Therefore, this study aimed to develop a data-driven predictive model using machine learning algorithms to predict pneumonia and stratify its determinant factors among children aged 6–23 months in Ethiopia.
Methods: A total of 2035 samples of children from the 2016 Ethiopian Demographic and Health Survey (EDHS) dataset were used. Jupyter Notebook, launched from Anaconda Navigator, was used for data management and analysis, together with Python libraries such as Pandas, Seaborn, and NumPy. The data were pre-processed and split into training and testing datasets at a 4:1 ratio, and tenfold cross-validation was used to reduce bias and enhance the models' performance. Six machine learning algorithms were used for model building and comparison, and confusion matrix elements were used to evaluate the performance of each algorithm. Principal component analysis and a heatmap function were used to detect correlation between features. Feature importance scores were used to identify and stratify the most important predictors of pneumonia.
Results: Of the 2035 children sampled, 16.6%, 20.1%, and 24.2% had short, rapid breathing, fever, and cough, respectively. The overall magnitude of pneumonia among children aged 6–23 months was 31.3% based on the 2016 EDHS report. The random forest algorithm performed best among the six models at predicting pneumonia and stratifying its determinants, with 91.3% accuracy. Health facility visits, child sex, initiation of breastfeeding, birth interval, birth weight, husbands' education, women's age, and region were the top eight predictors of pneumonia among children, with importance scores ranging from more than 5% to 20%.
Conclusions: Random forest is the best model to predict pneumonia and stratify its determinant factors. The findings have implications for advanced research methodology and for health interventions, such as lifestyle modification and behavioral intervention, that are tailored to individuals' unique features, and they can help stakeholders take proactive childcare interventions. The study can serve as pioneering evidence for future research, and researchers are recommended to use deep learning algorithms to enhance prediction accuracy. [ABSTRACT FROM AUTHOR]
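
The evaluation steps named in the Methods (a 4:1 train/test split, tenfold cross-validation, and confusion matrix elements) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the abstract does not say how the split was drawn or which confusion-matrix metrics were reported beyond accuracy, so the shuffling scheme, the function names, and the metric set below are assumptions, and the toy labels are made up rather than taken from the EDHS data.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them 4:1 (80% train, 20% test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Partition indices into k folds whose sizes differ by at most one."""
    return [indices[i::k] for i in range(k)]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary outcome such as pneumonia yes/no."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics_from_counts(tp, fp, tn, fn):
    """Derive common performance metrics from the four confusion matrix cells."""
    def ratio(numerator, denominator):
        # Report 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Sample size taken from the abstract; everything else is illustrative.
train_idx, test_idx = train_test_split_indices(2035)
folds = k_fold_indices(train_idx, k=10)
print(len(train_idx), len(test_idx), [len(fold) for fold in folds])

# Hypothetical labels and predictions, used only to exercise the metric code.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(metrics_from_counts(*confusion_counts(y_true, y_pred)))
```

In each cross-validation round, one fold would serve as the validation set and the remaining nine as training data; the held-out 20% is touched only for the final comparison of the six algorithms. A library such as scikit-learn offers equivalent utilities, but the abstract does not state which modeling library was used.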