A Data-Mining Model for Predicting Low Birth Weight with a High AUC

  • Uzapi Hange
  • Rajalakshmi Selvaraj
  • Malatsi Galani
  • Keletso Letsholo
Part of the Studies in Computational Intelligence book series (SCI, volume 719)


Birth weight is a significant determinant of a newborn's probability of survival. Data-mining models are receiving considerable attention for identifying low birth weight risk factors. However, predicting actual birth weight from the identified risk factors, which could play a significant role in identifying mothers at risk of delivering low birth weight infants, remains an open problem. This paper presents a study of data-mining models that predict actual birth weight, with particular emphasis on achieving a high area under the receiver operating characteristic curve (AUC). The prediction is based on 2006 birth data from the North Carolina State Center for Health Statistics. Meaningful patterns were extracted from the data in the following steps: data selection, handling of missing values, handling of imbalanced data, model building, feature selection, and model evaluation. Decision trees were used to classify birth weight and were tested both on the original imbalanced dataset and on a dataset balanced with the synthetic minority oversampling technique (SMOTE). The results show that models built on SMOTE-balanced datasets achieve a relatively higher AUC than models built on imbalanced datasets. The J48 model built on balanced data outperformed REPTree and RandomTree with an AUC of 90.3% and was therefore selected as the best model. In conclusion, using J48 for birth weight prediction is feasible and could help reduce obstetric-related complications, thus improving overall obstetric health care.
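
The two techniques the abstract relies on, SMOTE oversampling and AUC-based evaluation, can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code (the study itself used decision-tree learners such as J48, REPTree and RandomTree); the function names `smote` and `auc`, the toy feature values and scores, and the random choice of base samples are illustrative assumptions. Following the idea of Chawla et al., each synthetic minority point is interpolated between a minority sample and one of its k nearest minority-class neighbours. AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Return n_synthetic synthetic minority-class points (simplified SMOTE).

    Assumptions of this sketch: all features are numeric, distances are
    Euclidean, and base samples are drawn at random (the original SMOTE
    cycles through every minority sample instead).
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority-class neighbours of `base`, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random fraction of the way towards the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    This equals the normalised Mann-Whitney U statistic; ties count 1/2.
    Labels are 1 for the positive class (e.g. low birth weight), 0 otherwise.
    The pairwise loop is O(P * N), which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy records with two numeric features each.
    low_bw = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]  # minority class
    normal_bw = [[3.0, 4.0], [3.2, 3.9], [2.9, 4.4],
                 [3.5, 4.1], [3.1, 3.7], [3.3, 4.2]]
    extra = smote(low_bw, n_synthetic=len(normal_bw) - len(low_bw), k=2)
    print(len(low_bw) + len(extra), len(normal_bw))  # 6 6
    # Hypothetical classifier scores for four cases (1 = low birth weight).
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

In practice, library implementations such as `SMOTE` in imbalanced-learn and `roc_auc_score` in scikit-learn would normally replace hand-written code like this. SMOTE is usually applied to the training data only, so that synthetic points derived from test cases do not inflate the evaluated AUC.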


Keywords: Birth weight; Low birth weight; Data mining; SMOTE; Imbalanced dataset


Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Uzapi Hange (1)
  • Rajalakshmi Selvaraj (1)
  • Malatsi Galani (1)
  • Keletso Letsholo (1)
  1. Department of Computer Science & Information Systems, Botswana International University of Science and Technology, Palapye, Botswana