Predictive modeling of infant mortality


The infant mortality rate (IMR) is the number of infants, per thousand live births, who do not survive until their first birthday. The IMR is an important metric not only because it describes infant health outcomes in an area, but also because it reflects the general health status of a society. In the United States of America, the IMR is higher than in many other developed countries, despite the country's high level of prosperity. Notably, the U.S.A. exhibits strong and persistent inequalities in the IMR across racial and ethnic groups (Kochanek et al. in Natl Vital Stat Rep 65(4):1–122, 2006). In this paper, we study predictive models for the problem of infant mortality. We implement traditional machine learning models and state-of-the-art neural network models with various combinations of features extracted from birth certificates. These combinations include socio-economic and ethnic features related to the mother and the father of the infant, as well as medical measurements taken during the pregnancy and the delivery. We approach the classification problem of infant mortality, that is, whether an infant will survive until her first birthday, both as a binary problem and as a multi-class problem based on the time of death. We focus on understanding and exploring the importance of the features extracted from birth certificates. For example, we compare the performance of models trained on the general population with that of models trained on subsets of the population, e.g., individual races. Our experimental evaluation compares different predictive models (including those used by epidemiology researchers), various combinations of features, different distributions in the training set, and the importance of individual features.
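To make the IMR definition and the two classification framings concrete, the following is a minimal sketch and not the authors' implementation. It assumes each birth record is reduced to a hypothetical field, the age at death in days (`None` if the infant survived). The multi-class split into neonatal (under 28 days) and postneonatal (28 to 364 days) deaths is an assumption borrowed from common epidemiological usage; the abstract only states that the classes are based on the time of death.

```python
from typing import Optional, Sequence

DAYS_IN_FIRST_YEAR = 365   # simplification: ignores leap years
NEONATAL_CUTOFF_DAYS = 28  # assumed class boundary, not taken from the paper


def infant_mortality_rate(ages_at_death_days: Sequence[Optional[int]]) -> float:
    """Deaths before the first birthday per 1,000 live births."""
    if not ages_at_death_days:
        raise ValueError("need at least one birth record")
    deaths = sum(
        1 for age in ages_at_death_days
        if age is not None and age < DAYS_IN_FIRST_YEAR
    )
    return 1000.0 * deaths / len(ages_at_death_days)


def binary_label(age_at_death_days: Optional[int]) -> int:
    """1 if the infant died before the first birthday, else 0."""
    died = age_at_death_days is not None and age_at_death_days < DAYS_IN_FIRST_YEAR
    return 1 if died else 0


def multiclass_label(age_at_death_days: Optional[int]) -> str:
    """Class by time of death (hypothetical three-way split)."""
    if age_at_death_days is None or age_at_death_days >= DAYS_IN_FIRST_YEAR:
        return "survived"
    if age_at_death_days < NEONATAL_CUTOFF_DAYS:
        return "neonatal"
    return "postneonatal"


# Toy cohort of 1,000 births: 997 survivors, deaths at day 3 and day 40,
# and one death at day 400 (after the first birthday, so not counted).
cohort = [None] * 997 + [3, 40, 400]
imr = infant_mortality_rate(cohort)
binary_targets = [binary_label(age) for age in cohort]
multiclass_targets = [multiclass_label(age) for age in cohort]
```

In this toy cohort the positive class is a tiny fraction of the records, which illustrates why the distribution of the training set matters when comparing models.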



  1. Abrevaya J (2002) The effects of demographics and maternal behavior on the distribution of birth outcomes. In: Economic applications of quantile regression

  2. Acevedo-Garcia D, Soobader M, Berkman L (2007) Low birthweight among U.S. Hispanic/Latino subgroups: the effect of maternal foreign-born status and education. Soc Sci Med 65(12):2503–2516

  3. Acevedo-Garcia D, Soobader MJ, Berkman LF (2005) The differential effect of foreign-born status on low birth weight by race/ethnicity and education. Pediatrics 115(1):e20–e30

  4. Acevedo-Garcia D, Soobader MJ, Berkman LF (2007) Low birthweight among U.S. Hispanic/Latino subgroups: the effect of maternal foreign-born status and education. Soc Sci Med 65(12):2503–2516

  5. Almond D, Chay KY, Lee DS (2005) The costs of low birth weight. Q J Econ 120:1031–1083

  6. Callaghan WM, MacDorman MF, Rasmussen SA, Qin C, Lackritz EM (2006) The contribution of preterm birth to infant mortality rates in the United States. Pediatrics 118(4):1566–1573

  7. Casey BM, McIntire DD, Leveno KJ (2001) The continuing value of the Apgar score for the assessment of newborn infants. N Engl J Med 344:467–471

  8. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd SIGKDD 2016. ACM

  9. Doyle JM, Echevarria S, Frisbie WP (2003) Race/ethnicity, Apgar and infant mortality. Springer, Berlin

  10. Finch BK (2003) Early origins of the gradient: the relationship between socioeconomic status and infant mortality in the United States. Demography 40(4):675–699

  11. Health (2006) United States, 2005: with chartbook on trends in the health of Americans. US Department of Health and Human Services, Washington

  12. Hegyi T, Carbone T, Anwar M, Ostfeld B, Hiatt M, Koons A, Pinto-Martin J, Paneth N (1998) The Apgar score and its components in the preterm infant. Pediatrics 101(1 Pt 1):77–81

  13. Hessol NA, Fuentes-Afflick E (2005) Ethnic differences in neonatal and postneonatal mortality. Pediatrics 115(1):e44–e51

  14. Hessol NA, Fuentes-Afflick E, Bacchetti P (1998) Risk of low birth weight infants among black and white parents. Elsevier, Amsterdam

  15. Hummer RA, Biegler M, De Turk PB, Forbes D, Frisbie WP, Hong Y, Pullum SG (1999) Race/ethnicity, nativity, and infant mortality in the United States. Soc Forc 77:1083–1118

  16. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence, UAI’95

  17. Kochanek KD, Murphy SL, Xu J, Tejada-Vera B (2006) Deaths: final data for 2014. Natl Vital Stat Rep 65(4):1–122

  18. Ma S, Finch BK (2010) Birth outcome measures and infant mortality. Popul Res Policy Rev 29:865

  19. Macinko J, Guanais FC, de Souza M (2006) Evaluation of the impact of the family health program on infant mortality in Brazil, 1990–2002. J Epidemiol Commun Health 60(1):13–19

  20. Mathews T, MacDorman MF (2007) Infant mortality statistics from the 2004 period linked birth/infant death data set. Natl Vital Stat Rep 55(14):1–32

  21. McCormick MC (1985) The contribution of low birth weight to infant mortality and childhood morbidity. N Engl J Med 312:82–90

  22. Osypuk TL, Acevedo-Garcia D (2008) Are racial disparities in preterm birth larger in hypersegregated areas? Am J Epidemiol 167(11):1295–1304

  23. Osypuk TL, Acevedo-Garcia D (2008) Are racial disparities in preterm birth larger in hypersegregated areas? Am J Epidemiol 167(11):1295–1304

  24. Papile LA (2001) The Apgar score in the 21st century. N Engl J Med 344(7):519–520

  25. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  26. Potash E, Brew J, Loewi A, Majumdar S, Reece A, Walsh J, Rozier E, Jorgenson E, Mansour R, Ghani R (2015) Predictive modeling for public health: preventing childhood lead poisoning. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’15. ACM

  27. Powers D, Parker F (2006) Race/ethnic differences and age-variation in the effects of birth outcomes on infant mortality in the US. Demograph Res 14(10):179–216

  28. Rinta-Koski OP (2018) Machine learning in neonatal intensive care. Ph.D. Thesis, Aalto University, Helsinki

  29. Rinta-Koski OP, Särkkä S, Hollmén J, Leskinen M, Andersson S (2018) Gaussian process classification for prediction of in-hospital mortality among preterm infants. Neurocomputing 298:134–141

  30. Saravanou A, Noelke C, Huntington N, Acevedo-Garcia D, Gunopulos D (2019) Infant mortality prediction using birth certificate data. DSHealth KDD workshop. arXiv preprint arXiv:1907.08968

  31. Saravanou A, Noelke C, Huntington N, Acevedo-Garcia D, Gunopulos D (2019b) Predicting infant mortality at the time of birth. Population Association Annual Meeting, Austin

  32. Schölkopf B, Williamson RC, Smola AJ, Shawe-Taylor J, Platt JC (2000) Support vector method for novelty detection. In: Advances in neural information processing systems, pp 582–588

  33. Somanchi S, Adhikari S, Lin A, Eneva E, Ghani R (2015) Early prediction of cardiac arrest (code blue) using electronic medical records. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’15. ACM

  34. Wilcox AJ (2001) On the importance-and the unimportance-of birthweight. Int J Epidemiol 30:1233–1241

  35. Wilcox AJ, Skjaerven R (1992) Birth weight and perinatal mortality: the effect of gestational age. Am J Public Health 82:378–382


The authors would like to thank the anonymous reviewers for their insightful feedback. This research was financed by a Google Faculty Research Award, the EU Horizon 2020 research and innovation programme under grant agreement No. 734242 (Project LAMBDA), ESPA Grant No. 16521, the Robert Wood Johnson Foundation Grant 71192, and the W.K. Kellogg Foundation Grant P3036220.

Author information



Corresponding author

Correspondence to Antonia Saravanou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Responsible editor: Myra Spiliopoulou and Panagiotis Papapetrou.

About this article

Cite this article

Saravanou, A., Noelke, C., Huntington, N. et al. Predictive modeling of infant mortality. Data Min Knowl Disc (2021).


  • Data mining
  • Health applications
  • Infant mortality prediction