Emergency Department Readmission Risk Prediction: A Case Study in Chile

  • Arkaitz Artetxe
  • Manuel Graña (email author)
  • Andoni Beristain
  • Sebastián Ríos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10338)


Short-term readmission prediction in Emergency Departments (ED) is a valuable tool for improving both ED management and healthcare quality. It helps identify patients who require further post-discharge attention and helps reduce healthcare costs. As in many other medical domains, patient readmission data is heavily imbalanced, i.e. the minority class is very infrequent, which makes it challenging to build accurate predictors with machine learning tools. We carried out computational experiments on a dataset of ED admission records spanning more than 100,000 patients over 3 years, with a highly imbalanced class distribution. We combined various approaches for handling this imbalance with different classification algorithms and compared their predictive power for estimating the probability of ED readmission within 72 h after discharge. Results show that random undersampling with Bagging (RUSBagging) in combination with Random Forest achieves the best results in terms of Area Under the ROC Curve (AUC).
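The abstract does not spell out how RUSBagging is implemented. As a minimal sketch, assuming one common formulation (each bag holds a bootstrap of the minority class plus an equally sized random undersample of the majority class), the idea and the AUC metric can be illustrated as follows. To stay dependency-free, the sketch swaps in a decision stump for the Random Forest base learner used in the paper and runs on synthetic data rather than the ED records. All function names (`fit_stump`, `rus_bagging`, `auc`, `make_data`) are hypothetical.

```python
import random

def fit_stump(X, y):
    """Fit a one-feature threshold classifier (a stand-in for Random Forest)."""
    best = None  # (errors, feature, threshold, polarity)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                errs = sum((1 if pol * (row[f] - t) >= 0 else 0) != label
                           for row, label in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, f, t, pol)
    _, f, t, pol = best
    return lambda row: 1 if pol * (row[f] - t) >= 0 else 0

def rus_bagging(X, y, n_estimators=25, seed=0):
    """Train an ensemble on balanced bags; assumes both classes are present."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    models = []
    for _ in range(n_estimators):
        # Bootstrap the minority class, then randomly undersample the
        # majority class (without replacement) down to the same size.
        idx = [rng.choice(minority) for _ in minority]
        idx += rng.sample(majority, len(minority))
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    # The score is the fraction of base learners voting for class 1.
    return lambda row: sum(m(row) for m in models) / len(models)

def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_data(n=2000, pos_rate=0.05, seed=1):
    """Synthetic imbalanced data: only the first feature is informative."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < pos_rate else 0
        X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y
```

A typical use would train on one synthetic sample, e.g. `score = rus_bagging(*make_data(seed=1))`, and evaluate on another with `auc([score(r) for r in X_test], y_test)`. Because every bag is balanced, each base learner sees the rare class as often as the common one, which is the motivation for this family of methods on imbalanced data.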


Readmission risk, Imbalanced data, Classification, Bagging



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Arkaitz Artetxe (1, 2)
  • Manuel Graña (2, email author)
  • Andoni Beristain (1)
  • Sebastián Ríos (3)
  1. Vicomtech-IK4 Research Centre, San Sebastian, Spain
  2. Computation Intelligence Group, Basque University (UPV/EHU), San Sebastian, Spain
  3. Business Intelligence Research Center (CEINE), Industrial Engineering Department, University of Chile, Santiago, Chile
