Abstract
Hospitals are nowadays collecting vast amounts of data related to patient records. These data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at extracting useful knowledge from raw data. This work describes the implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data on inpatient hospitalizations, covering 2000 to 2013, were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay (LOS) from indicators that are commonly available during the hospitalization process (e.g., gender, age, episode type, medical specialty). In the data preparation stage, the data were cleaned and variables were selected and transformed, yielding 14 inputs. Next, in the modeling stage, a regression approach was adopted and six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best model was obtained with the Random Forest method, which achieved a high coefficient of determination (R² = 0.81). This model was then opened using a sensitivity analysis procedure, which revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. The extracted knowledge confirmed that the obtained predictive model is credible and potentially valuable for supporting the decisions of hospital managers.
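The two quantitative ideas in the abstract, the coefficient of determination used to compare the models and the sensitivity analysis used to open the Random Forest, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `r_squared` and `sensitivity_importance` and the linear `toy_model` are hypothetical; R² is computed here as 1 - SSE/SST, which is an assumption (the abstract does not say which variant was used, and the squared Pearson correlation is another common one); and the sensitivity analysis is a simplified one-dimensional variant that varies one input at a time while holding the others at an assumed baseline, scoring each input by the range of the model's responses.

```python
from statistics import mean
from typing import Callable, Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, computed as 1 - SSE/SST (assumed variant)."""
    y_bar = mean(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    sst = sum((y - y_bar) ** 2 for y in y_true)  # total sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - sse / sst


def sensitivity_importance(
    model: Callable[[list], float],
    baseline: Sequence,
    levels: Sequence[Sequence],
) -> list[float]:
    """Simplified 1-D sensitivity analysis (hypothetical helper).

    For each input i, vary x[i] over levels[i] while the other inputs stay at
    the baseline, record the model responses, and score input i by the range
    (max - min) of those responses. Scores are normalized to sum to 1.
    """
    ranges = []
    for i, values in enumerate(levels):
        responses = []
        for v in values:
            x = list(baseline)
            x[i] = v  # change only input i
            responses.append(model(x))
        ranges.append(max(responses) - min(responses))
    total = sum(ranges)
    return [r / total for r in ranges] if total > 0 else [0.0] * len(ranges)


# Hypothetical stand-in model with three inputs; NOT the paper's Random Forest.
def toy_model(x: list) -> float:
    return 2.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]


baseline = [0.5, 0.5, 0.5]  # assumed baseline: each input held at its mid value
levels = [[0.0, 0.5, 1.0] for _ in range(3)]  # assumed levels per input
print(sensitivity_importance(toy_model, baseline, levels))  # -> [0.8, 0.2, 0.0]
```

In this toy case the first input dominates because the stand-in model weights it most heavily. Adapting the sketch to a real LOS model would mean drawing each input's levels from its observed values (for a categorical attribute such as episode type, its categories), which again is an assumption rather than the procedure reported in the paper.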
Acknowledgments
We wish to thank the physicians who participated in this study for their valuable feedback. We would also like to thank the anonymous reviewers for their helpful suggestions. The work of P. Cortez has been supported by FCT – Fundação para a Ciência e Tecnologia within the Project Scope: PEst-OE/EEI/UI0319/2014.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Caetano, N., Cortez, P., Laureano, R.M.S. (2015). Using Data Mining for Prediction of Hospital Length of Stay: An Application of the CRISP-DM Methodology. In: Cordeiro, J., Hammoudi, S., Maciaszek, L., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2014. Lecture Notes in Business Information Processing, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-319-22348-3_9
DOI: https://doi.org/10.1007/978-3-319-22348-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22347-6
Online ISBN: 978-3-319-22348-3
eBook Packages: Computer Science, Computer Science (R0)