Predicting Student Drop-Out Rates Using Data Mining Techniques: A Case Study

  • Boris Pérez
  • Camilo Castellanos
  • Darío Correal
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 833)


Preventing student dropout is a priority for many educational institutions. In this paper we describe the results of an educational data analytics case study focused on detecting dropout among Systems Engineering (SE) undergraduate students after 6 years of enrollment in a Colombian university. The original data is extended and enriched through a feature engineering process. Our experimental results show that simple algorithms achieve reliable levels of accuracy in identifying predictors of dropout. The results of Decision Trees, Logistic Regression, Naive Bayes, and Random Forest were compared in order to propose the best option. In addition, Watson Analytics is evaluated to establish the usability of the service for a non-expert user. The main results are presented with the aim of decreasing the dropout rate by identifying its potential causes. We also present some findings on data quality that can improve the student data collection process.


Keywords: Student drop out; Student desertion prediction; Educational data mining; Prediction models



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Univ. Francisco de Paula Stder., Cúcuta, Colombia
  2. Universidad de los Andes, Bogotá, Colombia
