Social Indicators Research, Volume 146, Issue 1–2, pp 41–60

Identifying Students at Risk of Academic Failure Within the Educational Data Mining Framework

  • Annalina Sarra
  • Lara Fontanella
  • Simone Di Zio


Data mining is widely considered a powerful instrument for discovering essential relationships among the variables or attributes in a database. Applied to education, it is referred to as educational data mining (EDM). EDM makes it possible to gain insight into various higher education phenomena, such as students’ academic paths, learning behaviours and the determinants of academic success or dropout. In this paper, we evaluate the usefulness of a particular latent class model, Bayesian Profile Regression, for identifying students who are more likely to drop out. By considering students’ performance, motivation and resilience, this technique makes it possible to profile the students at higher risk of academic failure. The working example is based on real data collected through an online questionnaire completed by undergraduate students at an Italian university.


Educational data mining; Bayesian Profile Regression; Dropout; Higher education



Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  • Annalina Sarra (1), corresponding author
  • Lara Fontanella (1)
  • Simone Di Zio (1)
  1. Department of Legal and Social Sciences, University “G. d’Annunzio” of Chieti-Pescara, Pescara, Italy
