Predicting MOOCs Dropout Using Only Two Easily Obtainable Features from the First Week’s Activities

  • Ahmed Alamri
  • Mohammad Alshehri
  • Alexandra Cristea (email author)
  • Filipe D. Pereira
  • Elaine Oliveira
  • Lei Shi
  • Craig Stewart
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11528)


While Massive Open Online Course (MOOC) platforms provide knowledge in a new and unique way, their very high dropout rates are a significant drawback. Several features are considered to contribute towards learner attrition or lack of interest, which may lead to disengagement or total dropout. The jury is still out on which factors are the most appropriate predictors. However, the literature agrees that early prediction is vital to allow for timely intervention. Whilst feature-rich predictors may have the best chance of high accuracy, they may be unwieldy. This study aims to predict learner dropout early on, from the first week, by comparing several machine-learning approaches, including Random Forest, Adaptive Boost, XGBoost and GradientBoost classifiers. The results show promising accuracies (82%–94%) using as few as two features. We show that the accuracies obtained outperform state-of-the-art approaches, even when the latter deploy several features.


Keywords: Educational data mining · Learning analytics · Dropout prediction · Machine learning · MOOCs



We would like to thank FAPEAM (Foundation for the State of Amazonas Research), through Edital 009/2017, for partially funding this research.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Ahmed Alamri (1)
  • Mohammad Alshehri (1)
  • Alexandra Cristea (1, email author)
  • Filipe D. Pereira (2)
  • Elaine Oliveira (2)
  • Lei Shi (3)
  • Craig Stewart (4)

  1. Department of Computer Science, Durham University, Durham, UK
  2. Institute of Computing, Federal University of Roraima, Boa Vista, Brazil
  3. Centre for Educational Development, University of Liverpool, Liverpool, UK
  4. School of Computing, Electronics and Mathematics, Coventry University, Coventry, UK
