A Machine Learning Approach to Identify the Students at the Risk of Dropping Out of Secondary Education in India

  • Sagarika Nangia
  • Jhilmil Anurag
  • Ishani Gambhir
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1118)


A significant number of students dropping out before higher education limits their prospects of better employment. According to a UNESCO report, India is fifty years behind in achieving the goal of universal secondary education. There is an almost 30% drop in gross enrolment ratio (GER) from secondary education to higher education. In this paper, we use machine learning models to identify students who are likely to drop out during the ongoing session. To the best of our knowledge, we present the first results on secondary-level education in a developing nation such as India for classifying students as potential dropouts. We collect data from secondary-level institutions for the academic session 2016–17, capturing variation in the ethnic, social and financial backgrounds of students as well as their academic performance. We perform feature analysis to find the most dominant attributes in the dataset and use them in different machine learning algorithms on the task of binary classification of potential dropouts. Our best algorithm classifies correctly with the fewest false negatives.
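The pipeline the abstract describes — feature analysis to rank attributes, then a binary classifier judged chiefly on false negatives — can be sketched as follows. This is a minimal illustration, not the paper's method: the attribute names, the synthetic data, and the choice of a random forest are all assumptions for demonstration purposes.

```python
# Sketch of feature analysis + binary dropout classification,
# using synthetic data in place of the paper's 2016-17 dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 1000
# Hypothetical student attributes: attendance, exam score, income index
X = rng.uniform(0, 1, size=(n, 3))
# Synthetic labelling rule: low attendance plus low scores -> dropout (1)
y = ((X[:, 0] < 0.3) & (X[:, 1] < 0.4)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)

# Feature analysis: rank attributes by learned importance
for name, imp in zip(["attendance", "exam_score", "income"],
                     clf.feature_importances_):
    print(f"{name}: {imp:.3f}")

# False negatives (at-risk students the model misses) are the costly
# errors in this setting, so we report them explicitly.
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print("false negatives:", fn)
```

In practice one would compare several classifiers (the paper's references suggest k-NN, logistic regression, SVMs, decision trees and gradient boosting) and select by a metric sensitive to false negatives, such as recall on the dropout class.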


Keywords: Machine learning · Secondary education · Developing nation · Dropouts · Feature analysis · Binary classification


References

  1. Basumatary, R.: School dropout across Indian states and UTs: an econometric study. Int. Res. J. Soc. Sci. 1(4), 28–35 (2012)
  2. Dekker, G.W., Pechenizkiy, M., Vleeshouwers, J.M.: Predicting students drop out: a case study. International Working Group on Educational Data Mining (2009)
  3. Zhang, G., Anderson, T.J., Ohland, M.W., Thorndyke, B.R.: Identifying factors influencing engineering student graduation: a longitudinal and cross-institutional study. J. Eng. Educ. 93(4), 313–320 (2004)
  4. Gouda, S., Sekher, T.: Factors leading to school dropouts in India: an analysis of National Family Health Survey-3 data. IOSR J. Res. Method Educ. 4(6), 75–83 (2014)
  5. Nandeshwar, A., Menzies, T., Nelson, A.: Learning patterns of university student retention. Expert Syst. Appl. 38(12), 14984–14996 (2011)
  6. Kotsiantis, S.B., Pierrakeas, C., Pintelas, P.E.: Preventing student dropout in distance learning using machine learning techniques. In: International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pp. 267–274. Springer (2003)
  7. Lykourentzou, I., Giannoukos, I., Nikolopoulos, V., Mpardis, G., Loumos, V.: Dropout prediction in e-learning courses through the combination of machine learning techniques. Comput. Educ. 53(3), 950–965 (2009)
  8. Student data collection in sync with U-DISE.
  9. Pradeep, A., Das, S., Kizhekkethottam, J.J.: Students dropout factor prediction using EDM techniques. In: 2015 International Conference on Soft-Computing and Networks Security (ICSNS), pp. 1–7. IEEE (2015)
  10. [Online]. Available:
  11. LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient backprop. In: Neural Networks: Tricks of the Trade, pp. 9–48. Springer (2012)
  12. Kotsiantis, S.B., Zaharakis, I., Pintelas, P.: Supervised machine learning: a review of classification techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 160, 3–24 (2007)
  13. Wikipedia contributors: K-nearest neighbors algorithm. Wikipedia, the free encyclopedia (2018)
  14. McCullagh, P., Nelder, J.A.: Generalized Linear Models, vol. 37. CRC Press (1989)
  15. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2), 83–112 (2017)
  16. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)
  17. Schölkopf, B., Smola, A.J., Bach, F., et al.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press (2002)
  18. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
  19. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 1189–1232 (2001)
  20. Brodersen, K.H., Ong, C.S., Stephan, K.E., Buhmann, J.M.: The balanced accuracy and its posterior distribution. In: 2010 20th International Conference on Pattern Recognition (ICPR), pp. 3121–3124 (2010)

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Sagarika Nangia (1)
  • Jhilmil Anurag (2)
  • Ishani Gambhir (3)
  1. Oracle India Pvt. Ltd., Bengaluru, India
  2. TATA Motors, Pune, India
  3. TATA Motors, Lucknow, India
