Predicting Academic Performance of International Students Using Machine Learning Techniques and Human Interpretable Explanations Using LIME—Case Study of an Indian University
With increasing globalization in higher education, universities are placing greater importance on attracting international students to achieve diversity and good ratings from accreditation bodies. Machine learning techniques can predict the class of a dependent variable and can thus enable educational institutes to predict the academic performance of students and improve related learning processes. The purpose of this study is to predict the academic performance of international students studying at a university in North India. The study explores the predictive potential of attributes such as attendance percentage, pending reappears, economy level and geographical region in developing a statistical model that classifies the likely performance of a student as satisfactory or poor. Machine learning algorithms such as logistic regression, naïve Bayes, CART and random forests have been used. Classification accuracy, sensitivity, specificity and area under the ROC curve have been used for evaluation. Interpretable explanations for model outcomes have also been obtained using LIME. Classification accuracy above 90% was observed during experiments. Attendance percentage and pending reappears were observed to contribute most towards prediction outcomes.
Keywords: Academic performance; Binary classification; Educational data mining; Human interpretability; Machine learning; Predictive models
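The four evaluation measures named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the labels and scores below are made-up illustrative values, the 0.5 decision threshold is an assumption, and treating "poor" as the positive class (label 1) is also an assumption, since the abstract does not say which class is positive.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Compute accuracy, sensitivity, specificity and AUC.

    Assumes both classes are present in y_true; otherwise the
    sensitivity, specificity or AUC denominators would be zero.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": roc_auc(y_true, scores),
    }


# Hypothetical data: 1 = poor performance, 0 = satisfactory (assumed coding).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # predicted P(poor)

metrics = evaluate(y_true, scores)
print(metrics)
```

Accuracy alone can look strong when one class dominates, which is one reason to report sensitivity and specificity alongside it, as the study does. AUC is threshold-independent, so it complements the three threshold-based measures.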
The data set covers international students studying at Lovely Professional University (LPU), India, during 2015–16. The authors are thankful to the Division of Research and Development at LPU for granting permission to use this data set for this research study.