Abstract
Temporal in-trouble student identification is a program-level classification task: it predicts the final study status that a current student will have at the end of their study time, using data gathered from past students. The task emphasizes correct predictions for in-trouble students, that is, those whose predicted labels fall at the lowest performance level. Educational datasets for this task have several challenging characteristics, including multiple classes, class overlap, and class imbalance. Handling these characteristics simultaneously has not yet been investigated in educational data mining, and existing general-purpose methods are not straightforwardly applicable to educational datasets. This paper therefore proposes a novel method for the task. Combining traditional k-nearest neighbors with clustering ensemble methods, the method has three new features: it relaxes the number k of nearest neighbors, it uses a set of cluster-based neighbors newly generated by partitioning the subspace of each class, and it applies four new criteria, rather than majority voting, to decide the final class label. The result is a new lazy learning method that can correctly predict more instances belonging to a positive minority class. In an empirical evaluation on our two real educational datasets and the benchmark “Iris” dataset, higher Accuracy, Recall, and F-measure than those of some popular methods confirmed its effectiveness.
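The idea of cluster-based neighbors generated per class can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the four decision criteria or the clustering procedure, so the sketch assumes plain k-means within each class and substitutes a simple nearest-centroid rule for the final decision. The names `kmeans`, `fit_class_clusters`, and `predict`, the parameter `clusters_per_class`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def kmeans(points, n_clusters, n_iter=20):
    """Plain Lloyd's k-means with a deterministic, evenly spaced initialisation."""
    n_clusters = min(n_clusters, len(points))
    step = len(points) / n_clusters
    centroids = [points[int(i * step)] for i in range(n_clusters)]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep the old
        # centroid if a group ends up empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def fit_class_clusters(X, y, clusters_per_class=2):
    """Partition the subspace of each class separately (assumption: k-means)."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(tuple(features))
    return {label: kmeans(pts, clusters_per_class)
            for label, pts in by_class.items()}


def predict(model, x):
    """Stand-in decision rule: the class owning the closest cluster centroid.

    The paper replaces majority voting with four criteria that the abstract
    does not describe; this rule is only a placeholder for them.
    """
    x = tuple(x)
    return min(model,
               key=lambda label: min(math.dist(x, c) for c in model[label]))


# Toy, made-up data with a small "in_trouble" minority class.
X = [(1.0, 1.0), (1.2, 0.8),
     (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
     (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]
y = ["in_trouble", "in_trouble",
     "average", "average", "average",
     "good", "good", "good"]

model = fit_class_clusters(X, y, clusters_per_class=2)
print(predict(model, (1.1, 0.9)))
```

Because every class contributes its own cluster centroids regardless of its size, a minority class is not outvoted by larger classes the way it can be under majority voting over a fixed k, which is the motivation the abstract gives for departing from standard k-nearest neighbors.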
Notes
1. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml].
2. Weka 3 [http://www.cs.waikato.ac.nz/ml/weka].
Acknowledgments
This research is funded by Vietnam National University Ho Chi Minh City, Vietnam, under grant number C2017-20-18.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vo, C., Nguyen, H.P. (2019). A Class-Cluster k-Nearest Neighbors Method for Temporal In-Trouble Student Identification. In: Nguyen, N., Gaol, F., Hong, TP., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2019. Lecture Notes in Computer Science, vol 11431. Springer, Cham. https://doi.org/10.1007/978-3-030-14799-0_19
DOI: https://doi.org/10.1007/978-3-030-14799-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14798-3
Online ISBN: 978-3-030-14799-0
eBook Packages: Computer Science, Computer Science (R0)