Studying Weariness Prediction Using SMOTE and Random Forests
This article addresses the low accuracy of student weariness prediction in education and the poor performance of traditional prediction models. It builds a prediction model that combines the SMOTE (Synthetic Minority Oversampling Technique) algorithm with random forests: SMOTE oversampling is first used to balance the data set, and the random forest algorithm is then used to train the classifier. A comparison of common single classifiers with ensemble learning classifiers shows that the SMOTE and random forest method performs best, and the reasons for the increase in the AUC value after applying SMOTE are analyzed. The experiments use a synthesized Massive Open Online Course (MOOC) student data set whose features mainly include the length of class, whether the mouse has moved, whether an assignment was submitted, whether the student participated in discussions, and the accuracy of completed assignments. The results indicate that this method can significantly improve classifier performance, so teachers can choose appropriate teaching interventions to improve students' learning outcomes.
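The oversampling step described in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is placed on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, the toy feature values, and the choice of weary students as the minority class are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE step)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: [minutes in class, assignment accuracy].
# Assumption: weary students form the minority class.
majority = [[45, 0.90], [50, 0.85], [40, 0.80], [55, 0.95], [48, 0.88], [52, 0.90]]
minority = [[10, 0.30], [15, 0.40], [12, 0.35]]

new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

After this step both classes have the same number of samples, and the balanced set would be passed to a random forest classifier for training. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package, together with scikit-learn's `RandomForestClassifier`, would likely be preferable to this hand-written version.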
Keywords: Education, SMOTE, Random forest
This work was partly supported by the National Key R&D Program of China (No. 2017YFB0203102) and the State Key Program of National Natural Science Foundation of China (No. 91530324).