Automatic Essay Scoring Based on Coh-Metrix Feature Selection for Chinese English Learners
Automatic essay scoring can be based on an essay's content or its form. We believe that both classes of features reflect aspects of an essay's quality and that they should be combined. In this paper, we use Coh-Metrix together with an importance measure to extract a wide range of features covering an essay's grammatical structure, content, form, and cohesion, with particular attention to features relevant to Chinese English learners. This feature set is more complete than those used in the literature and is expected to cover an essay's characteristics better. SVM and C5.0 classifiers trained on these features are used to predict essay scores. Our experiments show that this feature set produces good results on Chinese English learners' essays even when only the top 5 or top 15 features ranked by importance score are used.
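The select-then-classify pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Coh-Metrix feature values have already been extracted for each essay, uses a simple correlation-based importance score as a stand-in for the paper's importance measure, and the feature names are hypothetical.

```python
# Sketch of importance-based feature selection for essay scoring.
# Each essay is a dict of precomputed Coh-Metrix-style feature values;
# importance is approximated here by |Pearson correlation| with the
# human-assigned score (an assumption, not the paper's exact measure).
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def top_k_features(essays, scores, k):
    """Rank features by importance score and keep the top k names."""
    names = essays[0].keys()
    importance = {
        name: abs(pearson([e[name] for e in essays], scores))
        for name in names
    }
    return sorted(importance, key=importance.get, reverse=True)[:k]

# Toy data with hypothetical feature names.
essays = [
    {"word_count": 320, "type_token_ratio": 0.52, "connectives": 14},
    {"word_count": 180, "type_token_ratio": 0.41, "connectives": 6},
    {"word_count": 410, "type_token_ratio": 0.58, "connectives": 18},
]
scores = [78, 55, 85]
selected = top_k_features(essays, scores, 2)
print(selected)
```

The selected feature names would then index into the feature vectors fed to a classifier such as an SVM; restricting the classifier to the top-ranked features is exactly the top-5/top-15 setting evaluated in the experiments.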
Keywords: Automatic scoring of English essays · Feature selection · Machine learning
This work is supported by the National Natural Science Foundation of China (Grant No. 61402119).