Automatic Essay Scoring Based on Coh-Metrix Feature Selection for Chinese English Learners

  • Xia Li
  • Jianda Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10108)


Automatic essay scoring can be based on an essay's content or on its form. We believe that both classes of features reflect aspects of an essay's quality and that they should be combined. In this paper, we use Coh-Metrix together with an importance measure to extract features covering the essay's grammatical structure, content, form, cohesion, and so on, and to select those most relevant to Chinese learners of English. This feature set is more complete than those used in the literature and is expected to better capture an essay's characteristics. SVM and C5.0 classifiers built on these features are used to predict the essay's score. Our experiments show that this feature set produces good results on essays by Chinese learners of English, even when only the top 5 or top 15 features by importance score are used.
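The selection step described in the abstract (rank features by an importance score, then keep only the top few) can be illustrated with a minimal sketch. The abstract does not say which importance measure is used, so this sketch assumes absolute Pearson correlation with the essay score as a stand-in. The feature names and all values are hypothetical toy data, not Coh-Metrix output or results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column gives no basis for ranking
    return cov / math.sqrt(var_x * var_y)


def top_k_features(rows, scores, names, k):
    """Rank features by |correlation with score| and keep the k highest.

    ASSUMPTION: absolute Pearson correlation stands in for the paper's
    importance measure, which the abstract does not specify.
    """
    importance = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        importance[name] = abs(pearson(column, scores))
    ranked = sorted(names, key=lambda name: importance[name], reverse=True)
    return ranked[:k], importance


# Hypothetical feature names and made-up values, one row per essay.
FEATURE_NAMES = ["word_count", "mean_sentence_length", "connective_ratio", "noise"]
ESSAYS = [
    [120, 10.0, 0.02, 5],
    [180, 12.0, 0.03, 1],
    [240, 14.0, 0.05, 4],
    [300, 15.0, 0.06, 2],
    [360, 18.0, 0.08, 3],
]
SCORES = [1, 2, 3, 4, 5]  # made-up human scores

selected, importance = top_k_features(ESSAYS, SCORES, FEATURE_NAMES, k=2)
print("selected features:", selected)
for name in FEATURE_NAMES:
    print(f"{name}: {importance[name]:.3f}")
```

In a full pipeline, only the selected columns would be passed to a classifier such as the SVM or C5.0 models named in the abstract. That step is omitted here to keep the sketch free of external dependencies.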


Automatic scoring of English essay; Feature selection; Machine learning



This work is supported by the National Science Foundation of China (61402119).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Key Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou, China
  2. National Key Research Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangzhou, China
