Selection of automatic short answer grading techniques using contextual bandits for different evaluation measures

International Journal of Advances in Engineering Sciences and Applied Mathematics

Abstract

Automatic short answer grading (ASAG) systems are designed to automatically assess short natural-language answers ranging from a few words to a few sentences in length. Many ASAG techniques have been proposed in the literature. In this paper, we critically analyse the role of the evaluation measures used for assessing the quality of ASAG techniques. In real-world settings, multiple factors, such as difficulty level and diversity of student answers, vary significantly across questions, leading to different ASAG techniques emerging as superior for different evaluation measures. Building upon this observation, we propose automatically learning a mapping from questions to ASAG techniques using minimal human (expert/crowd) feedback. We do this by formulating the learning task as a contextual bandits problem and providing a rigorous regret minimization algorithm that handles key practical considerations, such as noisy experts and similarity between questions. Our approach offers the flexibility to include new ASAG systems on the fly and does not require the human expert to know the working details of the systems while providing feedback. With extensive simulations on a standard dataset, we demonstrate that our approach yields outcomes that are remarkably consistent with human evaluations.
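The contextual bandits formulation can be illustrated with a minimal sketch. It is not the regret minimization algorithm of the paper, whose details are not given in the abstract; it assumes a plain epsilon-greedy rule, treats each question's context as a coarse question-type label so that similar questions share statistics, and models the noisy expert as binary feedback that is flipped with a fixed probability. The technique names, context labels, quality values and noise rate are all hypothetical.

```python
import random


class EpsilonGreedySelector:
    """Chooses one grading technique (arm) per question context."""

    def __init__(self, techniques, epsilon=0.1, seed=0):
        self.techniques = list(techniques)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}  # (context, technique) -> number of feedback samples
        self.means = {}   # (context, technique) -> running mean of feedback

    def best(self, context):
        # Technique with the highest estimated reward for this context.
        return max(self.techniques,
                   key=lambda t: self.means.get((context, t), 0.0))

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.techniques)
        return self.best(context)

    def update(self, context, technique, reward):
        # Incremental mean update from one piece of human feedback.
        key = (context, technique)
        n = self.counts.get(key, 0) + 1
        mean = self.means.get(key, 0.0)
        self.counts[key] = n
        self.means[key] = mean + (reward - mean) / n

    def add_technique(self, technique):
        # A new technique can join at any time; it starts with no statistics.
        if technique not in self.techniques:
            self.techniques.append(technique)


# Hypothetical probability that an expert approves each technique's grades.
TRUE_QUALITY = {
    ("factual", "lexical"): 0.9, ("factual", "semantic"): 0.3,
    ("explanatory", "lexical"): 0.3, ("explanatory", "semantic"): 0.9,
}


def noisy_expert(context, technique, rng, noise=0.1):
    # Binary feedback that is flipped with probability `noise`.
    approve = rng.random() < TRUE_QUALITY[(context, technique)]
    if rng.random() < noise:
        approve = not approve
    return 1.0 if approve else 0.0


def run_simulation(rounds=5000, seed=0):
    rng = random.Random(seed)
    selector = EpsilonGreedySelector(["lexical", "semantic"], seed=seed)
    for _ in range(rounds):
        context = rng.choice(["factual", "explanatory"])
        technique = selector.select(context)
        selector.update(context, technique,
                        noisy_expert(context, technique, rng))
    return selector


if __name__ == "__main__":
    learned = run_simulation()
    for ctx in ("factual", "explanatory"):
        print(ctx, "->", learned.best(ctx))
```

The expert in this sketch only judges the output of the chosen technique, which mirrors the abstract's point that feedback does not require knowledge of how a system works. A contextual bandit algorithm with regret guarantees would replace the epsilon-greedy rule in `select`.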



Author information

Corresponding author

Correspondence to Shourya Roy.

About this article

Cite this article

Roy, S., Rajkumar, A. & Narahari, Y. Selection of automatic short answer grading techniques using contextual bandits for different evaluation measures. Int J Adv Eng Sci Appl Math 10, 105–113 (2018). https://doi.org/10.1007/s12572-017-0202-9

  • DOI: https://doi.org/10.1007/s12572-017-0202-9
