Abstract
Automatic short answer grading (ASAG) systems are designed to automatically assess short natural-language answers ranging in length from a few words to a few sentences. Many ASAG techniques have been proposed in the literature. In this paper, we critically analyse the role of evaluation measures used for assessing the quality of ASAG techniques. In real-world settings, multiple factors, such as difficulty level and diversity of student answers, vary significantly across questions, leading to different ASAG techniques emerging as superior under different evaluation measures. Building upon this observation, we propose automatically learning a mapping from questions to ASAG techniques using minimal human (expert/crowd) feedback. We do this by formulating the learning task as a contextual bandits problem and providing a rigorous regret minimization algorithm that handles key practical considerations, such as noisy experts and similarity between questions. Our approach offers the flexibility to include new ASAG systems on the fly and does not require the human expert to have knowledge of the working details of a system while providing feedback. With extensive simulations on a standard dataset, we demonstrate that our approach yields outcomes that are remarkably consistent with human evaluations.
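To make the contextual-bandits formulation concrete, the sketch below shows a minimal per-context epsilon-greedy selector: contexts stand for questions (or clusters of similar questions), arms stand for candidate ASAG techniques, and rewards stand for human feedback on a graded answer. This is an illustrative simplification under assumed names (`EpsilonGreedyContextualBandit`, `graderA`, `graderB` are hypothetical); the paper's actual algorithm additionally handles noisy experts and exploits similarity between questions.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Illustrative selector mapping contexts (questions) to arms (ASAG techniques).

    A simplified stand-in for the paper's regret-minimization algorithm:
    with probability epsilon it explores a random technique, otherwise it
    exploits the technique with the highest mean observed feedback.
    """

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        # per-context pull counts and running mean rewards per arm
        self.counts = defaultdict(lambda: defaultdict(int))
        self.values = defaultdict(lambda: defaultdict(float))

    def select(self, context):
        """Pick an ASAG technique for this question context."""
        if random.random() < self.epsilon:
            return random.choice(self.arms)  # explore
        vals = self.values[context]
        return max(self.arms, key=lambda a: vals[a])  # exploit

    def update(self, context, arm, reward):
        """Fold in human (expert/crowd) feedback as an incremental mean."""
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.values[context][arm] += (reward - self.values[context][arm]) / n
```

New ASAG systems can be added on the fly simply by appending to `arms`; the expert only supplies a reward for the produced grade and never needs to know how any technique works internally.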
Roy, S., Rajkumar, A. & Narahari, Y. Selection of automatic short answer grading techniques using contextual bandits for different evaluation measures. Int J Adv Eng Sci Appl Math 10, 105–113 (2018). https://doi.org/10.1007/s12572-017-0202-9