Progress and Challenges for Automated Scoring and Feedback Systems for Large-Scale Assessments
Large-scale assessment refers to tests administered to large numbers of students and used at local, state, and national levels to measure the progress of schools against educational standards. To produce accurate and fair measurements, large-scale assessment systems need to include all eligible students, which means a high volume of exam scripts must be marked by the tens of thousands of examiners appointed by exam boards. The need for large-scale assessments, together with the high cost of manual marking and limited turnaround time, has driven the development, over many years, of automated assessment and marking. This chapter reviews the history and development of automated assessment systems. It presents findings from empirical research and highlights the theoretical considerations that emerge from such developments. In addition, the practical aspects of developing such assessments are explored with examples primarily from the UK and USA, including the systems and tools available, the current capabilities of natural language processing (NLP) approaches, and their limitations, ethical concerns, and future potential.
Keywords: Large-scale assessment · Automated assessment · Automated analysis of student scripts · Ethical issues in automated scoring