Abstract
Empirical research exploring test fairness in scoring written performance has been conducted mostly in the North American context; little has been carried out in Asian countries such as China. Given the extremely high stakes of large-scale testing in this context, this study examines how raters’ scoring decisions were affected by the features of writing that the National Matriculation English Test (NMET) in China is intended (or not intended) to measure. The study further explores whether novice and experienced NMET raters differed in their rating behaviours. The results highlight the extent to which raters attended to the NMET rating scale, leading to a deeper understanding of scoring fairness in large-scale high-stakes tests within the Chinese context, with implications for scoring fairness in other similar contexts internationally.
Notes
1. In this chapter, scoring and rating (singular) are used interchangeably.
2. Cronbach’s alpha for the 13 Likert-scaled questions was 0.72.
3. Plagiarism was a new writing feature reported by the NMET raters. By “plagiarism”, the raters referred to the fact that certain test-takers either copied sentences from reading passages in the Reading Comprehension section of the same NMET test paper or wrote down sentences recited from NMET writing templates, the content of which may be only loosely related to the NMET writing task.
4. Relevance was another new writing feature reported by the NMET raters. By “relevance”, the raters referred to the degree to which the test-takers’ writing matched the NMET writing topic.
5. Experienced NMET raters refer to raters who have rated the NMET more than once; novice NMET raters refer to raters who are recruited and trained as NMET raters for the first time.
6. Experienced EFL teacher raters refer to raters with at least 5 years of EFL teaching experience; novice EFL teacher raters are raters who have taught EFL for fewer than 5 years.
Acknowledgements
The study was supported by a SEED research grant (Liying Cheng, Principal Investigator) from the Faculty of Education, Queen’s University, Kingston, Ontario, Canada.
Copyright information
© 2014 Springer Science+Business Media Singapore
Cite this chapter
Mei, Y., Cheng, L. (2014). Scoring Fairness in Large-Scale High-Stakes English Language Testing: An Examination of the National Matriculation English Test. In: Coniam, D. (eds) English Language Education and Assessment. Springer, Singapore. https://doi.org/10.1007/978-981-287-071-1_11
Print ISBN: 978-981-287-070-4
Online ISBN: 978-981-287-071-1