Abstract
Assessing speaking in a useful way raises concerns such as scoring subjectivity and test bias. Scoring a speaking performance can produce unfairness stemming from these two issues, which in turn can harm test-takers and many other stakeholders. This article investigated the assessment of speaking among learners of English in an ESP program in an EFL context. To this end, 213 test-takers were assessed on their speaking ability by 12 raters (six pairs, with each candidate assessed by two raters) using a five-rubric scale: task achievement, fluency, grammar, vocabulary and phonology. The speaking exam included four parts, only two of which were graded; the opening and closing sections were excluded. All the exam questions were pre-formulated, and teachers were instructed to keep to the frame of the exam. The results showed that, overall, the test-takers’ speaking ability was scored more leniently than harshly, and that raters were biased towards particular rubrics, which indicated a fuzzy understanding of those rubrics. The different statistical tests of the FACETS analysis showed that the speaking exam was neither valid nor reliable. The study has implications for training raters to score speaking objectively and for writing a list of test specifications (specs) to design useful and fair speaking exams in similar contexts.
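The FACETS analyses described above rest on the many-facet Rasch model, which expresses the log-odds of a candidate receiving score category k rather than k−1 as examinee ability minus rater severity minus rubric difficulty minus a category threshold. The sketch below is an illustration of that general model with made-up parameter values, not the chapter's estimates; the function name and numbers are assumptions for demonstration only.

```python
import math

def mfrm_category_probs(ability, severity, difficulty, thresholds):
    """Many-facet Rasch (rating scale) model:
    log(P_k / P_{k-1}) = ability - severity - difficulty - threshold_k.
    Returns the probability of each score category 0..m (m = len(thresholds))."""
    # Build cumulative sums of the adjacent-category logits, then normalize.
    logits = [0.0]
    total = 0.0
    for tau in thresholds:
        total += ability - severity - difficulty - tau
        logits.append(total)
    exps = [math.exp(x) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# A lenient rater (negative severity) shifts probability toward higher scores,
# which is how leniency like that reported in the study shows up in the model.
probs = mfrm_category_probs(ability=1.0, severity=-0.5, difficulty=0.0,
                            thresholds=[-2.0, -1.0, 0.0, 1.0, 2.0])
```

Comparing the expected score under a lenient rater (severity −0.5) with a severe one (+0.5) for the same candidate makes the rater effect concrete: the lenient rater's expected score is strictly higher, even though the candidate's ability is unchanged.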
Appendices
Appendix 1 Notes for Oral Assessors
There are two distinct tasks and it is important that each candidate gets a fair chance at both. Timings and question types should be adhered to strictly.
The mark sheet is designed to permit separate task-by-task marking. This is the preferred method, as the tasks have different focuses. However, it does place a strain on the assessor, who has to make four separate assessments (Candidate A Task 1, Candidate B Task 1, Candidate A Task 2, Candidate B Task 2). If assessors find this unwieldy, they may mark holistically, i.e., give marks for the two assessed tasks overall.
The speaking topics are arranged so that the paired discussion questions are related to the individual question boxes to the immediate left of them.
Appendix 2 Paired Format
Roles: Interlocutor and Assessor

Phase | Content | Assessment
---|---|---
A | Introduction | Not assessed
B | Task 1 | Assessed
C | Task 2 | Assessed
D | Closing | Not assessed
Phase A: Introduction (not assessed)

Interlocutor:
• Good morning/afternoon.
• In this interview, we are going to ask each of you some questions relating to the topics you have studied so far this term. You should direct your answers to your partner. After that, we will introduce a similar topic for discussion, and we would like you to discuss it together. You are mainly going to be speaking together, and my colleague and I will be listening to you.
• Any questions?

Notes: The candidates come into the room. The assessors greet them and invite them to sit down, smiling and putting them at their ease.
Phase B: Task 1 (individual; 2 min each)

Interlocutor:
• In this part of the speaking test, we would like each of you to answer a question that relates to one of the topics you have studied so far.
• Your question is … [from question bank provided—Task 1]

Notes: This phase offers the students a chance to talk to their partner and express their opinion on their own before they discuss together in Phase C. The assessors should make sure marks for this task are recorded on the mark sheets.
Phase C: Task 2 (paired discussion; 3–4 min)

Interlocutor:
• In this part of the test, we are going to give you a question relating to one of the questions you have just answered, and we would like you to discuss it together. You are encouraged to interact with each other by asking questions and responding appropriately. OK, your question is … (from question bank provided—Task 2)
• Is that clear? OK, you have about 3–4 min for this, so please begin.
• Thank you.

Notes: The assessor should make sure marks for this task are recorded on the mark sheets.
Phase D: Closing (not assessed)

Interlocutor and Assessor:
• That is the end of the test.
• Thank you and goodbye.

Notes: Candidates exit, and the Interlocutor and Assessor finalize marks using the mark sheets and assessment criteria provided.
Appendix 3 Speaking Test Assessment Criteria
Score | Task Achievement (Task 1) | Interactive Communication (Task 2) | Fluency | Grammar | Vocabulary | Phonology
---|---|---|---|---|---|---
5 | Task accomplished fully and effectively | Able to initiate and respond with ease in the interaction. Takes and gives turns appropriately | Able to sustain the flow of language necessary to accomplish the tasks, with occasional pauses to think | Candidate has the range of grammar necessary to accomplish the tasks and is generally accurate | Candidate has the range of vocabulary necessary to accomplish the tasks | L1 features do not intrude
4 | Task accomplished adequately | Able to initiate and respond sufficiently to keep the interaction going. Takes and gives turns appropriately | Able to sustain the flow of language to accomplish the tasks, but with occasional pauses to think and to search for words | Candidate lacks either the full range of grammar necessary to accomplish the tasks or the necessary accuracy | Candidate lacks the full range of vocabulary necessary but achieves communication through paraphrase | L1 features present but do not obstruct understanding
3 | Task accomplished to a limited degree | Able to initiate and respond with difficulty. Struggles to keep the interaction going. Does not take and give turns appropriately | Pauses to search for words are frequent, and the flow of language necessary to accomplish the tasks is not always sustained | Candidate lacks both the range and the accuracy necessary to accomplish the tasks | Candidate lacks the range of vocabulary necessary to accomplish the tasks and has limited ability to paraphrase | L1 features present and occasionally obstruct understanding
2 | Task attempted but not accomplished | Able to respond but not to initiate. Relies on partner to keep the interaction going | Speech seems a little disconnected and is sometimes difficult to follow | Candidate has range and accuracy sufficient to attempt but not to accomplish the tasks | Candidate lacks sufficient range to accomplish the tasks and cannot paraphrase | L1 features present and often obstruct understanding
1 | Task not attempted | Limited responses and no initiations | Speech is disconnected and difficult to follow | Range and accuracy inadequate for the test | Range wholly inadequate for the test | L1 features constantly obstruct understanding
0 | No adequate sample of language | | | | |
Appendix 4
Speaking test score sheet: English 1111/1222

Student name |
---|---
Student ID |

Criterion | Task achievement/Interactive communication | Fluency | Grammar | Vocabulary | Phonology | Total
---|---|---|---|---|---|---
Score Task 1 | | | | | |
Score Task 2 | | | | | |
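The score sheet above records one 0–5 mark per criterion for each of the two assessed tasks. A minimal sketch of how those marks might be totalled is given below; the criterion names and the 0–5 band range come from Appendices 3 and 4, but the function itself (`total_marks`) is a hypothetical illustration, not part of the chapter's procedure.

```python
# Criteria from the Appendix 4 mark sheet (task achievement doubles as
# interactive communication for Task 2).
CRITERIA = ["task_achievement", "fluency", "grammar", "vocabulary", "phonology"]

def total_marks(task1, task2):
    """Sum the five criterion marks (0-5 each) for each task and overall.
    task1/task2: dicts mapping criterion name -> mark awarded by the rater."""
    for task in (task1, task2):
        missing = set(CRITERIA) - set(task)
        if missing:
            raise ValueError(f"missing criteria: {sorted(missing)}")
        if any(not 0 <= task[c] <= 5 for c in CRITERIA):
            raise ValueError("marks must fall in the 0-5 band range")
    t1 = sum(task1[c] for c in CRITERIA)
    t2 = sum(task2[c] for c in CRITERIA)
    return {"task1": t1, "task2": t2, "total": t1 + t2}

sheet = total_marks(
    task1={"task_achievement": 4, "fluency": 3, "grammar": 3,
           "vocabulary": 4, "phonology": 5},
    task2={"task_achievement": 3, "fluency": 3, "grammar": 2,
           "vocabulary": 3, "phonology": 4},
)  # -> {"task1": 19, "task2": 15, "total": 34}
```

Validating the band range at entry is a cheap guard against transcription slips when two raters' sheets are later combined.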
Appendix 5
Bias interaction: raters by rubrics
Copyright information
© 2018 Springer International Publishing AG
Cite this chapter
Hidri, S. (2018). Assessing Spoken Language Ability: A Many-Facet Rasch Analysis. In S. Hidri (Ed.), Revisiting the Assessment of Second Language Abilities: From Theory to Practice (Second Language Learning and Teaching). Cham: Springer. https://doi.org/10.1007/978-3-319-62884-4_2
Print ISBN: 978-3-319-62883-7
Online ISBN: 978-3-319-62884-4