Abstract
Assessing speaking in a useful way raises concerns such as scoring subjectivity and test bias. Scoring a speaking performance can produce unfairness stemming from these two issues, which in turn can harm test-takers and many other stakeholders. This article investigated the assessment of speaking among learners of English in an ESP program in an EFL context. To this end, 213 test-takers were assessed on their speaking ability by 12 raters (six pairs, with each candidate assessed by two raters) using a five-rubric scale: task achievement, fluency, grammar, vocabulary and phonology. The speaking exam included four parts, only two of which were graded; the opening and closing sections were excluded. All the exam questions were pre-formulated, and teachers were instructed to keep to the frame of the exam. The results showed that, overall, the test-takers’ speaking ability was scored more leniently than harshly, and that raters were biased towards particular rubrics, which indicated a fuzzy understanding of those rubrics. The different statistical tests of the FACETS analysis showed that the speaking exam was neither valid nor reliable. The study has implications for training raters to score speaking objectively and for writing a list of test specifications (specs) to design useful and fair speaking exams in similar contexts.
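The FACETS analyses described above rest on the many-facet Rasch model, which expresses the log-odds of a candidate receiving score category k rather than k−1 as examinee ability minus rater severity minus rubric difficulty minus a category threshold. The sketch below is an illustration of that general model with made-up parameter values, not the chapter's estimates; the function name and numbers are assumptions for demonstration only.

```python
import math

def mfrm_category_probs(ability, severity, difficulty, thresholds):
    """Many-facet Rasch (rating scale) model:
    log(P_k / P_{k-1}) = ability - severity - difficulty - threshold_k.
    Returns the probability of each score category 0..m (m = len(thresholds))."""
    # Build cumulative sums of the adjacent-category logits, then normalize.
    logits = [0.0]
    total = 0.0
    for tau in thresholds:
        total += ability - severity - difficulty - tau
        logits.append(total)
    exps = [math.exp(x) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# A lenient rater (negative severity) shifts probability toward higher scores,
# which is how leniency like that reported in the study shows up in the model.
probs = mfrm_category_probs(ability=1.0, severity=-0.5, difficulty=0.0,
                            thresholds=[-2.0, -1.0, 0.0, 1.0, 2.0])
```

Comparing the expected score under a lenient rater (severity −0.5) with a severe one (+0.5) for the same candidate makes the rater effect concrete: the lenient rater's expected score is strictly higher, even though the candidate's ability is unchanged.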
Appendices
Appendix 1 Notes for Oral Assessors
There are two distinct tasks and it is important that each candidate gets a fair chance at both. Timings and question types should be adhered to strictly.
The mark sheet is designed to permit separate task-by-task marking. This is the preferred method, as the tasks have different focuses. However, it does place a strain on the assessor, who has to make four separate assessments (Candidate A Task 1, Candidate B Task 1, Candidate A Task 2, Candidate B Task 2). If assessors find this unwieldy, they may mark holistically, i.e., give marks for the two assessed tasks overall.
The speaking topics are arranged so that the paired discussion questions are related to the individual question boxes to the immediate left of them.
Appendix 2 Paired Format
Roles: Interlocutor and Assessor

Phase | Content | Assessment
---|---|---
A | Introduction | Not assessed
B | Task 1 | Assessed
C | Task 2 | Assessed
D | Closing | Not assessed
Phase A: Introduction (not assessed)

Interlocutor:
• Good morning/afternoon.
• In this interview, we are going to ask each of you some questions relating to the topics you have studied so far this term. You should direct your answers to your partner. After that, we will introduce a similar topic for discussion, and we would like you to discuss it together. You are mainly going to be speaking together, and my colleague and I will be listening to you.
• Any questions?

Notes: The candidates come into the room. The assessors greet them and invite them to sit down, smiling and putting them at their ease.
Phase B: Task 1 (individual; 2 min each)

Interlocutor:
• In this part of the speaking test, we would like each of you to answer a question that relates to one of the topics you have studied so far.
• Your question is … [from question bank provided—Task 1]

Notes: This phase offers the students a chance to talk to their partner and express their opinion on their own before they discuss together in Phase C. The assessors should make sure marks for this task are recorded on the mark sheets.
Phase C: Task 2 (paired discussion; 3–4 min)

Interlocutor:
• In this part of the test, we are going to give you a question relating to one of the questions you have just answered, and we would like you to discuss it together. You are encouraged to interact with each other by asking questions and responding appropriately. OK, your question is … (from question bank provided—Task 2)
• Is that clear? OK, you have about 3–4 min for this, so please begin.
• Thank you.

Notes: The assessor should make sure marks for this task are recorded on the mark sheets.
Phase D: Closing (not assessed)

Interlocutor and Assessor:
• That is the end of the test.
• Thank you and goodbye.

Notes: Candidates exit, and the Interlocutor and Assessor finalize marks using the mark sheets and assessment criteria provided.
Appendix 3 Speaking Test Assessment Criteria
Score | Task Achievement (Task 1) | Interactive Communication (Task 2) | Fluency | Grammar | Vocabulary | Phonology
---|---|---|---|---|---|---
5 | Task accomplished fully and effectively | Able to initiate and respond with ease in the interaction. Takes and gives turns appropriately | Able to sustain the flow of language necessary to accomplish the tasks, with occasional pauses to think | Candidate has the range of grammar necessary to accomplish the tasks and is generally accurate | Candidate has the range of vocabulary necessary to accomplish the tasks | L1 features do not intrude
4 | Task accomplished adequately | Able to initiate and respond sufficiently to keep the interaction going. Takes and gives turns appropriately | Able to sustain the flow of language to accomplish the tasks, but with occasional pauses to think and to search for words | Candidate lacks either the full range of grammar necessary to accomplish the tasks or the necessary accuracy | Candidate lacks the full range of vocabulary necessary but achieves communication through paraphrase | L1 features present but do not obstruct understanding
3 | Task accomplished to a limited degree | Able to initiate and respond with difficulty. Struggles to keep the interaction going. Does not take and give turns appropriately | Pauses to search for words are frequent, and the flow of language necessary to accomplish the tasks is not always sustained | Candidate lacks both the range and the accuracy necessary to accomplish the tasks | Candidate lacks the range of vocabulary necessary to accomplish the tasks and has limited ability to paraphrase | L1 features present and occasionally obstruct understanding
2 | Task attempted but not accomplished | Able to respond but not to initiate. Relies on partner to keep the interaction going | Speech seems a little disconnected and is sometimes difficult to follow | Candidate has range and accuracy sufficient to attempt but not to accomplish the tasks | Candidate lacks sufficient range to accomplish the tasks and cannot paraphrase | L1 features present and often obstruct understanding
1 | Task not attempted | Limited responses and no initiations | Speech is disconnected and difficult to follow | Range and accuracy inadequate for the test | Range wholly inadequate for the test | L1 features constantly obstruct understanding
0 | No adequate sample of language | | | | |
Appendix 4
Speaking test score sheet: English 1111/1222

Student name |
---|---
Student ID |

Criterion | Task achievement/Interactive communication | Fluency | Grammar | Vocabulary | Phonology | Total
---|---|---|---|---|---|---
Score Task 1 | | | | | |
Score Task 2 | | | | | |
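The score sheet above records one 0–5 mark per criterion for each of the two assessed tasks. A minimal sketch of how those marks might be totalled is given below; the criterion names and the 0–5 band range come from Appendices 3 and 4, but the function itself (`total_marks`) is a hypothetical illustration, not part of the chapter's procedure.

```python
# Criteria from the Appendix 4 mark sheet (task achievement doubles as
# interactive communication for Task 2).
CRITERIA = ["task_achievement", "fluency", "grammar", "vocabulary", "phonology"]

def total_marks(task1, task2):
    """Sum the five criterion marks (0-5 each) for each task and overall.
    task1/task2: dicts mapping criterion name -> mark awarded by the rater."""
    for task in (task1, task2):
        missing = set(CRITERIA) - set(task)
        if missing:
            raise ValueError(f"missing criteria: {sorted(missing)}")
        if any(not 0 <= task[c] <= 5 for c in CRITERIA):
            raise ValueError("marks must fall in the 0-5 band range")
    t1 = sum(task1[c] for c in CRITERIA)
    t2 = sum(task2[c] for c in CRITERIA)
    return {"task1": t1, "task2": t2, "total": t1 + t2}

sheet = total_marks(
    task1={"task_achievement": 4, "fluency": 3, "grammar": 3,
           "vocabulary": 4, "phonology": 5},
    task2={"task_achievement": 3, "fluency": 3, "grammar": 2,
           "vocabulary": 3, "phonology": 4},
)  # -> {"task1": 19, "task2": 15, "total": 34}
```

Validating the band range at entry is a cheap guard against transcription slips when two raters' sheets are later combined.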
Appendix 5
Bias interaction: raters by rubrics
Copyright information
© 2018 Springer International Publishing AG
Cite this chapter
Hidri, S. (2018). Assessing Spoken Language Ability: A Many-Facet Rasch Analysis. In S. Hidri (Ed.), Revisiting the Assessment of Second Language Abilities: From Theory to Practice (Second Language Learning and Teaching). Cham: Springer. https://doi.org/10.1007/978-3-319-62884-4_2
Print ISBN: 978-3-319-62883-7
Online ISBN: 978-3-319-62884-4