Assessing Spoken Language Ability: A Many-Facet Rasch Analysis

Sahbi Hidri
Part of the Second Language Learning and Teaching book series (SLLT)


Assessing speaking in a useful way raises concerns such as scoring subjectivity and test bias. Both issues can make the scoring of speaking performance unfair, which in turn can harm test-takers and other stakeholders. This study investigated the assessment of speaking among learners of English in an ESP program in an EFL context. To this end, 213 test-takers were assessed on their speaking ability by 12 raters using a scale with five rubrics (task achievement, fluency, grammar, vocabulary, and phonology). Each candidate was assessed by two raters, and the 12 raters worked in six pairs. The speaking exam comprised four parts, of which only two were graded; the opening and closing sections were not scored. All exam questions were scripted in advance, and teachers were instructed to keep to the exam framework. The results showed that test-takers' speaking ability was generally scored leniently rather than harshly, and that raters showed bias in how they applied the speaking rubrics, suggesting that they had only a vague understanding of these rubrics. The FACETS statistics further indicated that the speaking exam was neither valid nor reliable. The study has implications for training raters to score speaking objectively, and it recommends writing test specifications (specs) in order to design useful and fair speaking exams in similar contexts.
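The rater leniency and severity findings above come from a many-facet Rasch analysis. As a hedged illustration of the kind of model FACETS fits, the sketch below evaluates the standard rating-scale many-facet Rasch model, log(P_k / P_k-1) = B - D - C - F_k, for one candidate (ability B), one rubric (difficulty D) and one rater (severity C). It shows how a more severe rater lowers the expected rating. The function names, the 0-4 scale and all parameter values are hypothetical and are not taken from the chapter. This is not the author's analysis and not FACETS itself; it only evaluates the model for given parameters and does not estimate them from data.

```python
import math
from collections.abc import Sequence


def category_probabilities(ability: float,
                           criterion_difficulty: float,
                           rater_severity: float,
                           thresholds: Sequence[float]) -> list[float]:
    """Probabilities of scores 0..len(thresholds) under the rating-scale
    many-facet Rasch model:  log(P_k / P_{k-1}) = B - D - C - F_k.

    B = test-taker ability, D = criterion (rubric) difficulty,
    C = rater severity, F_k = threshold between categories k-1 and k.
    All values are in logits.
    """
    eta = ability - criterion_difficulty - rater_severity
    # Log numerator for category k is the cumulative sum of (eta - F_h)
    # for h = 1..k; category 0 has an empty sum, i.e. 0.
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(log_numerators[-1] + eta - f)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_numerators)
    weights = [math.exp(x - top) for x in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(ability: float,
                   criterion_difficulty: float,
                   rater_severity: float,
                   thresholds: Sequence[float]) -> float:
    """Model-expected rating: sum over categories of k * P_k."""
    probs = category_probabilities(ability, criterion_difficulty,
                                   rater_severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical values for illustration only (not from the chapter):
    # a 0-4 scale with four thresholds, one candidate, one rubric.
    thresholds = [-2.0, -0.5, 0.5, 2.0]
    candidate, rubric = 0.5, 0.0
    for label, severity in [("lenient", -1.0), ("average", 0.0), ("severe", 1.0)]:
        score = expected_score(candidate, rubric, severity, thresholds)
        print(f"{label:8s} rater: expected score {score:.2f}")
```

The sketch holds the candidate and rubric fixed. The severe rater's expected score therefore comes out lower than the lenient rater's, which is the sense in which FACETS reports raters as lenient or harsh. FACETS' bias analysis goes further by estimating rater-by-rubric interaction terms, which this sketch omits.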


Keywords: Speaking, Rubrics, Rater, ESP, Candidate, Interaction, Curriculum, Textbooks, Measurement, Construct, RASCH, FACETS



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

Faculty of Human and Social Sciences of Tunis, Tunis University, Tunis, Tunisia
