Abstract
Test validation methods are at the heart of language testing research. The way in which validity is conceptualized determines the scope and nature of validity investigations and hence the methods to gather evidence. Validation frameworks specify the process used to prioritize, integrate, and evaluate evidence collected using various methods. This review charts the evolution of validity theory and validation frameworks and provides a brief review of current methodologies for language test validation, organized by the validity inferences to which they are related in an argument-based validation framework. It discusses some problems and challenges associated with our current test validation research and practice and proposes some major areas of research that could help move the field forward.
The argument-based approach to test validation, initially developed for large-scale assessment, will continue to be refined to make it more applicable to test developers and practitioners. Alternative validation approaches for classroom assessment are emerging but could benefit from more empirical verification to make them theoretically sound as well as practically useful. We are in an exciting era in which new conceptualizations of communicative language use, such as English as a lingua franca, and the use of new technologies in real-world communication are pushing the boundaries of the constructs of language assessments. These developments have introduced new conceptual challenges and complexity in redefining the constructs of language assessments and in designing validation research in light of the expanded constructs.
© 2017 Springer International Publishing AG
About this entry
Cite this entry
Xi, X., Sawaki, Y. (2017). Methods of Test Validation. In: Shohamy, E., Or, I., May, S. (eds) Language Testing and Assessment. Encyclopedia of Language and Education. Springer, Cham. https://doi.org/10.1007/978-3-319-02261-1_14
DOI: https://doi.org/10.1007/978-3-319-02261-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02260-4
Online ISBN: 978-3-319-02261-1
eBook Packages: Education; Reference Module Humanities and Social Sciences; Reference Module Education