Methods of Test Validation

Xi, Xiaoming; Sawaki, Yasuyo

doi:10.1007/978-3-319-02261-1_14

Xiaoming Xi and Yasuyo Sawaki

Part of the book series: Encyclopedia of Language and Education (ELE)

Abstract

Test validation methods are at the heart of language testing research. The way in which validity is conceptualized determines the scope and nature of validity investigations and hence the methods to gather evidence. Validation frameworks specify the process used to prioritize, integrate, and evaluate evidence collected using various methods. This review charts the evolution of validity theory and validation frameworks and provides a brief review of current methodologies for language test validation, organized by the validity inferences to which they are related in an argument-based validation framework. It discusses some problems and challenges associated with our current test validation research and practice and proposes some major areas of research that could help move the field forward.

The argument-based approach to test validation, initially developed for large-scale assessment, will continue to be refined to make it more applicable to test developers and practitioners. Alternative validation approaches for classroom assessment are emerging but could benefit from more empirical verification to make them theoretically sound as well as practically useful. We are in an exciting era when new conceptualizations of communicative language use, such as English as a lingua franca, and the use of new technologies in real-world communication are pushing the boundaries of the constructs of language assessments. These developments have introduced new conceptual challenges and complexity in redefining the constructs of language assessments and in designing validation research in light of the expanded constructs.



Author information

Authors and Affiliations

New Product Development, Educational Testing Service, Rosedale Road 666, Princeton, NJ, 08541, USA
Xiaoming Xi
Faculty of Education and Integrated Arts and Sciences, Waseda University, 1-6-1 Nishiwaseda, Shinjuku-ku, Tokyo, 169-8050, Japan
Yasuyo Sawaki

Corresponding author

Correspondence to Xiaoming Xi.

Editor information

Editors and Affiliations

School of Education, Tel Aviv University, Tel Aviv, Israel
Elana Shohamy
School of Education, Tel Aviv University, Tel Aviv, Israel
Iair G. Or
Faculty of Education and Social Work, The University of Auckland, Auckland, New Zealand
Stephen May

Cite this entry

Xi, X., Sawaki, Y. (2017). Methods of Test Validation. In: Shohamy, E., Or, I., May, S. (eds) Language Testing and Assessment. Encyclopedia of Language and Education. Springer, Cham. https://doi.org/10.1007/978-3-319-02261-1_14

DOI: https://doi.org/10.1007/978-3-319-02261-1_14
Published: 16 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02260-4
Online ISBN: 978-3-319-02261-1
eBook Packages: Education; Reference Module Humanities and Social Sciences; Reference Module Education
