Methods of Test Validation

  • Reference work entry
Language Testing and Assessment

Part of the book series: Encyclopedia of Language and Education (ELE)

Abstract

Test validation methods are at the heart of language testing research. The way in which validity is conceptualized determines the scope and nature of validity investigations and hence the methods used to gather evidence. Validation frameworks specify the process used to prioritize, integrate, and evaluate evidence collected using various methods. This review charts the evolution of validity theory and validation frameworks and provides a brief review of current methodologies for language test validation, organized by the validity inferences to which they relate in an argument-based validation framework. It discusses some problems and challenges associated with current test validation research and practice and proposes some major areas of research that could help move the field forward.

The argument-based approach to test validation, initially developed for large-scale assessment, will continue to be refined to make it more applicable to test developers and practitioners. Alternative validation approaches for classroom assessment are emerging but could benefit from more empirical verification to make them theoretically sound as well as practically useful. We are in an exciting era in which new conceptualizations of communicative language use, such as English as a lingua franca, and the use of new technologies in real-world communication are pushing the boundaries of the constructs of language assessments. These developments have introduced new conceptual challenges and complexity in redefining the constructs of language assessments and in designing validation research in light of the expanded constructs.

Correspondence to Xiaoming Xi.

Copyright information

© 2017 Springer International Publishing AG

About this entry

Cite this entry

Xi, X., Sawaki, Y. (2017). Methods of Test Validation. In: Shohamy, E., Or, I., May, S. (eds) Language Testing and Assessment. Encyclopedia of Language and Education. Springer, Cham. https://doi.org/10.1007/978-3-319-02261-1_14
