Looking Back, pp. 259–272

Holland’s Advice for the Fourth Generation of Test Theory: Blood Tests Can Be Contests

  • Conference paper

Part of the book series: Lecture Notes in Statistics (LNSP, volume 202)

Abstract

According to Holland (2008) in The First Four Generations of Test Theory, testing as a scientific enterprise is no more than 120 years old. Holland divides this enterprise into four overlapping generations. The first generation, which was influenced by concepts such as error of measurement and correlation that were developed in other fields, focused on test scores and saw developments in the areas of reliability, classical test theory, generalizability theory, and validity. This generation began in the early twentieth century and continues today, but most of its major developments were achieved by 1970. The second generation, which focused on models for item-level data, began in the 1940s and peaked in the 1970s, but it also continues into the present. The third generation started in the 1970s and continues today. It is characterized by the application of statistical ideas and sophisticated computational methods to item-level models, as well as to models of sets of items.

References

  • Alderman, D. L., & Holland, P. W. (1981). Item performance across native language groups on the Test of English as a Foreign Language (ETS Research Rep. No. RR-81-16). Princeton, NJ: ETS.

  • Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373–381.

  • College Board. (2005). 2005 college bound seniors: Total group profile report. New York, NY: Author.

  • Dorans, N. J. (1982). Technical review of item fairness studies: 1975–1979 (ETS Statistical Rep. No. SR-82-90). Princeton, NJ: ETS.

  • Dorans, N. J. (2002). Recentering the SAT score distributions: How and why. Journal of Educational Measurement, 39(1), 59–84.

  • Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35–66). Hillsdale, NJ: Lawrence Erlbaum Associates.

  • Dorans, N. J., & Holland, P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37(4), 281–306.

  • Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach (ETS Research Rep. No. RR-83-09). Princeton, NJ: ETS.

  • Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355–368.

  • Dorans, N. J., & Kulick, E. (2006). Differential item functioning on the MMSE: An application of the Mantel-Haenszel and standardization procedures. Medical Care, 44 S3, S107–S114.

  • Dorans, N. J., Lyu, C. F., Pommerich, M., & Houston, M. (1997). Concordance between ACT assessment and recentered SAT I sum scores. College and University, 73, 24–34.

  • Dorans, N. J., Pommerich, M., & Holland, P. W. (Eds.). (2007). Linking and aligning scores and scales. New York, NY: Springer.

  • Feuer, M. J., Holland, P. W., Green, B. F., Bertenthal, M. W., & Hemphill, F. C. (Eds.). (1999). Uncommon measures: Equivalence and linkage among educational tests (Report of the Committee on Equivalency and Linkage of Educational Tests, National Research Council). Washington, DC: National Academy Press.

  • Gulliksen, H. (1950). Theory of mental tests. New York, NY: Wiley.

  • Hackett, R. K., Holland, P. W., Pearlman, M., & Thayer, D. T. (1987). Test construction manipulating score differences between Black and White examinees: Properties of the resulting tests (ETS Research Rep. No. RR-87-30). Princeton, NJ: ETS.

  • Holland, P. W. (1994). Measurements or contests? Comments on Zwick, Bond and Allen/Donoghue. Proceedings of the Social Statistics Section of the American Statistical Association, 1994, 27–29.

  • Holland, P. W. (2008, March). The first four generations of test theory. Paper presented at the Association of Test Publishers on Innovations in Testing, Dallas, TX.

  • Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Westport, CT: American Council on Education/Praeger.

  • Holland, P. W., & Hoskens, M. (2003). Classical test theory as a first-order item response theory: Application to true-score prediction from a possibly nonparallel test. Psychometrika, 68, 123–149.

  • Holland, P. W., & Rubin, D. B. (Eds.). (1982). Test equating. New York, NY: Academic Press.

  • Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.

  • Kelley, T. L. (1927). Interpretation of educational measurements. New York, NY: World Book.

  • Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

  • Meredith, W., & Millsap, R. E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57(2), 289–311.

  • Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression: A second course in statistics. Reading, MA: Addison-Wesley.

  • Schmitt, A. P., Holland, P. W., & Dorans, N. J. (1993). Evaluating hypotheses about differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 281–315). Hillsdale, NJ: Lawrence Erlbaum Associates.

  • Shealy, R. T., & Stout, W. F. (1993). A model based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 197–239.

  • Tucker, L. R. (1971). Relations of factor score estimates to their use. Psychometrika, 36(4), 427–436.

  • von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York, NY: Springer.

  • Wainer, H. (2007). The world’s most dangerous equation. American Scientist, 95, 249–256.

Acknowledgements

The author thanks Paul Holland for being the mentor, colleague, and friend who had the most impact on his career. Tim Moses provided valuable advice. Any opinions expressed here are those of the author and not necessarily of Educational Testing Service.

Author information

Corresponding author

Correspondence to Neil J. Dorans.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this paper

Cite this paper

Dorans, N.J. (2011). Holland’s Advice for the Fourth Generation of Test Theory: Blood Tests Can Be Contests. In: Dorans, N., Sinharay, S. (eds) Looking Back. Lecture Notes in Statistics, vol 202. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9389-2_14
