Abstract
According to Holland (2008) in The First Four Generations of Test Theory, testing as a scientific enterprise is no more than 120 years old. Holland divides this enterprise into four overlapping generations. The first generation, which was influenced by concepts developed in other fields, such as error of measurement and correlation, focused on test scores and saw developments in reliability, classical test theory, generalizability theory, and validity. This generation began in the early twentieth century and continues today, but most of its major developments were achieved by 1970. The second generation, which focused on models for item-level data, began in the 1940s and peaked in the 1970s, but it also continues into the present. The third generation started in the 1970s and continues today. It is characterized by the application of statistical ideas and sophisticated computational methods to item-level models, as well as to models of sets of items.
References
Alderman, D. L., & Holland, P. W. (1981). Item performance across native language groups on the Test of English as a Foreign Language (ETS Research Rep. No. RR-81-16). Princeton, NJ: ETS.
Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373–381.
College Board. (2005). 2005 college bound seniors: Total group profile report. New York, NY: Author.
Dorans, N. J. (1982). Technical review of item fairness studies: 1975–1979 (ETS Statistical Rep. No. SR-82-90). Princeton, NJ: ETS.
Dorans, N. J. (2002). Recentering the SAT score distributions: How and why. Journal of Educational Measurement, 39(1), 59–84.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35–66). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dorans, N. J., & Holland, P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37(4), 281–306.
Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach (ETS Research Rep. No. RR-83-09). Princeton, NJ: ETS.
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355–368.
Dorans, N. J., & Kulick, E. (2006). Differential item functioning on the MMSE: An application of the Mantel-Haenszel and standardization procedures. Medical Care, 44 S3, S107–S114.
Dorans, N. J., Lyu, C. F., Pommerich, M., & Houston, M. (1997). Concordance between ACT Assessment and recentered SAT I sum scores. College and University, 73, 24–34.
Dorans, N. J., Pommerich, M., & Holland, P. W. (Eds.). (2007). Linking and aligning scores and scales. New York, NY: Springer.
Feuer, M. J., Holland, P. W., Green, B. F., Bertenthal, M. W., & Hemphill, F. C. (Eds.). (1999). Uncommon measures: Equivalence and linkage among educational tests (Report of the Committee on Equivalency and Linkage of Educational Tests, National Research Council). Washington, DC: National Academy Press.
Gulliksen, H. (1950). Theory of mental tests. New York, NY: Wiley.
Hackett, R. K., Holland, P. W., Pearlman, M., & Thayer, D. T. (1987). Test construction manipulating score differences between Black and White examinees: Properties of the resulting tests (ETS Research Rep. No. RR-87-30). Princeton, NJ: ETS.
Holland, P. W. (1994). Measurements or contests? Comments on Zwick, Bond and Allen/Donoghue. Proceedings of the Social Statistics Section of the American Statistical Association, 1994, 27–29.
Holland, P. W. (2008, March). The first four generations of test theory. Paper presented at the Association of Test Publishers on Innovations in Testing, Dallas, TX.
Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Westport, CT: American Council on Education/Praeger.
Holland, P. W., & Hoskens, M. (2003). Classical test theory as a first-order item response theory: Application to true-score prediction from a possibly nonparallel test. Psychometrika, 68, 123–149.
Holland, P. W., & Rubin, D. B. (Eds.). (1982). Test equating. New York, NY: Academic Press.
Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kelley, T. L. (1927). Interpretation of educational measurements. New York, NY: World Book.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Meredith, W., & Millsap, R. E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57(2), 289–311.
Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression: A second course in statistics. Reading, MA: Addison-Wesley.
Schmitt, A. P., Holland, P. W., & Dorans, N. J. (1993). Evaluating hypotheses about differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 281–315). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shealy, R. T., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 197–239.
Tucker, L. R. (1971). Relations of factor score estimates to their use. Psychometrika, 36(4), 427–436.
von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York, NY: Springer.
Wainer, H. (2007). The world’s most dangerous equation. American Scientist, 95, 249–256.
Acknowledgements
The author thanks Paul Holland for being the mentor, colleague, and friend who had the most impact on his career. Tim Moses provided valuable advice. Any opinions expressed here are those of the author and not necessarily of Educational Testing Service.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this paper
Cite this paper
Dorans, N.J. (2011). Holland’s Advice for the Fourth Generation of Test Theory: Blood Tests Can Be Contests. In: Dorans, N., Sinharay, S. (eds) Looking Back. Lecture Notes in Statistics, vol 202. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9389-2_14
DOI: https://doi.org/10.1007/978-1-4419-9389-2_14
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-9388-5
Online ISBN: 978-1-4419-9389-2
eBook Packages: Mathematics and Statistics (R0)