Explanatory Item Response Theory Models: Impact on Validity and Test Development?

  • Susan Embretson
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 265)


Many explanatory item response theory (IRT) models have been developed since Fischer's (Acta Psychologica 37:359–374, 1973) linear logistic test model was published. However, despite their applicability to typical test data, their actual impact on test development and validation has been limited. The purpose of this chapter is to explicate the importance of explanatory IRT models in the context of a framework that interrelates the five aspects of validity (Embretson in Educ Meas Issues Pract 35:6–22, 2016). In this framework, the response processes aspect of validity impacts the other aspects. Studies on a fluid intelligence test are presented to illustrate the relevance of explanatory IRT models to validity, as well as to test development.


Keywords: Item response theory · Explanatory models · Validity


  1. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  2. Carpenter, P. A., Just, M. A., & Shell, P. (1990). What one intelligence test measures: A theoretical account of processing in the Raven's Progressive Matrices Test. Psychological Review, 97, 404–431.
  3. De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
  4. Embretson, S. E. (1984). A general multicomponent latent trait model for response processes. Psychometrika, 49, 175–186.
  5. Embretson, S. E. (1997). Multicomponent latent trait models. In W. van der Linden & R. Hambleton (Eds.), Handbook of modern item response theory (pp. 305–322). New York: Springer.
  6. Embretson, S. E. (1999). Generating items during testing: Psychometric issues and models. Psychometrika, 64, 407–433.
  7. Embretson, S. E. (2016). Understanding examinees' responses to items: Implications for measurement. Educational Measurement: Issues and Practice, 35, 6–22.
  8. Embretson, S. E. (2017). An integrative framework for construct validity. In A. Rupp & J. Leighton (Eds.), The handbook of cognition and assessment (pp. 102–123). New York: Wiley-Blackwell.
  9. Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374.
  10. Fischer, G. H., & Ponocny, I. (1995). Extended rating scale and partial credit models for assessing change. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 353–370). New York: Springer.
  11. Glas, C. A. W., van der Linden, W. J., & Geerlings, H. (2010). Estimation of the parameters in an item cloning model for adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 289–314). New York: Springer.
  12. Janssen, R. (2016). Linear logistic models. In W. van der Linden (Ed.), Handbook of item response theory: Models, statistics and applications. New York: Taylor & Francis.
  13. Janssen, R., Schepers, J., & Peres, D. (2004). Models with item and item group predictors. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models: A generalized linear and nonlinear approach (pp. 189–212). New York: Springer.
  14. Janssen, R., Tuerlinckx, F., Meulders, M., & De Boeck, P. (2000). A hierarchical IRT model for criterion-referenced measurement. Journal of Educational and Behavioral Statistics, 25, 285–306.
  15. Linacre, J. M. (1989). Multi-facet Rasch measurement. Chicago: MESA Press.
  16. Molenaar, D., & De Boeck, P. (2018). Response mixture modeling: Accounting for heterogeneity in item characteristics across response times. Psychometrika, 83, 279–297.
  17. Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47, 667–696.
  18. Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14, 271–282.
  19. van der Linden, W. (Ed.). (2016). Handbook of item response theory: Models, statistics and applications. New York: Taylor & Francis.
  20. von Davier, M., & Rost, J. (1995). Polytomous mixed Rasch models. In G. Fischer & I. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 371–379). New York: Springer.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Georgia Institute of Technology, Atlanta, USA
