Test Dimensionality

  • Gideon J. Mellenbergh


The dimensionality of a test is the number of latent variables measured by the test. An essentially unidimensional test measures predominantly one latent variable, and a multidimensional test measures more than one latent variable. Three types of multidimensionality are described. The first is a simple structure: the test measures two or more latent variables and falls apart into two or more essentially unidimensional subtests. The second is a complex structure: each of the items measures the same two or more latent variables. The third is a bi-factor structure: each of the items measures the same general latent variable, while subgroups of items also measure specific latent variables. The interpretation of study results is straightforward if the dependent variable is a simple-structure test, but hard if it is a complex-structure test. Factor Analysis (FA) and Principal Component Analysis (PCA) of inter-item product moment correlations (pmcs), as well as the reliability of the test, are often used to assess the dimensionality of a test. However, FA and PCA of inter-item pmcs fail, especially if the number of answer categories of the items is small, and high reliability does not guarantee that the test is essentially unidimensional. Appropriate methods to assess test dimensionality are FA of inter-item tetrachoric correlations (dichotomous items) and polychoric correlations (more than two ordered answer categories), Mokken scale analysis, and full-information FA. The factor analytic methods make stronger assumptions than Mokken’s method, but Mokken’s method is not capable of assessing the type of multidimensionality. Measurement invariance of an item with respect to a variable (e.g., E- and C-condition membership) means that the same item response model applies to all values of that variable (e.g., the same model in the E- and C-conditions). Test scores should be measurement invariant if study results are to be interpreted. For example, they should be measurement invariant with respect to condition membership if the test score means of the E- and C-conditions are compared.
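Two of the claims above can be illustrated with a small numerical sketch. It is not taken from the chapter, and it rests on assumptions chosen purely for illustration: coefficient alpha is used as the reliability estimate, the test consists of two uncorrelated sets of 10 items with a within-set inter-item correlation of 0.6, and dichotomous items are obtained by splitting bivariate-normal latent responses at their medians. All function names are hypothetical.

```python
import math


def two_factor_correlations(items_per_factor, within_r):
    """Population inter-item correlation matrix of a simple-structure test
    with two uncorrelated factors (illustrative values, not from the chapter)."""
    k = 2 * items_per_factor
    matrix = []
    for i in range(k):
        row = []
        for j in range(k):
            if i == j:
                row.append(1.0)
            elif (i < items_per_factor) == (j < items_per_factor):
                row.append(within_r)  # items that load on the same factor
            else:
                row.append(0.0)       # items that load on different factors
        matrix.append(row)
    return matrix


def cronbach_alpha(cov):
    """Coefficient alpha computed from an item covariance or correlation matrix."""
    k = len(cov)
    trace = sum(cov[i][i] for i in range(k))
    total = sum(sum(row) for row in cov)
    return k / (k - 1) * (1.0 - trace / total)


def phi_after_median_split(latent_r):
    """Product moment (phi) correlation between two dichotomous items created by
    cutting bivariate-normal latent responses at their medians."""
    return 2.0 / math.pi * math.asin(latent_r)


corr = two_factor_correlations(items_per_factor=10, within_r=0.6)
alpha = cronbach_alpha(corr)
phi = phi_after_median_split(0.6)

print(f"alpha of the two-dimensional test: {alpha:.3f}")        # 0.888
print(f"latent r = 0.60, inter-item pmc (phi): {phi:.3f}")      # 0.410
```

Under these assumptions alpha is about 0.89 although the test measures two unrelated latent variables, and the pmc between two dichotomous items is about 0.41 although the latent correlation is 0.60. The tetrachoric correlation is intended to recover the latent value, which is why it is preferred as input for FA of dichotomous items.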


Full-information factor analysis; Homogeneity (H-) coefficient; Measurement invariance; Mokken scale analysis; Polychoric correlation; Reliability and test dimensionality; Simple-, bi-factor, and complex structure; Tetrachoric correlation



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Emeritus Professor Psychological Methods, Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
