Abstract
The dimensionality of a test is the number of latent variables that are measured by the test. An essentially unidimensional test measures predominantly one latent variable, whereas a multidimensional test measures more than one latent variable. Three types of multidimensionality are described. First, a simple structure: the test measures two or more latent variables and splits into two or more essentially unidimensional subtests. Second, a complex structure: each of the items measures the same two or more latent variables. Third, a bi-factor structure: each of the items measures the same general latent variable, while subgroups of items also measure specific latent variables. The interpretation of study results is straightforward if the dependent variable is a simple-structure test, but hard if it is a complex-structure test. Factor Analysis (FA) and Principal Component Analysis (PCA) of inter-item product moment correlations (pmcs), as well as the reliability, are often used to assess the dimensionality of a test. FA and PCA of inter-item pmcs fail, especially if the number of answer categories of the items is small, and high reliability does not guarantee that a test is essentially unidimensional. Appropriate methods to assess test dimensionality are FA of inter-item tetrachoric (dichotomous items) or polychoric (more than two ordered answer categories) correlations, Mokken scale analysis, and full-information FA. The factor analytic methods make stronger assumptions than Mokken's method, but Mokken's method cannot assess the type of multidimensionality. Measurement invariance of an item with respect to a variable (e.g., E- and C-condition membership) means that the same item response model applies to all values of that variable (e.g., the same model in the E- and C-conditions).
Test scores should be measurement invariant for study results to be interpretable; for example, they should be measurement invariant with respect to condition membership to compare the E- and C-condition test score means.
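The claim that high reliability does not guarantee essential unidimensionality can be illustrated with a small simulation. The sketch below (hypothetical data, not from the chapter) generates a simple-structure "test" of ten items, two five-item subtests loading on two uncorrelated latent variables, and computes coefficient alpha from its standard formula; the loadings and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # number of simulated test takers

# Two uncorrelated latent variables: a simple-structure, two-dimensional test.
theta1 = rng.normal(size=n)
theta2 = rng.normal(size=n)

# Five items load on each latent variable (loading 0.8) plus unique noise.
items1 = 0.8 * theta1[:, None] + 0.6 * rng.normal(size=(n, 5))
items2 = 0.8 * theta2[:, None] + 0.6 * rng.normal(size=(n, 5))
X = np.hstack([items1, items2])  # n persons x 10 items

def cronbach_alpha(scores):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

print(cronbach_alpha(X))  # around 0.80, despite the test measuring two latent variables
```

Alpha comes out around .80 here because the within-subtest covariances are substantial, even though the two subtest scores are nearly uncorrelated; this is why reliability alone cannot establish unidimensionality.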
References
Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280.
Bock, R. D., Gibbons, R., Schilling, S. G., Muraki, E., Wilson, D., & Wood, R. (2000). TESTFACT 3.0: Test scoring, item statistics, and full-information item factor analysis. Chicago, IL: Scientific Software.
Boomsma, A. (1993). On the robustness of LISREL (maximum likelihood estimation) against small sample size and non-normality. Unpublished doctoral dissertation, University of Groningen, The Netherlands.
Camilli, G. (2006). Test fairness. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 221–255). Westport, CT: Praeger Publishers.
Camilli, G., Prowker, A., Dossey, J. A., Lindquist, M. M., Chiu, T.-W., Vargas, S., et al. (2008). Summarizing item difficulty variation with parcel scores. Journal of Educational Measurement, 45, 363–389.
Carroll, J. B. (1945). The effect of difficulty and chance success on correlations between items or between tests. Psychometrika, 10, 1–19.
Dolan, C. V. (1994). Factor analysis of variables with 2, 3, 5, and 7 response categories: A comparison of categorical variable estimators using simulated data. British Journal of Mathematical and Statistical Psychology, 49, 309–326.
Ettema, T. P. (2007). The construction of a dementia-specific Quality of Life instrument rated by professional caregivers in residential settings: The QUALIDEM. Unpublished doctoral dissertation, Free University at Amsterdam, The Netherlands.
Ettema, T. P., Dröes, R. M., de Lange, J., Mellenbergh, G. J., & Ribbe, M. W. (2007). QUALIDEM: Development and evaluation of a dementia specific Quality of Life instrument. Scalability, reliability, and internal structure. International Journal of Geriatric Psychiatry, 22, 549–556.
Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37, 827–838.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling. Sociological Methods & Research, 26, 329–367.
Houtkoop, B. L., & Plak, S. (2015). Nonparametric item response theory for dichotomous item scores. In H. J. Adèr & G. J. Mellenbergh (Eds.), Advising on research methods: Selected topics 2015 (pp. 51–67). Huizen, The Netherlands: van Kessel.
Jöreskog, K. G., & Sörbom, D. (2001). PRELIS: A program for multivariate data screening and data summarization. Chicago, IL: Scientific Software.
McLeod, L. D., Swygert, K. A., & Thissen, D. (2001). Factor analysis for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 189–216). Mahwah, NJ: Erlbaum.
Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.
Mellenbergh, G. J. (2011). A conceptual introduction to psychometrics: Development, analysis, and application of psychological and educational tests. The Hague, The Netherlands: Eleven International Publishing.
Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York, NY: Routledge.
Mokken, R. J. (1971). A theory and procedure of scale analysis with applications in political research. Berlin, Germany: De Gruyter.
Mokken, R. J. (1997). Nonparametric models for dichotomous responses. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 351–367). New York, NY: Springer.
Molenaar, I. W. (1991). A weighted Loevinger H-coefficient extending Mokken scaling to multicategory items. Kwantitatieve Methoden, 12, 97–117.
Molenaar, I. W. (1997). Nonparametric models for polytomous responses. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 369–380). New York, NY: Springer.
Molenaar, I. W., & Sijtsma, K. (2000). User’s manual MSP5 for Windows. Groningen, The Netherlands: iec ProGAMMA.
Muraki, E. (1993). POLYFACT [Computer program]. Princeton, NJ: Educational Testing Service.
Muraki, E., & Carlson, J. E. (1995). Full-information factor analysis for polytomous item responses. Applied Psychological Measurement, 19, 73–90.
Muthén, L. K., & Muthén, B. O. (1998–2015). Mplus user’s guide (Version 7.11). Los Angeles, CA: Author.
Oort, F. J. (1993). Theory of violators: Assessing unidimensionality of psychological measures. In R. Steyer, K. F. Wender, & K. F. Widaman (Eds.), Psychometric methodology, Proceedings of the 7th European Meeting of the Psychometric Society in Trier (pp. 377–381). Stuttgart, Germany: Fischer.
Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.
Sijtsma, K. (2009). On the use, misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107–120.
Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks, CA: Sage.
Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.
Swygert, K. A., McLeod, L. D., & Thissen, D. (2001). Factor analysis for items or testlets scored in more than two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 217–250). Mahwah, NJ: Erlbaum.
Thissen, D. (2001). IRTLRDIF [Computer program]. University of North Carolina at Chapel Hill: L. L. Thurstone Psychometric Laboratory.
Thissen, D., Steinberg, L., & Gerrard, M. (1986). Beyond group mean differences: The concept of item bias. Psychological Bulletin, 99, 118–128.
Wicherts, J. M., Dolan, C. V., & Hessen, D. J. (2005). Stereotype threat and group differences in test performance: A question of measurement invariance. Journal of Personality and Social Psychology, 89, 696–716.
Wicherts, J. M., Dolan, C. V., Hessen, D. J., Oosterveld, P., van Baal, G. C. M., Boomsma, D. L., et al. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32, 509–537.
© 2019 Springer Nature Switzerland AG
Cite this chapter
Mellenbergh, G.J. (2019). Test Dimensionality. In: Counteracting Methodological Errors in Behavioral Research. Springer, Cham. https://doi.org/10.1007/978-3-030-12272-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74352-3
Online ISBN: 978-3-030-12272-0
eBook Packages: Behavioral Science and Psychology (R0)