Multidimensional IRT Models to Analyze Learning Outcomes of Italian Students at the End of Lower Secondary School

  • Mariagiulia Matteucci
  • Stefania Mignani
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 89)


In this paper, different multidimensional IRT models are compared in order to choose the best approach to explain response data from the assessment of Italian students at the end of lower secondary school. The results show that the additive model with three specific dimensions (reading comprehension, grammar, and mathematics abilities) and an overall ability recovers the test structure meaningfully. In this model, the overall ability compensates for the specific ability (or vice versa) in determining the probability of a correct response. Given the item characteristics, the overall ability is interpreted as a reasoning and thinking capability. Model estimation is conducted via the Gibbs sampler within a Bayesian approach, which allows the use of Bayesian techniques such as posterior predictive model checking for model comparison and fit assessment.
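The compensation between overall and specific ability can be illustrated with a minimal sketch. The abstract does not give the response function, so the code below assumes a two-parameter normal ogive form in which the two abilities enter additively; the function and parameter names (`additive_prob`, `a_overall`, `a_specific`, `b`) are hypothetical and are not taken from the paper.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def additive_prob(theta_overall: float, theta_specific: float,
                  a_overall: float, a_specific: float, b: float) -> float:
    """Probability of a correct response under an ASSUMED additive
    normal ogive model: Phi(a_overall * theta_overall
                            + a_specific * theta_specific - b).

    theta_overall  : overall ability of the student
    theta_specific : ability on the item's specific dimension
    a_overall      : item discrimination on the overall ability
    a_specific     : item discrimination on the specific ability
    b              : item location (difficulty-like) parameter
    """
    return normal_cdf(a_overall * theta_overall
                      + a_specific * theta_specific - b)


# Compensation: a high overall ability offsets a low specific ability.
# With equal discriminations and b = 0, the linear predictor is 0.
p_mixed = additive_prob(1.0, -1.0, 1.0, 1.0, 0.0)
p_average = additive_prob(0.0, 0.0, 1.0, 1.0, 0.0)
print(p_mixed == p_average)  # both equal 0.5
```

Under this assumed form, only the weighted sum of the two abilities matters, which is what makes the model compensatory: a student with low reading comprehension but high overall ability can have the same success probability as an average student.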


Item response theory; Multidimensional models; Gibbs sampling; Student assessment



This research has been partially funded by the Italian Ministry of Education with the FIRB (“Futuro in ricerca”) 2012 project on “Mixture and latent variable models for causal-inference and analysis of socio-economic data.”



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Statistical Sciences, University of Bologna, Bologna, Italy
