Volume 83, Issue 3, pp 538–562

Robust Measurement via A Fused Latent and Graphical Item Response Theory Model

  • Yunxiao Chen
  • Xiaoou Li
  • Jingchen Liu
  • Zhiliang Ying


Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike classical test theory, IRT models aggregate item-level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption that is unlikely to hold in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper show that wrongly assuming local independence may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach that adds a graphical component to a multidimensional IRT model to offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for parameter estimation and for learning the structure of the local dependence. This approach can substantially improve the measurement even when no prior information on the local dependence structure is available. The model can be applied to measure either a unidimensional latent trait or multidimensional latent traits.
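As a rough illustration of the two components described above, the following sketch shows how a latent part and a graphical part might combine in one item's conditional response probability, and what a single proximal (soft-thresholding) step for a sparsity penalty on the pairwise terms could look like. Everything here is an assumption made for illustration: the function names, the loading matrix `A`, the symmetric matrix `S` (diagonal entries as item intercepts, off-diagonal entries as pairwise dependence), and the logistic conditional form are not stated in this abstract and may differ from the paper's parameterization.

```python
import numpy as np

def conditional_prob(j, x, theta, A, S):
    """P(X_j = 1 | theta, x_{-j}) under an ASSUMED logistic conditional form."""
    # Latent part: loading vector of item j times the latent trait vector.
    latent = A[j] @ theta
    # Graphical part: pairwise terms with the other items (item j itself excluded).
    pairwise = S[j] @ x - S[j, j] * x[j]
    # S[j, j] plays the role of the item intercept in this hypothetical layout.
    eta = latent + S[j, j] + pairwise
    return 1.0 / (1.0 + np.exp(-eta))

def soft_threshold_offdiag(S, step, lam):
    """Proximal operator of step * lam * sum_{i != j} |S_ij|; diagonal is left unpenalized."""
    out = np.sign(S) * np.maximum(np.abs(S) - step * lam, 0.0)
    np.fill_diagonal(out, np.diag(S))
    return out

# Two items, one latent trait, a positive dependence between the items.
A = np.array([[1.0], [1.0]])
S = np.array([[0.0, 2.0], [2.0, 0.0]])
x = np.array([1.0, 1.0])
theta = np.array([0.0])
p = conditional_prob(0, x, theta, A, S)

# Small off-diagonal entries are shrunk to exactly zero, which is how an
# L1-type penalty can select a sparse local dependence structure.
S_new = soft_threshold_offdiag(np.array([[1.0, 0.3], [0.3, -2.0]]), step=1.0, lam=0.5)
```

In this toy setup, setting every off-diagonal entry of `S` to zero removes the pairwise terms, so each item's response probability depends only on the latent trait and the item intercept, which corresponds to local independence.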


Keywords: item response theory; local dependence; robust measurement; differential item functioning; graphical model; Ising model; pseudo-likelihood; regularized estimator; Eysenck personality questionnaire-revised



This research was funded by NSF grant DMS-1712657, NSF grant SES-1323977, NSF grant IIS-1633360, Army Research Office grant W911NF-15-1-0159, and NIH grant R01GM047845.




Copyright information

© The Psychometric Society 2018

Authors and Affiliations

  1. Emory University, Atlanta, USA
  2. University of Minnesota, Minneapolis, USA
  3. Columbia University, New York, USA
