Robust Measurement via A Fused Latent and Graphical Item Response Theory Model

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang

doi:10.1007/s11336-018-9610-4

Robust Measurement via A Fused Latent and Graphical Item Response Theory Model

Published: 12 March 2018

Volume 83, pages 538–562, (2018)
Cite this article

Psychometrika Aims and scope Submit manuscript

Yunxiao Chen¹,
Xiaoou Li²,
Jingchen Liu ORCID: orcid.org/0000-0002-4937-2601³ &
…
Zhiliang Ying³

1283 Accesses
13 Citations
1 Altmetric
Explore all metrics

Abstract

Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike the classical testing theory, IRT models aggregate the item level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education

Article Open access 07 June 2017

Reporting reliability, convergent and discriminant validity with structural equation modeling: A review and best-practice recommendations

Article Open access 30 January 2023

RMSEA, CFI, and TLI in structural equation modeling with ordered categorical data: The story they tell depends on the estimation methods

Article 04 June 2018

Notes

An R package and example code for the proposed approach can be downloaded from http://www.scientifichpc.com/flagirt.html.

References

Anderson, C. J., & Vermunt, J. K. (2000). Log-multiplicative association models as latent variable models for nominal and/or ordinal data. Sociological Methodology, 30, 81–121.
Article Google Scholar
Anderson, C. J., & Yu, H.-T. (2007). Log-multiplicative association models as item response models. Psychometrika, 72, 5–23.
Article Google Scholar
Barber, R. F., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9, 567–607.
Article Google Scholar
Belloni, A., & Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli, 19, 521–547.
Article Google Scholar
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society Series B (Methodological), 36, 192–236.
Google Scholar
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.
Google Scholar
Boschloo, L., van Borkulo, C. D., Rhemtulla, M., Keyes, K. M., Borsboom, D., & Schoevers, R. A. (2015). The network structure of symptoms of the diagnostic and statistical manual of mental disorders. PLoS One, 10, e0137621.
Article PubMed PubMed Central Google Scholar
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
Article Google Scholar
Braeken, J. (2011). A boundary mixture approach to violations of conditional independence. Psychometrika, 76, 57–76.
Article Google Scholar
Braeken, J., Tuerlinckx, F., & De Boeck, P. (2007). Copula functions for residual dependency. Psychometrika, 72, 393–411.
Article Google Scholar
Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16, 221–248.
Article PubMed PubMed Central Google Scholar
Chen, Y. (2016). Latent variable modeling and statistical learning. Ph.D. thesis, Columbia University. Available at http://academiccommons.columbia.edu/catalog/ac:198122.
Chen, Y., Li, X., Liu, J., & Ying, Z. (2016) A fused latent and graphical model for multivariate binary data. Available at arXiv:1606.08925v1.pdf. ArXiv preprint.
Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.
Article Google Scholar
Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015a). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110, 850–866.
Article PubMed Google Scholar
Chen, Y., Liu, J., & Ying, Z. (2015b). Online item calibration for Q-matrix in CD-CAT. Applied Psychological Measurement, 39, 5–15.
Article PubMed Google Scholar
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22, 265–289.
Article Google Scholar
Cramer, A. O., Sluis, S., Noordhof, A., Wichers, M., Geschwind, N., Aggen, S. H., et al. (2012). Dimensions of normal personality as networks in search of equilibrium: You can’t like parties if you don’t like people. European Journal of Personality, 26, 414–431.
Article Google Scholar
Cramer, A. O., Waldorp, L. J., van der Maas, H. L., & Borsboom, D. (2010). Complex realities require complex theories: Refining and extending the network approach to mental disorders. Behavioral and Brain Sciences, 33, 178–193.
Article Google Scholar
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates Publishers.
Google Scholar
Epskamp, S., Maris, G. K., Waldorp, L. J., & Borsboom, D. (2016). Network psychometrics. arXiv preprint arXiv:1609.02818.
Epskamp, S., Rhemtulla, M., & Borsboom, D. (2017). Generalized network pschometrics: Combining network and latent variable models. Psychometrika, 82, 904–927.
Article PubMed Google Scholar
Eysenck, S., & Barrett, P. (2013). Re-introduction to cross-cultural studies of the EPQ. Personality and Individual Differences, 54, 485–489.
Article Google Scholar
Eysenck, S. B., Eysenck, H. J., & Barrett, P. (1985). A revised version of the psychoticism scale. Personality and Individual Differences, 6, 21–29.
Article Google Scholar
Ferrara, S., Huynh, H., & Michaels, H. (1999). Contextual explanations of local dependence in item clusters in a large scale hands-on science performance assessment. Journal of Educational Measurement, 36, 119–140.
Article Google Scholar
Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. In Advances in Neural Information Processing Systems (pp 604–612).
Fried, E. I., Bockting, C., Arjadi, R., Borsboom, D., Amshoff, M., Cramer, A. O., et al. (2015). From loss to loneliness: The relationship between bereavement and depressive symptoms. Journal of Abnormal Psychology, 124, 256–265.
Article PubMed Google Scholar
Gibbons, R. D., Bock, R. D., Hedeker, D., Weiss, D. J., Segawa, E., Bhaumik, D. K., et al. (2007). Full-information item bifactor analysis of graded response data. Applied Psychological Measurement, 31, 4–19.
Article Google Scholar
Gibbons, R. D., & Hedeker, D. R. (1992). Full-information item bi-factor analysis. Psychometrika, 57, 423–436.
Article Google Scholar
Holland, P. W. (1990). The Dutch identity: A new tool for the study of item response models. Psychometrika, 55, 5–18.
Article Google Scholar
Holland, P. W., & Wainer, H. (2012). Differential item functioning. New York, NY: Routledge.
Book Google Scholar
Hoskens, M., & De Boeck, P. (1997). A parametric model for local dependence among test items. Psychological Methods, 2, 261–277.
Article Google Scholar
Ip, E. H. (2002). Locally dependent latent trait model and the Dutch identity revisited. Psychometrika, 67, 367–386.
Article Google Scholar
Ip, E. H. (2010). Empirically indistinguishable multidimensional IRT and locally dependent unidimensional item response models. British Journal of Mathematical and Statistical Psychology, 63, 395–416.
Article PubMed Google Scholar
Ip, E. H., Wang, Y. J., De Boeck, P., & Meulders, M. (2004). Locally dependent latent trait model for polytomous responses with application to inventory of hostility. Psychometrika, 69, 191–216.
Article Google Scholar
Ising, E. (1925). Beitrag zur theorie des ferromagnetismus. Zeitschrift für Physik A Hadrons and Nuclei, 31, 253–258.
Google Scholar
Knowles, E. S., & Condon, C. A. (2000). Does the rose still smell as sweet? Item variability across test forms and revisions. Psychological Assessment, 12, 245–252.
Article PubMed Google Scholar
Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. Cambridge, MA: MIT press.
Google Scholar
Kruis, J., & Maris, G. (2016). Three representations of the Ising model. Scientific Reports, 6(34175), 1–11.
Google Scholar
Laird, N. M. (1991). Topics in likelihood-based methods for longitudinal data analysis. Statistica Sinica, 1, 33–50.
Google Scholar
Lee, J. D., & Hastie, T. J. (2015). Learning the structure of mixed graphical models. Journal of Computational and Graphical Statistics, 24, 230–253.
Article PubMed Google Scholar
Li, Y., Bolt, D. M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30, 3–21.
Article Google Scholar
Liu, J. (2017). On the consistency of Q-matrix estimation: A commentary. Psychometrika, 82, 523–527.
Article PubMed Google Scholar
Liu, J., Xu, G., & Ying, Z. (2012). Data-driven learning of Q-matrix. Applied Psychological Measurement, 36, 548–564.
Article PubMed PubMed Central Google Scholar
Liu, J., Xu, G., & Ying, Z. (2013). Theory of the self-learning Q-matrix. Bernoulli, 19, 1790–1817.
Article PubMed PubMed Central Google Scholar
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Google Scholar
Marsman, M., Maris, G., Bechger, T., & Glas, C. (2015). Bayesian inference for low-rank Ising networks. Scientific Reports, 5(9050), 1–7.
Google Scholar
McKinley, R. L., & Reckase, M. D. (1982). The use of the general Rasch model with multidimensional item response data. Iowa City, IA: American College Testing.
Google Scholar
Pan, J., Ip, E. H., & Dubé, L. (2017). An alternative to post hoc model modification in confirmatory factor analysis: The bayesian lasso. Psychological Methods, 22, 687–704.
Article PubMed Google Scholar
Parikh, N., & Boyd, S. P. (2014). Proximal algorithms. Foundations and Trends in Optimization, 1, 127–239.
Article Google Scholar
Rasch, G. (1960). Probabilistic models for some intelligence and achievement tests. Copenhagen: Danish Institute for Educational Research.
Google Scholar
Ravikumar, P., Wainwright, M. J., & Lafferty, J. D. (2010). High-dimensional ising model selection using 1-regularized logistic regression. The Annals of Statistics, 38, 1287–1319.
Article Google Scholar
Reckase, M. (2009). Multidimensional item response theory. New York, NY: Springer.
Book Google Scholar
Reise, S. P., Horan, W. P., & Blanchard, J. J. (2011). The challenges of fitting an item response theory model to the social anhedonia scale. Journal of Personality Assessment, 93, 213–224.
Article PubMed PubMed Central Google Scholar
Reise, S. P., Morizot, J., & Hays, R. D. (2007). The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Quality of Life Research, 16, 19–31.
Article PubMed Google Scholar
Rhemtulla, M., Fried, E. I., Aggen, S. H., Tuerlinckx, F., Kendler, K. S., & Borsboom, D. (2016). Network analysis of substance abuse and dependence symptoms. Drug and Alcohol Dependence, 161, 230–237.
Article PubMed PubMed Central Google Scholar
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Article Google Scholar
Schwarz, N. (1999). Self-reports: How the questions shape the answers. American Psychologist, 54, 93–105.
Article Google Scholar
Sun, J., Chen, Y., Liu, J., Ying, Z., & Xin, T. (2016). Latent variable selection for multidimensional item response theory models via \(L_1\) regularization. Psychometrika, 81, 921–939.
Article PubMed Google Scholar
van Borkulo, C. D., Borsboom, D., Epskamp, S., Blanken, T. F., Boschloo, L., Schoevers, R. A., et al. (2014). A new method for constructing networks from binary data. Scientific Reports, 4(5918), 1–10.
Google Scholar
van der Maas, H. L., Dolan, C. V., Grasman, R. P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, M. E. (2006). A dynamical model of general intelligence: The positive manifold of intelligence by mutualism. Psychological Review, 113, 842–861.
Article PubMed Google Scholar
Wainer, H., Bradlow, E. T., & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & G. A. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–269). New York, NY: Springer.
Chapter Google Scholar
Wang, W.-C., & Wilson, M. (2005). The Rasch testlet model. Applied Psychological Measurement, 9, 126–149.
Article Google Scholar
Yao, L., & Schwarz, R. D. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30, 469–492.
Article Google Scholar
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145.
Article Google Scholar
Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187–213.
Article Google Scholar

Download references

Acknowledgements

This research was funded by NSF grant DMS-1712657, NSF grant SES-1323977, NSF grant IIS-1633360, Army Research Office grant W911NF-15-1-0159, and NIH grant R01GM047845.

Author information

Authors and Affiliations

Emory University, Atlanta, USA
Yunxiao Chen
University of Minnesota, Minneapolis, USA
Xiaoou Li
Columbia University, New York, USA
Jingchen Liu & Zhiliang Ying

Authors

Yunxiao Chen
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoou Li
View author publications
You can also search for this author in PubMed Google Scholar
Jingchen Liu
View author publications
You can also search for this author in PubMed Google Scholar
Zhiliang Ying
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jingchen Liu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 183 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chen, Y., Li, X., Liu, J. et al. Robust Measurement via A Fused Latent and Graphical Item Response Theory Model. Psychometrika 83, 538–562 (2018). https://doi.org/10.1007/s11336-018-9610-4

Download citation

Received: 12 December 2016
Revised: 26 February 2018
Published: 12 March 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11336-018-9610-4

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Robust Measurement via A Fused Latent and Graphical Item Response Theory Model

Abstract

Access this article

Similar content being viewed by others

The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education

Reporting reliability, convergent and discriminant validity with structural equation modeling: A review and best-practice recommendations

RMSEA, CFI, and TLI in structural equation modeling with ordered categorical data: The story they tell depends on the estimation methods

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (pdf 183 KB)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Robust Measurement via A Fused Latent and Graphical Item Response Theory Model

Abstract

Access this article

Similar content being viewed by others

The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education

Reporting reliability, convergent and discriminant validity with structural equation modeling: A review and best-practice recommendations

RMSEA, CFI, and TLI in structural equation modeling with ordered categorical data: The story they tell depends on the estimation methods

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (pdf 183 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation