Psychometrika, Volume 83, Issue 3, pp 515–537

Hypothesis Testing of the Q-matrix

  • Yuqi Gu
  • Jingchen Liu
  • Gongjun Xu (corresponding author)
  • Zhiliang Ying


The recent surge of interest in cognitive assessment has led to the development of cognitive diagnosis models. Central to many such models is the specification of a Q-matrix, which relates items to latent attributes that have natural interpretations. In practice, the Q-matrix is usually constructed subjectively by the test designers. This can lead to misspecification, which in turn can cause lack of fit of the underlying statistical model. Traditional goodness-of-fit tests, such as the chi-square test and the likelihood ratio test, may not be applied straightforwardly to detect Q-matrix misspecification because the number of possible response patterns is large. To address this problem, this paper proposes a new statistical method for testing the goodness of fit of the Q-matrix, constructing test statistics that measure the consistency between a provisional Q-matrix and the observed data for a general family of cognitive diagnosis models. The limiting distributions of the test statistics under the null hypothesis are derived and can be used to obtain p-values. Simulation studies and a real data example demonstrate the usefulness of the proposed method.
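The abstract does not state the proposed test statistic, so the sketch below is not the authors' method. It only illustrates the setup the abstract builds on, assuming the DINA model (one common member of the cognitive diagnosis family) and a made-up 4-item, 2-attribute Q-matrix. The function names `ideal_response`, `dina_correct_prob`, and `mean_expected_count`, as well as the slip, guess, N, and J values, are hypothetical choices for illustration. The code shows how a Q-matrix maps an attribute profile to predicted item responses, and why full contingency-table tests become unreliable: with J binary items there are 2^J possible response patterns.

```python
import itertools

import numpy as np


def ideal_response(Q: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """DINA ideal response: eta_j = 1 iff alpha has every attribute item j requires.

    Q is a J x K binary matrix; alpha is a length-K binary attribute profile.
    """
    return np.all(alpha >= Q, axis=1).astype(int)


def dina_correct_prob(Q, alpha, slip, guess):
    """P(R_j = 1 | alpha) = (1 - s_j)^eta_j * g_j^(1 - eta_j) under DINA."""
    eta = ideal_response(Q, alpha)
    # Items whose required attributes are all mastered succeed unless slipped;
    # all other items succeed only by guessing.
    return np.where(eta == 1, 1.0 - slip, guess)


def mean_expected_count(n: int, j: int) -> float:
    """Average expected count per cell of the full 2**j response-pattern table."""
    return n / 2**j


# Hypothetical provisional Q-matrix: 4 items (rows) by 2 attributes (columns).
Q = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 0]])
slip = np.full(4, 0.1)   # illustrative slipping parameters s_j
guess = np.full(4, 0.2)  # illustrative guessing parameters g_j

# Enumerate all 2**K attribute profiles and their predicted responses.
for profile in itertools.product([0, 1], repeat=Q.shape[1]):
    alpha = np.array(profile)
    print(alpha, ideal_response(Q, alpha), dina_correct_prob(Q, alpha, slip, guess))

# Sparsity of the full table for an illustrative sample size and test length.
print(2**20, mean_expected_count(2000, 20))
```

Flipping a single entry of `Q` changes which profiles are predicted to answer an item correctly, which is the kind of misspecification a Q-matrix test is meant to detect. With N = 2000 and J = 20, the average expected count per response pattern is far below 1, which is why Pearson-type statistics on the full table are hard to calibrate.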


Keywords: Q-matrix; diagnostic classification models; hypothesis testing



The authors thank the Editor, the Associate Editor, and four reviewers for many helpful and constructive comments. This work is partially supported by National Science Foundation (Grant No. SES-1659328, DMS-1712717, IIS-1633360, MMS-1826540), Institute of Education Sciences (Grant No. R305D160010), and Army Grant (Grant No. W911NF-15-1-0159).

Supplementary material

Supplementary material 1: 11336_2018_9629_MOESM1_ESM.pdf (PDF, 1.2 MB)

Copyright information

© The Psychometric Society 2018

Authors and Affiliations

  • Yuqi Gu (1)
  • Jingchen Liu (2)
  • Gongjun Xu (1), corresponding author
  • Zhiliang Ying (2)
  1. Department of Statistics, University of Michigan, Ann Arbor, USA
  2. Department of Statistics, Columbia University, New York City, USA