Abstract
One of the primary goals in cognitive diagnosis is to use the item responses from a cognitive diagnostic assessment to make inferences about the skills a test-taker has mastered. Much of the research to date has focused on parametric inference in cognitive diagnosis models (CDMs), which requires that the parametric model used for inference adequately describes the item response distribution of the population of examinees being studied. Whatever the type of model misspecification or misfit, users of CDMs need tools to investigate model-data misfit from a variety of angles. In this chapter we organize model fit methods into four categories defined by two aspects of the methods: (1) the level of the fit analysis, i.e., global/test-level versus item-level; and (2) the choice of the alternative model for comparison, i.e., an alternative CDM (relative fit) or a saturated categorical model (absolute fit).
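As a concrete illustration of the relative-fit side of this taxonomy, the sketch below compares two hypothetical CDMs fit to the same data using the information criteria AIC (Akaike, 1974) and BIC (Schwarz, 1978), which penalize the maximized log-likelihood by model complexity. The log-likelihoods, parameter counts, and sample size are invented for illustration; this is not code or data from the chapter.

```python
import math

def aic(log_lik, n_params):
    # AIC = -2 * logL + 2 * p  (Akaike, 1974)
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    # BIC = -2 * logL + p * ln(N)  (Schwarz, 1978)
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical fit results for two CDMs estimated on the same
# responses from N = 500 examinees: a saturated general model
# and a reduced model with fewer item parameters.
N = 500
models = {
    "saturated CDM": {"log_lik": -3100.0, "n_params": 60},
    "reduced CDM":   {"log_lik": -3120.0, "n_params": 36},
}

for name, m in models.items():
    print(f"{name}: AIC = {aic(m['log_lik'], m['n_params']):.1f}, "
          f"BIC = {bic(m['log_lik'], m['n_params'], N):.1f}")
```

With these made-up numbers the reduced model wins on both criteria: its small loss in log-likelihood is outweighed by the complexity penalty on the saturated model. Absolute fit, by contrast, would compare the chosen model against a saturated categorical model for the response patterns rather than against a competing CDM.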
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this chapter
Han, Z., Johnson, M.S. (2019). Global- and Item-Level Model Fit Indices. In: von Davier, M., Lee, YS. (eds) Handbook of Diagnostic Classification Models. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-05584-4_13
Print ISBN: 978-3-030-05583-7
Online ISBN: 978-3-030-05584-4