Abstract
One of the primary goals in cognitive diagnosis is to use the item responses from a cognitive diagnostic assessment to make inferences about the skills a test-taker has mastered. Much of the research to date has focused on parametric inference in cognitive diagnosis models (CDMs), which requires that the parametric model used for inference adequately describes the item response distribution of the population of examinees being studied. Whatever the type of model misspecification or misfit, users of CDMs need tools to investigate model-data misfit from a variety of angles. In this chapter we organize model fit methods into four categories defined by two aspects of the methods: (1) the level of the fit analysis, i.e., global/test-level versus item-level; and (2) the choice of the alternative model for comparison, i.e., an alternative CDM (relative fit) or a saturated categorical model (absolute fit).
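As a concrete illustration of the relative-fit side of this taxonomy, the sketch below compares two hypothetical CDMs fit to the same data using the information criteria AIC (Akaike, 1974) and BIC (Schwarz, 1978), which penalize the maximized log-likelihood by model complexity. The log-likelihoods, parameter counts, and sample size are invented for illustration; this is not code or data from the chapter.

```python
import math

def aic(log_lik, n_params):
    # AIC = -2 * logL + 2 * p  (Akaike, 1974)
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    # BIC = -2 * logL + p * ln(N)  (Schwarz, 1978)
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical fit results for two CDMs estimated on the same
# responses from N = 500 examinees: a saturated general model
# and a reduced model with fewer item parameters.
N = 500
models = {
    "saturated CDM": {"log_lik": -3100.0, "n_params": 60},
    "reduced CDM":   {"log_lik": -3120.0, "n_params": 36},
}

for name, m in models.items():
    print(f"{name}: AIC = {aic(m['log_lik'], m['n_params']):.1f}, "
          f"BIC = {bic(m['log_lik'], m['n_params'], N):.1f}")
```

With these made-up numbers the reduced model wins on both criteria: its small loss in log-likelihood is outweighed by the complexity penalty on the saturated model. Absolute fit, by contrast, would compare the chosen model against a saturated categorical model for the response patterns rather than against a competing CDM.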
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this chapter
Han, Z., Johnson, M.S. (2019). Global- and Item-Level Model Fit Indices. In: von Davier, M., Lee, YS. (eds) Handbook of Diagnostic Classification Models. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-05584-4_13
Print ISBN: 978-3-030-05583-7
Online ISBN: 978-3-030-05584-4