Mastery Classification of Diagnostic Classification Models
The purpose of diagnostic classification models (DCMs) is to determine mastery or non-mastery of a set of attributes or skills. Two statistics obtained directly from DCMs can be used for mastery classification: the posterior marginal probability of each attribute and the posterior probability of each attribute profile.
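The two statistics are linked: the posterior marginal probability of an attribute is the sum of the posterior probabilities of every attribute profile in which that attribute is mastered. The following is a minimal sketch of that relationship, not code from the chapter. The function name `marginal_mastery` and the two-attribute posterior are hypothetical and chosen only for illustration.

```python
def marginal_mastery(profile_posterior):
    """Return one posterior marginal probability per attribute.

    profile_posterior maps an attribute profile (a tuple of 0/1 mastery
    indicators) to that profile's posterior probability.
    """
    n_attributes = len(next(iter(profile_posterior)))
    return [
        # Sum over all profiles in which attribute k is mastered.
        sum(p for profile, p in profile_posterior.items() if profile[k] == 1)
        for k in range(n_attributes)
    ]


# Hypothetical posterior over the four profiles of a two-attribute test.
posterior = {(0, 0): 0.10, (0, 1): 0.15, (1, 0): 0.30, (1, 1): 0.45}
marginals = marginal_mastery(posterior)
print([round(m, 2) for m in marginals])
```

For this made-up examinee, the marginals are about 0.75 for the first attribute and 0.60 for the second.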
When the posterior marginal probabilities are used for mastery classification, a probability threshold is required to determine the mastery or non-mastery status of each attribute. A threshold of 0.5 is commonly adopted for binary classification in real assessments. However, 0.5 might not be the best choice in some cases. Therefore, a simulation-based threshold approach is proposed to evaluate several candidate thresholds and to determine the optimal one. In addition to non-mastery and mastery, a third category, the indifference region, seems justifiable for probabilities around 0.5. However, the indifference region category should be used with caution because, depending on the item parameters of the test, no response vector may fall in the indifference region.
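The threshold rule and the idea of comparing thresholds on simulated data can be sketched as follows. This illustrates the general idea only and is not the chapter's procedure. The function names, the 0.4 to 0.6 indifference bounds, and the data are all hypothetical. The step that simulates responses from estimated item parameters is omitted, so the sketch starts from simulated true statuses and the marginals already computed for them.

```python
def classify(marginal, lower=0.5, upper=0.5):
    """Label one posterior marginal probability.

    With lower == upper this is the single-threshold rule. With
    lower < upper, values from lower up to (but not including) upper
    fall in the indifference region.
    """
    if marginal >= upper:
        return "mastery"
    if marginal < lower:
        return "non-mastery"
    return "indifference"


def accuracy_by_threshold(true_status, marginals, thresholds):
    """Proportion of simulated examinees classified correctly per threshold."""
    results = {}
    for t in thresholds:
        correct = sum(
            (m >= t) == bool(s) for s, m in zip(true_status, marginals)
        )
        results[t] = correct / len(true_status)
    return results


# Hypothetical simulated data for one attribute: true statuses are known
# because the examinees were simulated.
true_status = [1, 1, 1, 0, 0, 0, 1, 0]
marginals = [0.92, 0.66, 0.71, 0.47, 0.12, 0.35, 0.81, 0.55]

accuracy = accuracy_by_threshold(true_status, marginals, [0.4, 0.5, 0.6])
best_threshold = max(accuracy, key=accuracy.get)

# Labels under an assumed indifference region of [0.4, 0.6).
labels = [classify(m, lower=0.4, upper=0.6) for m in marginals]
```

The labels also show whether any simulated examinee lands in the indifference region at all, which is the situation the caution above concerns.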
The other statistic used for mastery classification is the posterior probability of the attribute profile, which is more straightforward than the posterior marginal probability. However, it has its own issue when a test is not well designed: multiple maxima, in which more than one attribute profile attains the highest posterior probability. Practitioners and stakeholders of testing programs should be aware of these two potential issues when DCMs are used for mastery classification.
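A simple check for the multiple-maximum issue is to return every attribute profile that ties for the highest posterior probability, rather than a single profile. The sketch below assumes the posterior is available as a mapping from profile to probability. The function name `map_profiles`, the tolerance, and the example values are hypothetical.

```python
import math


def map_profiles(profile_posterior, tol=1e-9):
    """Return every profile whose posterior probability ties for the maximum."""
    top = max(profile_posterior.values())
    return [
        profile
        for profile, prob in profile_posterior.items()
        if math.isclose(prob, top, abs_tol=tol)
    ]


# Hypothetical posterior in which two profiles are equally likely.
posterior = {(0, 0): 0.10, (0, 1): 0.40, (1, 0): 0.40, (1, 1): 0.10}
winners = map_profiles(posterior)

if len(winners) > 1:
    print("Multiple maxima; classification is ambiguous:", winners)
```

A result with more than one profile means the responses cannot separate those profiles, so reporting a single classification would be arbitrary.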