Abstract
For criterion-referenced tests, classification consistency and accuracy are important indicators for evaluating the reliability and validity of classification results. Numerous procedures for estimating these indices have been proposed within the framework of unidimensional item response theory (UIRT); some are based on total sum scores, others on latent trait estimates. Very few attempts, however, have been made to develop such procedures within the framework of multidimensional item response theory (MIRT). Building on previous studies, this study first estimates consistency and accuracy indices for multidimensional ability estimates from a single administration of a criterion-referenced test. We then examine how the Monte Carlo sample size, examinee sample size, test length, and the correlation between the abilities affect estimation quality. Comparative analysis of the simulation results indicates that the new indices are well suited for evaluating the test-retest consistency and correct classification rates of different decision rules.
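The general idea behind latent-trait-based accuracy indices can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a Rudner-style approach in which the true ability vector is treated as multivariate normal around its estimate (with covariance given by, e.g., the inverse Fisher information), and a simple conjunctive decision rule ("master on every dimension"). The function name, the covariance values, and the cut scores are all hypothetical.

```python
import numpy as np

def rudner_accuracy_mirt(theta_hat, cov, cutoffs, n_draws=100_000, seed=0):
    """Monte Carlo estimate of classification accuracy for one examinee:
    the probability that the true ability vector falls in the same
    classification region as the point estimate theta_hat, under the
    asymptotic approximation theta ~ MVN(theta_hat, cov)."""
    rng = np.random.default_rng(seed)
    theta_hat = np.asarray(theta_hat, dtype=float)
    cutoffs = np.asarray(cutoffs, dtype=float)
    draws = rng.multivariate_normal(theta_hat, cov, size=n_draws)
    # Conjunctive rule: classified as "master" only if at or above the
    # cut score on every dimension.
    est_class = np.all(theta_hat >= cutoffs)
    draw_class = np.all(draws >= cutoffs, axis=1)
    # Proportion of plausible true-ability draws classified the same way
    # as the point estimate.
    return float(np.mean(draw_class == est_class))

# Example: two correlated abilities, cut score 0 on each dimension,
# standard errors of 0.2 with positively correlated estimation error.
cov = np.array([[0.04, 0.01],
                [0.01, 0.04]])
acc = rudner_accuracy_mirt([0.5, 0.6], cov, cutoffs=[0.0, 0.0])
```

A consistency (test-retest) index can be sketched analogously by drawing two independent estimates per examinee and recording how often the two are classified into the same category; marginal accuracy over a population would average such values over a sample of ability vectors.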
Acknowledgments
This research is supported by the China Scholarship Council (CSC No. 201509470001), the National Natural Science Foundation of China (Grant No. 31500909, 31360237, and 31160203), the Key Project of National Education Science “Twelfth Five Year Plan” of the Ministry of Education of China (Grant No. DHA150285), the Humanities and Social Sciences Research Foundation of the Ministry of Education of China (Grant No. 13YJC880060 and 12YJA740057), the National Natural Science Foundation of Jiangxi Province (Grant No. 20161BAB212044), the Jiangxi Education Science Foundation (Grant No. 13YB032), the Science and Technology Research Foundation of the Education Department of Jiangxi Province (GJJ13207), and the Youth Growth Fund and the Doctoral Start-up Foundation of Jiangxi Normal University. The authors thank the editor Jeffrey A. Douglas for his helpful comments and suggestions. Thanks to Prof. Hua-Hua Chang for his kind help.
© 2016 Springer International Publishing Switzerland
Cite this paper
Wang, W., Song, L., Ding, S., Meng, Y. (2016). Estimating Classification Accuracy and Consistency Indices for Multidimensional Latent Ability. In: van der Ark, L., Bolt, D., Wang, WC., Douglas, J., Wiberg, M. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 167. Springer, Cham. https://doi.org/10.1007/978-3-319-38759-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38757-4
Online ISBN: 978-3-319-38759-8
eBook Packages: Mathematics and Statistics (R0)