Abstract
For criterion-referenced tests, classification consistency and accuracy are important indicators for evaluating the reliability and validity of classification results. Numerous procedures for estimating these indices have been proposed within the framework of unidimensional item response theory (UIRT); some are based on total sum scores, others on latent trait estimates. Very few attempts, however, have been made to develop such procedures within the framework of multidimensional item response theory (MIRT). Building on previous studies, this study first estimates consistency and accuracy indices for multidimensional ability estimates from a single administration of a criterion-referenced test. We then examine how the Monte Carlo sample size, examinee sample size, test length, and the correlation between the abilities affect estimation quality. Comparative analysis of the simulation results indicates that the new indices are well suited for evaluating the test-retest consistency and correct classification rates of different decision rules.
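The general idea behind latent-trait-based accuracy indices can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a Rudner-style approach in which the true ability vector is treated as multivariate normal around its estimate (with covariance given by, e.g., the inverse Fisher information), and a simple conjunctive decision rule ("master on every dimension"). The function name, the covariance values, and the cut scores are all hypothetical.

```python
import numpy as np

def rudner_accuracy_mirt(theta_hat, cov, cutoffs, n_draws=100_000, seed=0):
    """Monte Carlo estimate of classification accuracy for one examinee:
    the probability that the true ability vector falls in the same
    classification region as the point estimate theta_hat, under the
    asymptotic approximation theta ~ MVN(theta_hat, cov)."""
    rng = np.random.default_rng(seed)
    theta_hat = np.asarray(theta_hat, dtype=float)
    cutoffs = np.asarray(cutoffs, dtype=float)
    draws = rng.multivariate_normal(theta_hat, cov, size=n_draws)
    # Conjunctive rule: classified as "master" only if at or above the
    # cut score on every dimension.
    est_class = np.all(theta_hat >= cutoffs)
    draw_class = np.all(draws >= cutoffs, axis=1)
    # Proportion of plausible true-ability draws classified the same way
    # as the point estimate.
    return float(np.mean(draw_class == est_class))

# Example: two correlated abilities, cut score 0 on each dimension,
# standard errors of 0.2 with positively correlated estimation error.
cov = np.array([[0.04, 0.01],
                [0.01, 0.04]])
acc = rudner_accuracy_mirt([0.5, 0.6], cov, cutoffs=[0.0, 0.0])
```

A consistency (test-retest) index can be sketched analogously by drawing two independent estimates per examinee and recording how often the two are classified into the same category; marginal accuracy over a population would average such values over a sample of ability vectors.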
Acknowledgments
This research is supported by the China Scholarship Council (CSC No. 201509470001), the National Natural Science Foundation of China (Grant No. 31500909, 31360237, and 31160203), the Key Project of National Education Science “Twelfth Five Year Plan” of the Ministry of Education of China (Grant No. DHA150285), the Humanities and Social Sciences Research Foundation of the Ministry of Education of China (Grant No. 13YJC880060 and 12YJA740057), the National Natural Science Foundation of Jiangxi Province (Grant No. 20161BAB212044), the Jiangxi Education Science Foundation (Grant No. 13YB032), the Science and Technology Research Foundation of the Education Department of Jiangxi Province (GJJ13207), and the Youth Growth Fund and the Doctoral Start-up Foundation of Jiangxi Normal University. The authors thank the editor Jeffrey A. Douglas for his helpful comments and suggestions. Thanks to Prof. Hua-Hua Chang for his kind help.
© 2016 Springer International Publishing Switzerland
Cite this paper
Wang, W., Song, L., Ding, S., Meng, Y. (2016). Estimating Classification Accuracy and Consistency Indices for Multidimensional Latent Ability. In: van der Ark, L., Bolt, D., Wang, WC., Douglas, J., Wiberg, M. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 167. Springer, Cham. https://doi.org/10.1007/978-3-319-38759-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38757-4
Online ISBN: 978-3-319-38759-8
eBook Packages: Mathematics and Statistics (R0)