
Estimating Classification Accuracy and Consistency Indices for Multidimensional Latent Ability

  • Conference paper in Quantitative Psychology Research

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 167)

Abstract

For criterion-referenced tests, classification consistency and accuracy are important indicators of the reliability and validity of classification results. Numerous procedures have been proposed within the framework of unidimensional item response theory (UIRT) to estimate these indices; some are based on total sum scores, others on latent trait estimates. However, few attempts have been made to develop such procedures within the framework of multidimensional item response theory (MIRT). Building on previous studies, this study first estimates consistency and accuracy indices for multidimensional ability estimates from a single administration of a criterion-referenced test. We then examine how the Monte Carlo sample size, examinee sample size, test length, and the correlation between the abilities affect estimation quality. Comparative analysis of simulation results indicates that the new indices are well suited for evaluating test-retest consistency and the correct classification rates of different decision rules.
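The abstract's idea of estimating classification accuracy and consistency from a single administration can be illustrated with a small Monte Carlo sketch. The sketch below assumes a compensatory two-dimensional 2PL (M2PL) model, EAP ability estimation on a quadrature grid, and a pass/fail rule based on an equally weighted composite of the two abilities exceeding a cutoff; all numbers, the item parameters, and the decision rule are illustrative assumptions, not the authors' actual design.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup (illustrative, not the authors' design)
N, J, D = 500, 30, 2                      # examinees, items, dimensions
rho = 0.5                                 # correlation between the two abilities
cov = np.array([[1.0, rho], [rho, 1.0]])
theta = rng.multivariate_normal(np.zeros(D), cov, size=N)  # true abilities

a = rng.uniform(0.8, 2.0, size=(J, D))    # discrimination parameters
b = rng.normal(size=J)                    # difficulty parameters

def p_correct(th, a, b):
    # Compensatory M2PL: P(X=1) = logistic(a . theta - b)
    return 1.0 / (1.0 + np.exp(-(th @ a.T - b)))

def administer(th):
    # Simulate one test administration (Bernoulli responses)
    return (rng.random((th.shape[0], J)) < p_correct(th, a, b)).astype(int)

# Two-dimensional quadrature grid for EAP estimation
g = np.linspace(-4, 4, 31)
G1, G2 = np.meshgrid(g, g)
grid = np.column_stack([G1.ravel(), G2.ravel()])            # (Q, 2)
inv = np.linalg.inv(cov)
prior = np.exp(-0.5 * np.einsum('qi,ij,qj->q', grid, inv, grid))
prior /= prior.sum()

def eap(x):
    # Posterior over grid points for each examinee, then the posterior mean
    pg = p_correct(grid, a, b)                               # (Q, J)
    loglik = x @ np.log(pg).T + (1 - x) @ np.log(1 - pg).T   # (N, Q)
    post = np.exp(loglik - loglik.max(axis=1, keepdims=True)) * prior
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid                                       # (N, 2) EAP estimates

def classify(th, cutoff=0.0):
    # Decision rule: pass if the equally weighted composite exceeds the cutoff
    return th.mean(axis=1) > cutoff

true_cls = classify(theta)
cls1 = classify(eap(administer(theta)))   # first administration
cls2 = classify(eap(administer(theta)))   # independent parallel administration

accuracy = np.mean(cls1 == true_cls)      # agreement with the true classification
consistency = np.mean(cls1 == cls2)       # agreement across two administrations
print(f"accuracy = {accuracy:.3f}, consistency = {consistency:.3f}")
```

In practice the second administration is not observed, which is why the paper derives these indices from a single administration; a real analysis would also use a dedicated MIRT package (e.g. mirt in R or BMIRT) rather than this hand-rolled grid EAP.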



Acknowledgments

This research is supported by the China Scholarship Council (CSC No. 201509470001), the National Natural Science Foundation of China (Grant Nos. 31500909, 31360237, and 31160203), the Key Project of National Education Science "Twelfth Five Year Plan" of the Ministry of Education of China (Grant No. DHA150285), the Humanities and Social Sciences Research Foundation of the Ministry of Education of China (Grant Nos. 13YJC880060 and 12YJA740057), the National Natural Science Foundation of Jiangxi Province (Grant No. 20161BAB212044), the Jiangxi Education Science Foundation (Grant No. 13YB032), the Science and Technology Research Foundation of the Education Department of Jiangxi Province (GJJ13207), and the Youth Growth Fund and the Doctoral Starting-up Foundation of Jiangxi Normal University. The authors thank the editor Jeffrey A. Douglas for his helpful comments and suggestions. Thanks also to Prof. Hua-Hua Chang for his kind help.

Author information

Correspondence to Lihong Song.

Copyright information

© 2016 Springer International Publishing Switzerland


Cite this paper

Wang, W., Song, L., Ding, S., Meng, Y. (2016). Estimating Classification Accuracy and Consistency Indices for Multidimensional Latent Ability. In: van der Ark, L., Bolt, D., Wang, WC., Douglas, J., Wiberg, M. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 167. Springer, Cham. https://doi.org/10.1007/978-3-319-38759-8_8
