Abstract
The percentage of examinees who are classified consistently and accurately into proficiency levels is an important measurement property of tests used to classify candidates. Given the suspected discrepancies between classical test theory (CTT)-based and item response theory (IRT)-based single-administration estimates of decision consistency and accuracy (DC/DA), the two approaches were evaluated for accuracy and robustness under simulated conditions that varied test length, ability distribution, and degree of local item dependence (LID). The CTT-based Livingston–Lewis method was found to underestimate the DC indices across all conditions and to be more sensitive to short tests and skewed ability distributions. The IRT-based Lee method showed small biases in most conditions, the exception being a high degree of LID. With both methods, LID (that is, violation of the local item independence assumption) had a much greater negative effect on the DA estimate than on the DC estimate.
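To make the two indices concrete, the following minimal sketch simulates what DC and DA mean for a single cut score. It is not the Livingston–Lewis or the Lee estimator, both of which work from a single administration. Instead it assumes a Rasch model with two simulated parallel administrations, and every name and parameter value (`simulate_dc_da`, 40 items, a cut score of 24, standard normal abilities and difficulties) is a hypothetical choice for illustration rather than a condition taken from the study.

```python
import math
import random


def simulate_dc_da(n_examinees=2000, n_items=40, cut_score=24, seed=0):
    """Return (DC, DA) for a simulated Rasch test with one cut score.

    DC: proportion of examinees given the same pass/fail decision on two
        independent administrations of the same items.
    DA: proportion whose decision on the first administration matches the
        decision implied by their true (expected) score.
    All settings are illustrative assumptions, not the study's design.
    """
    rng = random.Random(seed)
    # Hypothetical item difficulties drawn from a standard normal.
    difficulties = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    consistent = 0
    accurate = 0
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # examinee ability
        # Rasch probability of a correct response to each item.
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        true_score = sum(probs)  # expected number-correct score
        # Two independent administrations (local independence holds here).
        form1 = sum(rng.random() < p for p in probs)
        form2 = sum(rng.random() < p for p in probs)
        pass1 = form1 >= cut_score
        pass2 = form2 >= cut_score
        true_pass = true_score >= cut_score
        consistent += pass1 == pass2
        accurate += pass1 == true_pass
    return consistent / n_examinees, accurate / n_examinees


dc, da = simulate_dc_da()
print(f"DC = {dc:.3f}, DA = {da:.3f}")
```

In a simulation study, values obtained this way from known abilities can serve as the reference against which single-administration estimates are compared. Because this sketch generates locally independent responses, it does not reproduce the LID conditions described in the abstract.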
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick, Statistical theories of mental test scores (pp. 397–472). Reading, MA: Addison-Wesley.
Bourque, M. L., Goodman, D., Hambleton, R. K., & Han, N. (2004). Reliability estimates for the ABTE tests in elementary education, professional teaching knowledge, secondary mathematics and English/language arts (Final report). Leesburg, VA: Mid-Atlantic Psychometric Services.
Brennan, R. L. (2004). BB-CLASS: A computer program that uses the beta-binomial model for classification consistency and accuracy (Version 1.0, CASMA Research Report No. 9). Iowa City, IA: University of Iowa, Center for Advanced Studies in Measurement and Assessment. Available at http://www.education.uiowa.edu/casma
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Deng, N. (2011). Evaluating IRT- and CTT-based methods of estimating classification consistency and accuracy indices from single administrations (Unpublished doctoral dissertation). Amherst, MA: University of Massachusetts.
Hambleton, R. K., & Novick, M. (1973). Toward an integration of theory and method for criterion-referenced tests. Journal of Educational Measurement, 10(3), 159–170.
Hanson, B. A., & Brennan, R. L. (1990). An investigation of classification consistency indexes estimated under alternative strong true score models. Journal of Educational Measurement, 27, 345–359.
Huynh, H. (1976). On the reliability of decisions in domain-referenced testing. Journal of Educational Measurement, 13, 253–264.
Huynh, H. (1990). Computation and statistical inference for decision consistency indexes based on the Rasch model. Journal of Educational Statistics, 15, 353–368.
Lee, W. (2010). Classification consistency and accuracy for complex assessments using item response theory. Journal of Educational Measurement, 47(1), 1–17.
Lee, W., Brennan, R. L., & Wan, L. (2009). Classification consistency and accuracy for complex assessments under the compound multinomial model. Applied Psychological Measurement, 33, 374–390.
Lee, W., Hanson, B. A., & Brennan, R. L. (2002). Estimating consistency and accuracy indices for multiple classifications. Applied Psychological Measurement, 26, 412–432.
Lee, W., & Kolen, M. J. (2008). IRT-CLASS: A computer program for item response theory classification consistency and accuracy (Version 2.0). Iowa City, IA: University of Iowa, Center for Advanced Studies in Measurement and Assessment. Available at http://www.education.uiowa.edu/casma
Li, S. (2006). Evaluating the consistency and accuracy of proficiency classifications using item response theory (Unpublished dissertation). Amherst, MA: University of Massachusetts.
Livingston, S. A., & Lewis, C. (1995). Estimating the consistency and accuracy of classifications based on test scores. Journal of Educational Measurement, 32, 179–197.
Muraki, E., & Bock, R. D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating-scale data [Computer program]. Chicago, IL: Scientific Software International, Inc.
Rudner, L. M. (2001). Computing the expected proportions of misclassified examinees. Practical Assessment Research & Evaluation, 7(14). Available online: http://pareonline.net/getvn.asp?v=7&n=14
Rudner, L. M. (2005). Expected classification accuracy. Practical Assessment Research & Evaluation, 10(13). Available online: http://pareonline.net/getvn.asp?v=10&n=13
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika (Monograph Supplement, 17).
Subkoviak, M. J. (1976). Estimating reliability from a single administration of a criterion-referenced test. Journal of Educational Measurement, 13, 265–276.
Swaminathan, H., Hambleton, R. K., & Algina, J. (1974). Reliability of criterion referenced tests: A decision-theoretic formulation. Journal of Educational Measurement, 11, 263–267.
Wainer, H., Bradlow, E. T., & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–269). Amsterdam: Kluwer Academic Publishers.
Wan, L., Brennan, R. L., & Lee, W. (2007). Estimating classification consistency for complex assessments (CASMA Research Report No. 22). Iowa City, IA: University of Iowa, Center for Advanced Studies in Measurement and Assessment. Available at http://www.education.uiowa.edu/casma
Wang, T., Kolen, M. J., & Harris, D. J. (2000). Psychometric properties of scale scores and performance levels for performance assessments using polytomous IRT. Journal of Educational Measurement, 37, 141–162.
Acknowledgment
The authors are grateful for the valuable comments from the editor Daniel Bolt, which strengthened the study considerably.
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Deng, N., Hambleton, R.K. (2013). Evaluating CTT- and IRT-Based Single-Administration Estimates of Classification Consistency and Accuracy. In: Millsap, R.E., van der Ark, L.A., Bolt, D.M., Woods, C.M. (eds) New Developments in Quantitative Psychology. Springer Proceedings in Mathematics & Statistics, vol 66. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9348-8_15
DOI: https://doi.org/10.1007/978-1-4614-9348-8_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-9347-1
Online ISBN: 978-1-4614-9348-8
eBook Packages: Mathematics and Statistics (R0)