Abstract
Testlets, which are defined as a set of items linked by a common stimulus, are commonly used in educational and psychological tests. Such a linkage may make items within a testlet locally dependent. There are three major approaches to modeling testlet-based items. First, one can fit standard item response theory (IRT) models and ignore the possible local dependence. Second, one can transform the items in a testlet into a single super (polytomous) item and then fit polytomous IRT models to the transformed data. Third, one can fit testlet response models, which were developed specifically to account for the local dependence. This study compared the performance of these three approaches in recovering person measures and test reliability through simulations. The polytomous-item approach performed highly satisfactorily when data were generated from testlet response models or when data had chain effects between adjacent items. In contrast, fitting standard item response models tended to overestimate test reliability when data were generated from testlet response models, and to underestimate or overestimate test reliability when the data had chain effects. Likewise, fitting testlet response models could underestimate or overestimate test reliability when the data had chain effects. Thus, if person measures as well as their measurement precision (test reliability) are the major concern, the polytomous-item approach is recommended.
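The two non-standard approaches described above can be made concrete with a small simulation. The sketch below is not from the paper; it assumes the Rasch testlet model parameterization of Wang and Wilson (2005), in which the logit for person n on item i in testlet d is θ_n + γ_nd − b_i, with γ_nd a person-specific testlet effect inducing local dependence. The super-item transformation then scores each testlet as the number of correct responses within it. All parameter values and sample sizes here are illustrative choices, not the study's design.

```python
import numpy as np

rng = np.random.default_rng(0)

n_persons, n_testlets, items_per_testlet = 500, 4, 3
theta = rng.normal(0.0, 1.0, size=n_persons)                     # person abilities
b = rng.normal(0.0, 1.0, size=(n_testlets, items_per_testlet))   # item difficulties
gamma = rng.normal(0.0, 0.7, size=(n_persons, n_testlets))       # testlet effects

# Rasch testlet model: logit of a correct response = theta + gamma_testlet - b_item
logits = theta[:, None, None] + gamma[:, :, None] - b[None, :, :]
p = 1.0 / (1.0 + np.exp(-logits))
responses = (rng.random(p.shape) < p).astype(int)  # shape: (persons, testlets, items)

# Polytomous-item approach: collapse each testlet into one super item,
# scored as the count of correct responses within that testlet.
super_items = responses.sum(axis=2)                # shape: (persons, testlets)
```

A polytomous IRT model (e.g., the partial credit model) would then be fitted to `super_items`, whereas the standard-IRT approach would fit a dichotomous model directly to `responses` while ignoring `gamma`.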
© 2016 Springer Science+Business Media Singapore
Cite this paper
Wang, WC., Jin, KY. (2016). Analyses of Testlet Data. In: Zhang, Q. (eds) Pacific Rim Objective Measurement Symposium (PROMS) 2015 Conference Proceedings. Springer, Singapore. https://doi.org/10.1007/978-981-10-1687-5_13
DOI: https://doi.org/10.1007/978-981-10-1687-5_13
Print ISBN: 978-981-10-1686-8
Online ISBN: 978-981-10-1687-5