Abstract
Testlets, which are defined as a set of items linked by a common stimulus, are commonly used in educational and psychological tests. Such a linkage may make items within a testlet locally dependent. There are three major approaches to modeling testlet-based items. First, one can fit standard item response theory (IRT) models and ignore the possible local dependence. Second, one can transform the items in a testlet into a single super (polytomous) item and then fit polytomous IRT models to the transformed data. Third, one can fit testlet response models, which were developed specifically to account for the local dependence. This study compared the performance of these three approaches in recovering person measures and test reliability through simulations. The polytomous-item approach performed highly satisfactorily when data were generated from testlet response models or when data had chain effects between adjacent items. In contrast, fitting standard item response models tended to overestimate test reliability when data were generated from testlet response models, and to underestimate or overestimate test reliability when the data had chain effects. Likewise, fitting testlet response models could underestimate or overestimate test reliability when the data had chain effects. Thus, if person measures as well as their measurement precision (test reliability) are the major concern, the polytomous-item approach is recommended.
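The two non-standard approaches described above can be made concrete with a small simulation. The sketch below is not from the paper; it assumes the Rasch testlet model parameterization of Wang and Wilson (2005), in which the logit for person n on item i in testlet d is θ_n + γ_nd − b_i, with γ_nd a person-specific testlet effect inducing local dependence. The super-item transformation then scores each testlet as the number of correct responses within it. All parameter values and sample sizes here are illustrative choices, not the study's design.

```python
import numpy as np

rng = np.random.default_rng(0)

n_persons, n_testlets, items_per_testlet = 500, 4, 3
theta = rng.normal(0.0, 1.0, size=n_persons)                     # person abilities
b = rng.normal(0.0, 1.0, size=(n_testlets, items_per_testlet))   # item difficulties
gamma = rng.normal(0.0, 0.7, size=(n_persons, n_testlets))       # testlet effects

# Rasch testlet model: logit of a correct response = theta + gamma_testlet - b_item
logits = theta[:, None, None] + gamma[:, :, None] - b[None, :, :]
p = 1.0 / (1.0 + np.exp(-logits))
responses = (rng.random(p.shape) < p).astype(int)  # shape: (persons, testlets, items)

# Polytomous-item approach: collapse each testlet into one super item,
# scored as the count of correct responses within that testlet.
super_items = responses.sum(axis=2)                # shape: (persons, testlets)
```

A polytomous IRT model (e.g., the partial credit model) would then be fitted to `super_items`, whereas the standard-IRT approach would fit a dichotomous model directly to `responses` while ignoring `gamma`.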
© 2016 Springer Science+Business Media Singapore
Cite this paper
Wang, WC., Jin, KY. (2016). Analyses of Testlet Data. In: Zhang, Q. (eds) Pacific Rim Objective Measurement Symposium (PROMS) 2015 Conference Proceedings. Springer, Singapore. https://doi.org/10.1007/978-981-10-1687-5_13
DOI: https://doi.org/10.1007/978-981-10-1687-5_13
Print ISBN: 978-981-10-1686-8
Online ISBN: 978-981-10-1687-5