Effects of Discontinue Rules on Psychometric Properties of Test Scores
This paper provides results on a form of adaptive testing that is frequently used in intelligence testing. In these tests, items are presented in order of increasing difficulty. The presentation is adaptive in the sense that a session is discontinued once a test taker produces a certain number of consecutive incorrect responses, and the subsequent (unobserved) responses are commonly scored as wrong. The Stanford-Binet Intelligence Scales (SB5; Riverside Publishing Company, 2003), the Kaufman Assessment Battery for Children (KABC-II; Kaufman and Kaufman, 2004), the Kaufman Adolescent and Adult Intelligence Test (Kaufman and Kaufman, 2014), and the Universal Nonverbal Intelligence Test (2nd ed.; Bracken and McCallum, 2015) are some of the many examples using this rule. He and Wolfe (Educ Psychol Meas 72(5):808–826, 2012. https://doi.org/10.1177/0013164412441937) compared different ability estimation methods in a simulation study for this discontinue-rule adaptation of test length. However, to our knowledge, there has been no study, based on analytic arguments drawing on probability theory, of the underlying distributional properties of what these authors call stochastic censoring of responses. The results obtained by He and Wolfe (Educ Psychol Meas 72(5):808–826, 2012. https://doi.org/10.1177/0013164412441937) agree with those presented by De Ayala et al. (J Educ Meas 38:213–234, 2001) as well as Rose et al. (Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11), Educational Testing Service, Princeton, 2010) and Rose et al. (Psychometrika 82:795–819, 2017. https://doi.org/10.1007/s11336-016-9544-7) in that ability estimates are most biased when the unobserved responses are scored as wrong. Because this scoring is used operationally, more research is needed to improve practice in this field.
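The discontinue rule described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `apply_discontinue_rule`, the parameter `max_consecutive_wrong`, and the example response pattern are hypothetical, and responses are assumed to be coded 1 (correct) or 0 (incorrect) with items already ordered by increasing difficulty. The `fill` argument switches between scoring unobserved responses as wrong (`0`) and marking them as missing (`None`).

```python
from typing import List, Optional


def apply_discontinue_rule(
    responses: List[int],
    max_consecutive_wrong: int,
    fill: Optional[int] = 0,
) -> List[Optional[int]]:
    """Censor a 0/1 response vector after a run of consecutive wrong answers.

    Items are assumed to be ordered by increasing difficulty. Once
    `max_consecutive_wrong` incorrect responses occur in a row, testing
    stops and every later response is replaced by `fill`
    (0 = scored as wrong, None = treated as not observed).
    """
    if max_consecutive_wrong < 1:
        raise ValueError("max_consecutive_wrong must be at least 1")

    scored: List[Optional[int]] = []
    run = 0  # length of the current run of incorrect responses
    for position, response in enumerate(responses):
        scored.append(response)
        run = run + 1 if response == 0 else 0
        if run >= max_consecutive_wrong:
            # The session is discontinued here; later items are never presented.
            remaining = len(responses) - position - 1
            scored.extend([fill] * remaining)
            break
    return scored


# Hypothetical complete response pattern (what would be seen without the rule).
full = [1, 1, 0, 1, 0, 0, 0, 1, 0, 1]

as_wrong = apply_discontinue_rule(full, max_consecutive_wrong=3)
as_missing = apply_discontinue_rule(full, max_consecutive_wrong=3, fill=None)

print(as_wrong)    # [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(as_missing)  # [1, 1, 0, 1, 0, 0, 0, None, None, None]
```

In this made-up pattern the test taker would have answered two of the last three items correctly, so scoring the censored items as wrong lowers the sum score from 5 to 3.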
The paper extends existing research on adaptivity by discontinue rules in intelligence tests in two ways. First, an analytical study of the distributional properties of discontinue-rule-scored items is presented. Second, a simulation is presented that includes additional scoring rules and uses ability estimators that may be suitable for reducing bias in discontinue-rule-scored intelligence tests.
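As a rough illustration of the kind of simulation mentioned above, the following sketch generates responses under a Rasch model and compares raw sum scores with and without a discontinue rule scored as wrong. Everything in it is an assumption made for illustration: the Rasch model, the 20 equally spaced difficulties, the 2,000 simulated test takers, the stopping criterion of three consecutive errors, and the helper names are not taken from the paper, and the comparison uses sum scores rather than the ability estimators the paper studies.

```python
import math
import random


def simulate_rasch(theta, difficulties, rng):
    """Draw 0/1 responses with P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return [
        1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
        for b in difficulties
    ]


def censor_as_wrong(responses, k):
    """Stop after k consecutive wrong responses; score later items as 0."""
    run = 0
    for i, r in enumerate(responses):
        run = run + 1 if r == 0 else 0
        if run >= k:
            return responses[: i + 1] + [0] * (len(responses) - i - 1)
    return list(responses)


rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Assumed design: 20 items ordered by increasing difficulty from -2 to 2.
difficulties = [-2.0 + 4.0 * j / 19 for j in range(20)]
# Assumed population: 2,000 abilities drawn from a standard normal.
thetas = [rng.gauss(0.0, 1.0) for _ in range(2000)]

full_scores = []
censored_scores = []
for theta in thetas:
    x = simulate_rasch(theta, difficulties, rng)
    full_scores.append(sum(x))
    censored_scores.append(sum(censor_as_wrong(x, k=3)))

# Average number of score points lost because censored items count as wrong.
mean_loss = (sum(full_scores) - sum(censored_scores)) / len(thetas)
print(f"mean sum-score loss under the discontinue rule: {mean_loss:.3f}")
```

Because censored responses can only be replaced by zeros, the censored sum score can never exceed the full sum score. The size of the loss depends entirely on the assumed design values.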
Keywords: discontinue rule, ignorability, bias, local dependency, DIF
- Bracken, B. A., & McCallum, R. S. (2015). Universal nonverbal intelligence test (2nd ed.). Itasca, IL: Riverside Publishers.
- Chen, H., Yamamoto, K., & von Davier, M. (2014). Controlling multistage testing exposure rates in international large-scale assessments. In D. L. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications. New York: CRC Press.
- Glas, C. A. W. (2010). Item parameter estimation and item fit analysis. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 269–288). New York: Springer.
- Holland, P. W., & Thayer, D. T. (1986). Differential item functioning and the Mantel–Haenszel procedure. ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1986.tb00186.x.
- Homack, S. R., & Reynolds, C. R. (2007). Essentials of assessment with brief intelligence tests. Hoboken: Wiley. ISBN: 978-0-471-26412-5.
- Kaufman, A. S., & Kaufman, N. L. (2004). Manual: Kaufman assessment battery for children (2nd ed.). Circle Pines, MN: AGS Publishing.
- Kaufman, A. S., & Kaufman, N. L. (2014). Kaufman adolescent and adult intelligence test. Encyclopedia of Special Education. https://doi.org/10.1002/9781118660584.ese1323.
- Little, R. J. A. (1988). Missing-data adjustments in large surveys. Journal of Business and Economic Statistics, 6, 287–296.
- Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
- Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.
- Mislevy, R. J., & Wu, P.-K. (1996). Missing responses and IRT ability estimation: Omits, choice, time limits, and adaptive testing. ETS Research Report Series, 1996, i–36. https://doi.org/10.1002/j.2333-8504.1996.tb01708.x.
- Riverside Publishing Company. (2003). Stanford-Binet intelligence scales (SB5) (5th ed.). Itasca, IL.
- Rose, N., von Davier, M., & Xu, X. (2010). Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11). Princeton, NJ: Educational Testing Service.
- Rubin, D. B. (1986). Statistical matching using file concatenation with adjusted weights and multiple imputations. Journal of Business and Economic Statistics, 4, 87–94.
- Suppes, P. (1970). A probabilistic theory of causality. Amsterdam: North-Holland Publishing Company.
- van der Linden, W. (Ed.) (2016). Handbook of item response theory (Vol. 1, 2nd ed.). Boca Raton: CRC Press.
- von Davier, M. (2005). A general diagnostic model applied to language testing data. In Research report RR-05-16. Princeton, NJ: ETS.
- von Davier, M. (2016b). CTT and No-DIF and ? = (almost) Rasch model. Chapter 14. In M. Rosen, K. Y. Hansen, & U. Wolff (Eds.), Cognitive abilities and educational outcomes: A festschrift in honour of Jan-Eric Gustafsson (pp. 249–272). A volume in the Springer book series: Methodology of Educational Measurement and Assessment.
- von Davier, M., & Rost, J. (1995). Polytomous mixed Rasch models. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 371–379). New York: Springer.
- Yamamoto, K., & Everson, H. (1997). Modeling the effects of test length and test time on parameter estimation using the HYBRID model. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 89–98). New York: Waxman.