Detecting Person Misfit in Adaptive Testing
An examinee’s test score does not reveal undesirable test-taking behaviors such as faking on biodata questionnaires and personality tests, guessing, or prior knowledge of the correct answers due to test preview on achievement tests. These and other influences may produce inappropriate test scores, which may have serious consequences for practical test use, for example in job and educational selection, where classification errors may result. In the context of item response theory (IRT) modeling, several methods have been proposed to detect item score patterns that are not in agreement with the item score pattern expected under a particular test model. Such patterns should be detected because the scores of these persons may not adequately describe their trait level (θ). Research on methods that quantify the fit of an individual item score pattern to a test model is usually referred to as appropriateness measurement or person-fit measurement. Most studies in this area, however, concern paper-and-pencil (p&p) tests. As will be argued below, person-fit theory developed for p&p tests cannot simply be generalized to a computerized adaptive test (CAT). In this chapter we introduce and review the existing literature on person fit in the context of a CAT.
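To make the idea of comparing an observed item score pattern against the pattern expected under an IRT model concrete, the following is a minimal sketch of one widely used person-fit index, the standardized log-likelihood statistic l_z (Drasgow, Levine & Williams, 1985), under the Rasch model. The item difficulties, θ value, and response patterns below are illustrative assumptions, not data from this chapter; large negative l_z values flag patterns that are unlikely given the model.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic l_z.

    responses: list of 0/1 item scores for one examinee
    probs: model-implied probabilities of a correct answer per item
    Large negative values indicate person misfit.
    """
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p)
                   for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2
                   for p in probs)
    return (l0 - expected) / math.sqrt(variance)

# Illustrative five-item test, examinee at theta = 0
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [rasch_prob(0.0, b) for b in difficulties]

pattern = [1, 1, 1, 0, 0]      # Guttman-like, model-consistent
odd_pattern = [0, 0, 1, 1, 1]  # easy items wrong, hard items right

print(f"l_z (expected pattern): {lz_statistic(pattern, probs):+.2f}")
print(f"l_z (aberrant pattern): {lz_statistic(odd_pattern, probs):+.2f}")
```

Note that in a CAT the item difficulties are adapted to the provisional θ estimate, so the response probabilities cluster near 0.5 and the null distribution of such statistics differs from the p&p case, which is precisely why specialized CAT person-fit methods are needed.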
Keywords: Item Response Theory; Item Bank; Test Taker; Item Response Theory Model; Computerized Adaptive Test