Detecting Person Misfit in Adaptive Testing

  • Rob R. Meijer
  • Edith M. L. A. van Krimpen-Stoop
Part of the Statistics for Social and Behavioral Sciences book series (SSBS)


An examinee’s test score alone does not reveal undesirable influences on test-taking behavior, such as faking on biodata questionnaires and personality tests, guessing, or knowledge of the correct answers due to test preview on achievement tests. These and other influences may result in inappropriate test scores, which may have serious consequences for practical test use, for example in job and educational selection, where classification errors may result. In the context of item response theory (IRT) modeling, several methods have been proposed to detect item score patterns that do not agree with the item score pattern expected under a particular test model. Such patterns should be detected because the scores of these persons may not adequately describe their trait level (θ). Research on methods that provide information about the fit of an individual item score pattern to a test model is usually referred to as appropriateness measurement or person fit measurement. Most studies in this area, however, have been conducted in the context of paper-and-pencil (p&p) tests. As will be argued below, person fit theory developed for p&p tests cannot simply be generalized to a computerized adaptive test (CAT). In this chapter, we introduce and review the existing literature on person fit in the context of a CAT.
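To make the idea of comparing an observed item score pattern with the pattern expected under a test model concrete, the following is a minimal sketch of one widely used person fit statistic, the standardized log-likelihood (often written l_z). It is an illustration, not the chapter's own procedure. It assumes dichotomous items, a Rasch model, and item difficulties and θ that are treated as known; the function names and the example values are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lz_statistic(responses, difficulties, theta):
    """Standardized log-likelihood person fit statistic for 0/1 item scores.

    Large negative values indicate a pattern that is unlikely under the model.
    Theta and the difficulties are treated as known, which is an assumption.
    """
    log_lik = 0.0   # observed log-likelihood of the item score pattern
    expected = 0.0  # its expectation under the model
    variance = 0.0  # its variance under the model
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        log_lik += x * math.log(p) + (1 - x) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    if variance == 0.0:
        # Happens when every item difficulty equals theta: all patterns
        # are then equally likely and the statistic is undefined.
        raise ValueError("variance is zero; statistic is undefined")
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item test, easiest item first, examinee at theta = 0.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
reversed_ = [0, 0, 1, 1, 1]   # easy items wrong, hard items right

lz_consistent = lz_statistic(consistent, difficulties, 0.0)
lz_reversed = lz_statistic(reversed_, difficulties, 0.0)
print(lz_consistent, lz_reversed)
```

The reversed pattern yields a clearly lower value than the consistent one. In a CAT, items are selected to have difficulties close to the current θ estimate, so the spread of difficulties that drives this contrast is much smaller, which is one reason p&p results cannot simply be carried over.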


Keywords: Item Response Theory; Item Bank; Test Taker; Item Response Theory Model; Computerized Adaptive Test





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Rob R. Meijer (1)
  • Edith M. L. A. van Krimpen-Stoop (1)
  1. Heymans Institute, University of Groningen, Groningen, The Netherlands
