Quality & Quantity, Volume 45, Issue 3, pp 715–734

Item-fit evaluation in biased tests: a study under Rasch model

M. Dolores Hidalgo · José A. López-Pina

Research Note


This paper explores the power rates and distributional properties of the Outfit, Infit, Lz, ECI2z and ECI4z statistics when they are applied to tests containing biased items, i.e., items with differential item functioning (DIF). The study manipulated sample size, the focal-to-reference group sample size ratio, impact (ability difference) between groups, DIF effect size, and the percentage of DIF items; examinee responses were generated to simulate uniform DIF. Results suggest that the item-fit statistics detected only moderate percentages of DIF items, and then only in large samples (1000/500 or 1000/1000) when the DIF effect size was relatively high and the focal and reference group means differed. When the groups had equal means, all five item-fit indices showed low correct-identification rates. In general, false positive rates were adequately controlled. These findings lead to the conclusion that the indices studied are only partially adequate for detecting biased items, mainly when impact between groups is present and sample size is large.
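The simulation design described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the group sizes, the 0.6-logit uniform DIF shift, the number of DIF items, and the use of the true (generating) abilities and item difficulties in place of Rasch estimates are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rasch(theta, b):
    """Dichotomous responses under the Rasch model: P(X=1) = 1/(1+exp(-(theta-b)))."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

n_items = 20
b = rng.normal(0.0, 1.0, n_items)          # item difficulties

# Reference group theta ~ N(0,1); focal group with impact (mean shift of -0.5)
theta_ref = rng.normal(0.0, 1.0, 1000)
theta_foc = rng.normal(-0.5, 1.0, 500)

# Uniform DIF: the first two items are 0.6 logits harder for the focal group
b_foc = b.copy()
b_foc[:2] += 0.6

x = np.vstack([simulate_rasch(theta_ref, b),
               simulate_rasch(theta_foc, b_foc)])
theta = np.concatenate([theta_ref, theta_foc])

# Item fit computed against the no-DIF item parameters (illustrative shortcut:
# true parameters stand in for the calibrated estimates used in practice)
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
w = p * (1.0 - p)                           # binomial variance of each response
z2 = (x - p) ** 2 / w                       # squared standardized residuals

outfit = z2.mean(axis=0)                    # unweighted mean square per item
infit = (w * z2).sum(axis=0) / w.sum(axis=0)  # information-weighted mean square
```

Both statistics have expectation near 1 for fitting items; values noticeably above 1 flag misfit, which is how DIF items may surface when a single set of item parameters is imposed on both groups.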


Keywords: Item-fit statistics · Rasch model · Differential item functioning · Type I error rate · Power rate





Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

Department of Basic Psychology and Methodology, University of Murcia, Murcia, Spain
