The Investigation of Differential Item Functioning in Adaptive Tests

  • Rebecca Zwick
Chapter
Part of the Statistics for Social and Behavioral Sciences book series (SSBS)

Abstract

Differential item functioning (DIF) refers to a difference in item performance between equally proficient members of two demographic groups. From an item response theory (IRT) perspective, DIF can be defined as a difference between groups in item response functions. The classic example of a DIF item is a mathematics question containing sports jargon that is more likely to be understood by men than by women. An item of this kind would be expected to manifest DIF against women: they are less likely to give a correct response than men with equivalent math ability. In reality, the causes of DIF are often far more obscure. Camilli and Shepard (1994) and Holland and Wainer (1993) provide an excellent background in the history, theory, and practice of DIF analysis.
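The IRT definition above can be made concrete with a minimal sketch. Assuming a two-parameter logistic (2PL) model and hypothetical parameter values (the item difficulty shift of 0.5 for the focal group is illustrative, not drawn from the chapter), uniform DIF appears as a gap between the two groups' item response functions at the same ability level:

```python
import math

def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function: probability of a
    correct response given ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters: the item is effectively harder (larger b)
# for the focal group than for the reference group -- uniform DIF.
a = 1.2
b_reference = 0.0
b_focal = 0.5

# Two equally proficient examinees, one from each group.
theta = 0.0
p_ref = irf_2pl(theta, a, b_reference)
p_foc = irf_2pl(theta, a, b_focal)

# DIF against the focal group: a lower success probability
# despite identical proficiency.
print(round(p_ref - p_foc, 3))  # 0.146
```

An item free of DIF would have identical response functions for both groups, so this difference would be zero at every value of theta.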

References

  1. Agresti, A. (1990). Categorical data analysis. New York: Wiley.
  2. Camilli, G. & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.
  3. Donoghue, J. R., Holland, P. W. & Thayer, D. T. (1993). A Monte Carlo study of factors that affect the Mantel–Haenszel and standardization measures of differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 137–166). Hillsdale, NJ: Erlbaum.
  4. Dorans, N. J. & Holland, P. W. (1993). DIF detection and description: Mantel–Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35–66). Hillsdale, NJ: Erlbaum.
  5. Dorans, N. J. & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355–368.
  6. Fischer, G. H. (1995). Some neglected problems in IRT. Psychometrika, 60, 459–487.
  7. Gelman, A., Carlin, J. B., Stern, H. S. & Rubin, D. B. (1995). Bayesian data analysis. London: Chapman & Hall.
  8. Holland, P. W. & Thayer, D. T. (1985). An alternative definition of the ETS delta scale of item difficulty (ETS Research Report No. 85-43). Princeton, NJ: Educational Testing Service.
  9. Holland, P. W. & Thayer, D. T. (1988). Differential item performance and the Mantel–Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.
  10. Holland, P. W. & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.
  11. Holland, P. W. & Zwick, R. (1991). A simulation study of some simple approaches to the study of DIF for CATs (Internal memorandum). Princeton, NJ: Educational Testing Service.
  12. Jiang, H. & Stout, W. F. (1998). Improved Type I error control and reduced estimation bias for DIF detection using SIBTEST. Journal of Educational and Behavioral Statistics, 23, 291–322.
  13. Kelley, T. L. (1923). Statistical methods. New York: Macmillan.
  14. Krass, I. & Segall, D. (1998). Differential item functioning and online item calibration (Draft report). Monterey, CA: Defense Manpower Data Center.
  15. Legg, S. M. & Buhr, D. C. (1992). Computerized adaptive testing with different groups. Educational Measurement: Issues and Practice, 11, 23–27.
  16. Lei, P.-W., Chen, S.-Y. & Yu, L. (2006). Comparing methods of assessing differential item functioning in a computerized adaptive testing environment. Journal of Educational Measurement, 43, 245–264.
  17. Li, H.-H. & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61, 647–677.
  18. Mantel, N. & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.
  19. Miller, T. R. (1992, April). Practical considerations for conducting studies of differential item functioning in a CAT environment. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
  20. Miller, T. R. & Fan, M. (1998, April). Assessing DIF in high dimensional CATs. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego.
  21. Nandakumar, R., Banks, J. C. & Roussos, L. (2006). Kernel-smoothed DIF detection procedure for computerized adaptive tests (Computerized Testing Report 00-08). Newtown, PA: Law School Admission Council.
  22. Nandakumar, R. & Roussos, L. (2001). CATSIB: A modified SIBTEST procedure to detect differential item functioning in computerized adaptive tests (Research report). Newtown, PA: Law School Admission Council.
  23. Nandakumar, R. & Roussos, L. A. (2004). Evaluation of the CATSIB DIF procedure in a pretest setting. Journal of Educational and Behavioral Statistics, 29, 177–200.
  24. Pashley, P. J. (1997). Computerized LSAT research agenda: Spring 1997 update (LSAC report). Newtown, PA: Law School Admission Council.
  25. Phillips, A. & Holland, P. W. (1987). Estimation of the variance of the Mantel–Haenszel log-odds-ratio estimate. Biometrics, 43, 425–431.
  26. Pommerich, M., Spray, J. A. & Parshall, C. G. (1995). An analytical evaluation of two common odds ratios as population indicators of DIF (ACT Report 95-1). Iowa City: American College Testing Program.
  27. Powers, D. E. & O'Neill, K. (1993). Inexperienced and anxious computer users: Coping with a computer-administered test of academic skills. Educational Assessment, 1, 153–173.
  28. Robins, J., Breslow, N. & Greenland, S. (1986). Estimators of the Mantel–Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics, 42, 311–323.
  29. Roussos, L. (1996, June). A Type I error rate study of a modified SIBTEST DIF procedure with potential application to computerized-adaptive tests. Paper presented at the annual meeting of the Psychometric Society, Banff, Alberta, Canada.
  30. Roussos, L., Nandakumar, R. & Banks, J. C. (2006). Theoretical formula for statistical bias in CATSIB estimates due to discretization of the ability scale (Computerized Testing Report 99-07). Newtown, PA: Law School Admission Council.
  31. Roussos, L. A., Schnipke, D. L. & Pashley, P. J. (1999). A generalized formula for the Mantel–Haenszel differential item functioning parameter. Journal of Educational and Behavioral Statistics, 24, 293–322.
  32. Roussos, L. & Stout, W. F. (1996). Simulation studies of effects of small sample size and studied item parameters on SIBTEST and Mantel–Haenszel Type I error performance. Journal of Educational Measurement, 33, 215–230.
  33. Schaeffer, G., Reese, C., Steffen, M., McKinley, R. L. & Mills, C. N. (1993). Field test of a computer-based GRE general test (ETS Research Report No. RR 93-07). Princeton, NJ: Educational Testing Service.
  34. Shealy, R. & Stout, W. F. (1993a). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.
  35. Shealy, R. & Stout, W. F. (1993b). An item response theory model for test bias and differential test functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 197–239). Hillsdale, NJ: Erlbaum.
  36. Steinberg, L., Thissen, D. & Wainer, H. (1990). Validity. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 187–231). Hillsdale, NJ: Erlbaum.
  37. Stocking, M. L., Jirele, T., Lewis, C. & Swanson, L. (1998). Moderating possibly irrelevant multiple mean score differences on a test of mathematical reasoning. Journal of Educational Measurement, 35, 199–222.
  38. Swaminathan, H. & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.
  39. Thissen, D., Steinberg, L. & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Erlbaum.
  40. Way, W. D. (1994). A simulation study of the Mantel–Haenszel procedure for detecting DIF for the NCLEX using CAT (Internal technical report). Princeton, NJ: Educational Testing Service.
  41. Wenglinsky, H. (1998). Does it compute? The relationship between educational technology and student achievement in mathematics (ETS Policy Information Center report). Princeton, NJ: Educational Testing Service.
  42. Wingersky, M. S., Patrick, R. & Lord, F. M. (1988). LOGIST user's guide: LOGIST Version 6.00. Princeton, NJ: Educational Testing Service.
  43. Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–347). Hillsdale, NJ: Erlbaum.
  44. Zwick, R. (1990). When do item response function and Mantel–Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185–197.
  45. Zwick, R. (1992). Application of Mantel's chi-square test to the analysis of differential item functioning for ordinal items (Technical memorandum). Princeton, NJ: Educational Testing Service.
  46. Zwick, R. (1997). The effect of adaptive administration on the variability of the Mantel–Haenszel measure of differential item functioning. Educational and Psychological Measurement, 57, 412–421.
  47. Zwick, R. & Thayer, D. T. (1996). Evaluating the magnitude of differential item functioning in polytomous items. Journal of Educational and Behavioral Statistics, 21, 187–201.
  48. Zwick, R. & Thayer, D. T. (2002). Application of an empirical Bayes enhancement of Mantel–Haenszel DIF analysis to computer-adaptive tests. Applied Psychological Measurement, 26, 57–76.
  49. Zwick, R. & Thayer, D. T. (2003, August). An empirical Bayes enhancement of Mantel–Haenszel DIF analysis for computer-adaptive tests (Computerized Testing Report No. 98-15). Newtown, PA: Law School Admission Council.
  50. Zwick, R., Thayer, D. T. & Lewis, C. (1997). An investigation of the validity of an empirical Bayes approach to Mantel–Haenszel DIF analysis (ETS Research Report No. 97-21). Princeton, NJ: Educational Testing Service.
  51. Zwick, R., Thayer, D. T. & Lewis, C. (1999). An empirical Bayes approach to Mantel–Haenszel DIF analysis. Journal of Educational Measurement, 36, 1–28.
  52. Zwick, R., Thayer, D. T. & Lewis, C. (2000). Using loss functions for DIF detection: An empirical Bayes approach. Journal of Educational and Behavioral Statistics, 25, 225–247.
  53. Zwick, R., Thayer, D. T. & Mazzeo, J. (1997). Descriptive and inferential procedures for assessing DIF in polytomous items. Applied Measurement in Education, 10, 321–344.
  54. Zwick, R., Thayer, D. T. & Wingersky, M. (1993). A simulation study of methods for assessing differential item functioning in computer-adaptive tests (ETS Research Report 93-11). Princeton, NJ: Educational Testing Service.
  55. Zwick, R., Thayer, D. T. & Wingersky, M. (1994a). A simulation study of methods for assessing differential item functioning in computerized adaptive tests. Applied Psychological Measurement, 18, 121–140.
  56. Zwick, R., Thayer, D. T. & Wingersky, M. (1994b). DIF analysis for pretest items in computer-adaptive testing (ETS Research Report 94-33). Princeton, NJ: Educational Testing Service.
  57. Zwick, R., Thayer, D. T. & Wingersky, M. (1995). Effect of Rasch calibration on ability and DIF estimation in computer-adaptive tests. Journal of Educational Measurement, 32, 341–363.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Rebecca Zwick
  1. Department of Education, University of California, Santa Barbara, USA