Illustration of MIMIC-Model DIF Testing with the Schedule for Nonadaptive and Adaptive Personality

  • Carol M. Woods
  • Thomas F. Oltmanns
  • Eric Turkheimer


This research provides an example of testing for differential item functioning (DIF) using multiple indicator multiple cause (MIMIC) structural equation models. True/False items on five scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP) were tested for uniform DIF in a sample of Air Force recruits, with groups defined by gender and ethnicity. Uniform DIF exists when an item is more easily endorsed by one group than by the other, controlling for group mean differences on the variable under study. Results revealed significant DIF for many SNAP items, and some effects were quite large. Differentially functioning items can produce measurement bias and should be either deleted or modeled as if separate items were administered to different groups. Future research should aim to determine whether the DIF observed here holds for other samples.
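The definition of uniform DIF given above can be made concrete with a small numerical sketch. It is an illustration only, not the MIMIC analysis reported in this research: it assumes a simple logistic item response function, and the names `endorse_prob`, `log_odds_gap`, and `dif_shift`, along with all parameter values, are hypothetical. The point it shows is that, for respondents matched on the trait level `theta`, a group-specific threshold shift changes the chance of a True response, and the gap in log-odds is the same at every trait level, which is what makes the DIF uniform.

```python
import math


def endorse_prob(theta, a, b, group, dif_shift=0.0):
    """Probability of a True response under an assumed logistic item model.

    theta     : respondent's level on the trait being measured
    a, b      : hypothetical item discrimination and threshold
    group     : 0 for the reference group, 1 for the focal group
    dif_shift : extra threshold applied to the focal group only;
                a nonzero value represents uniform DIF
    """
    b_eff = b + dif_shift * group
    return 1.0 / (1.0 + math.exp(-a * (theta - b_eff)))


def log_odds(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def log_odds_gap(theta, a, b, dif_shift):
    """Reference-minus-focal log-odds of endorsement at a fixed trait level."""
    p_ref = endorse_prob(theta, a, b, group=0, dif_shift=dif_shift)
    p_foc = endorse_prob(theta, a, b, group=1, dif_shift=dif_shift)
    return log_odds(p_ref) - log_odds(p_foc)


if __name__ == "__main__":
    # Respondents are matched on theta, so any gap is due to the item,
    # not to a group mean difference on the trait.
    for theta in (-1.0, 0.0, 1.0):
        gap = log_odds_gap(theta, a=1.2, b=0.0, dif_shift=0.5)
        print(f"theta={theta:+.1f}  log-odds gap={gap:.3f}")
```

Under this assumed model the gap equals `a * dif_shift` regardless of `theta`. Setting `dif_shift` to zero removes the gap entirely, which corresponds to an item that functions the same way in both groups.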


Keywords: Differential item functioning; Measurement invariance; SNAP; Personality



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Carol M. Woods (1, 2)
  • Thomas F. Oltmanns (1)
  • Eric Turkheimer (3)

  1. Washington University in St. Louis, St. Louis, USA
  2. Psychology Department, Washington University in St. Louis, St. Louis, USA
  3. University of Virginia, Charlottesville, USA
