Examining Sources of Gender DIF in Mathematics Knowledge of Future Teachers Using Cross-Classified IRT Models

  • Liuhan Sophie Cai
  • Anthony D. Albano


Research on differential item functioning (DIF) has traditionally focused on detecting DIF effects. However, recent studies have investigated potential sources of DIF in an attempt to determine how or why it may occur. This study examines the variability in item difficulty on a mathematics assessment that is accounted for by gender, referred to as gender DIF, and the extent to which gender DIF is explained by person predictors (opportunity to learn [OTL]) and item characteristics (item format). Cross-classified multilevel IRT models are used to examine the relationships among item difficulty, gender, OTL, and item format. Data come from the U.S. cohort of the Teacher Education and Development Study in Mathematics (TEDS-M), an international study of future mathematics teachers.
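A cross-classified model treats each response as belonging to both a person and an item. The sketch below illustrates one common explanatory IRT parameterization of gender DIF under that design. It is an assumption for illustration and is not the authors' estimated model. Under this sketch, a Rasch-type response probability has a gender term whose size depends on item format and on the person's OTL. The names `p_correct`, `simulate`, `prop_correct`, `gamma`, `delta_fmt`, and `delta_otl`, along with the 0/1 format coding, are all hypothetical. In practice such models are usually fit as logistic mixed models with crossed person and item random effects. One example of a tool for this is `glmer` in the R package lme4.

```python
import math
import random


def p_correct(theta, b, female, fmt, otl, gamma=0.0, delta_fmt=0.0, delta_otl=0.0):
    """Probability of a correct response under a hypothetical Rasch-type model
    with a gender DIF term (males as the reference group):

        logit P = theta - b - female * (gamma + delta_fmt * fmt + delta_otl * otl)

    A positive DIF term makes the item effectively harder for females.
    """
    dif = female * (gamma + delta_fmt * fmt + delta_otl * otl)
    eta = theta - b - dif
    return 1.0 / (1.0 + math.exp(-eta))


def simulate(n_persons=200, n_items=20, seed=1, gamma=0.3, delta_fmt=-0.4, delta_otl=0.0):
    """Generate long-format person-by-item responses. Every person answers every
    item, so persons and items are crossed rather than nested."""
    rng = random.Random(seed)
    # Person: (ability theta, female flag, standardized OTL score); all simulated.
    persons = [(rng.gauss(0, 1), rng.random() < 0.5, rng.gauss(0, 1)) for _ in range(n_persons)]
    # Item: (difficulty b, format); hypothetical coding 0 = multiple choice, 1 = constructed response.
    items = [(rng.gauss(0, 1), i % 2) for i in range(n_items)]
    rows = []
    for p, (theta, female, otl) in enumerate(persons):
        for i, (b, fmt) in enumerate(items):
            prob = p_correct(theta, b, int(female), fmt, otl, gamma, delta_fmt, delta_otl)
            y = 1 if rng.random() < prob else 0
            rows.append((p, i, int(female), fmt, otl, y))
    return rows


def prop_correct(rows, female, fmt):
    """Observed proportion correct for one gender-by-format cell."""
    ys = [y for (_, _, g, f, _, y) in rows if g == female and f == fmt]
    return sum(ys) / len(ys)


if __name__ == "__main__":
    data = simulate()
    for fmt in (0, 1):
        print(fmt, prop_correct(data, 0, fmt), prop_correct(data, 1, fmt))
```

Each row holds one person-by-item response, which is the long format a crossed random-effects model expects. Here, `delta_fmt` is the quantity of interest. It is a nonzero coefficient that means part of the gender DIF is explained by item format. A nonzero `delta_otl` plays the same role for the person-level OTL predictor.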


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Liuhan Sophie Cai, University of Nebraska-Lincoln, Lincoln, USA
  • Anthony D. Albano, University of Nebraska-Lincoln, Lincoln, USA
