
An Overview of Item Response Theory

  • David Magis
  • Duanli Yan
  • Alina A. von Davier
Chapter
Part of the Use R! book series (USE R)

Abstract

An important component of both computerized adaptive testing (CAT) and multistage testing (MST) is the use of item response theory (IRT) as the underlying framework for item bank calibration, ability estimation, and item or module selection. In this chapter, we present a brief overview of IRT, providing key information and introducing the notation used in subsequent chapters. Only topics directly related to adaptive and multistage testing are covered; appropriate references for further reading are therefore provided.

References

  1. Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–24. https://doi.org/10.1177/0146621697211001
  2. Andersen, E. B. (1970). Asymptotic properties of conditional maximum likelihood equations. Journal of the Royal Statistical Society, Series B, 32, 283–301.
  3. Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573. https://doi.org/10.1007/BF02293814
  4. Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
  5. Barton, M. A., & Lord, F. M. (1981). An upper asymptote for the three-parameter logistic item-response model (Research Bulletin No. 81-20). Princeton, NJ: Educational Testing Service.
  6. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
  7. Birnbaum, A. (1969). Statistical theory for logistic mental test models with a prior distribution of ability. Journal of Mathematical Psychology, 6, 258–276. https://doi.org/10.1016/0022-2496(69)90005-4
  8. Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51. https://doi.org/10.1007/BF02291411
  9. Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459. https://doi.org/10.1007/BF02293801
  10. Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179–197. https://doi.org/10.1007/BF02291262
  11. Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431–444. https://doi.org/10.1177/014662168200600405
  12. Braeken, J., Tuerlinckx, F., & De Boeck, P. (2007). Copulas for residual dependencies. Psychometrika, 72, 393–411. https://doi.org/10.1007/s11336-007-9005-4
  13. Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22, 265–289. https://doi.org/10.3102/10769986022003265
  14. De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
  15. DeMars, C. (2010). Item response theory. Oxford: Oxford University Press.
  16. Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67–86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x
  17. Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
  18. Finch, H., & Habing, B. (2007). Performance of DIMTEST- and NOHARM-based statistics for testing unidimensionality. Applied Psychological Measurement, 31, 292–307. https://doi.org/10.1177/0146621606294490
  19. Fischer, G. H. (1981). On the existence and uniqueness of maximum-likelihood estimates in the Rasch model. Psychometrika, 46, 59–77. https://doi.org/10.1007/BF02293919
  20. Fraser, C., & McDonald, R. P. (2003). NOHARM 3.0 [Computer software manual]. http://people.niagaracollege.ca/cfraser/download/
  21. Gessaroli, M. E., & De Champlain, A. F. (1996). Using an approximate chi-square statistic to test the number of dimensions underlying the responses to a set of items. Journal of Educational Measurement, 33, 157–179. https://doi.org/10.1111/j.1745-3984.1996.tb00487.x
  22. Green, B. F. J. (1950). A general solution for the latent class model of latent structure analysis (ETS Research Bulletin Series No. RB-50-38). Princeton, NJ: Educational Testing Service.
  23. Haberman, S. J., & von Davier, A. A. (2014). Considerations on parameter estimation, scoring, and linking in multistage testing. In D. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications (pp. 229–248). New York: CRC Press.
  24. Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
  25. Haley, D. (1952). Estimation of the dosage mortality relationship when the dose is subject to error (Technical report No. 15). Palo Alto, CA: Applied Mathematics and Statistics Laboratory, Stanford University.
  26. Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer.
  27. Hattie, J. (1984). An empirical study of various indices for determining unidimensionality. Multivariate Behavioral Research, 19, 49–78. https://doi.org/10.1207/s15327906mbr1901_3
  28. Holland, P. W. (1990). On the sampling theory foundations of item response theory models. Psychometrika, 55, 577–602. https://doi.org/10.1007/BF02294609
  29. Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel–Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.
  30. Jeffreys, H. (1939). Theory of probability. Oxford, UK: Oxford University Press.
  31. Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 186, 453–461.
  32. Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty six person-fit statistics. Applied Measurement in Education, 16, 277–298. https://doi.org/10.1207/S15324818AME1604_2
  33. Kelderman, H., & Rijkes, C. P. M. (1994). Loglinear multidimensional IRT models for polytomously scored items. Psychometrika, 59, 149–176. https://doi.org/10.1007/BF02295181
  34. Klein Entink, R. H., Fox, J.-P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74, 21–48. https://doi.org/10.1007/s11336-008-9075-y
  35. Lord, F. M. (1951). A theory of test scores and their relation to the trait measured (ETS Research Bulletin Series No. RB-51-13). Princeton, NJ: Educational Testing Service.
  36. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
  37. Lord, F. M. (1986). Maximum likelihood and Bayesian parameter estimation in item response theory. Journal of Educational Measurement, 23, 157–162. https://doi.org/10.1111/j.1745-3984.1986.tb00241.x
  38. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
  39. Magis, D. (2014). On the asymptotic standard error of a class of robust estimators of ability in dichotomous item response models. British Journal of Mathematical and Statistical Psychology, 67, 430–450. https://doi.org/10.1111/bmsp.12027
  40. Magis, D. (2015b). A note on the equivalence between observed and expected information functions with polytomous IRT models. Journal of Educational and Behavioral Statistics, 40, 96–105. https://doi.org/10.3102/1076998614558122
  41. Magis, D. (2015c). A note on weighted likelihood and Jeffreys modal estimation of proficiency levels in polytomous item response models. Psychometrika, 80, 200–204. https://doi.org/10.1007/S11336-013-9378-5
  42. Magis, D. (2016). Efficient standard error formulas of ability estimators with dichotomous item response models. Psychometrika, 81, 184–200. https://doi.org/10.1007/s11336-015-9443-3
  43. Magis, D., Béland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847–862. https://doi.org/10.3758/BRM.42.3.847
  44. Maris, E. (1995). Psychometric latent response models. Psychometrika, 60, 523–547. https://doi.org/10.1007/BF02294327
  45. Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174. https://doi.org/10.1007/BF02296272
  46. McKinley, R. L., & Reckase, M. D. (1982). The use of the general Rasch model with multidimensional response data (Research Report No. ONR 82-1). Iowa City, IA: American College Testing.
  47. Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297–334. https://doi.org/10.1177/014662169301700401
  48. Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49, 359–381. https://doi.org/10.1007/BF02306026
  49. Mislevy, R. J. (1986). Bayesian modal estimation in item response models. Psychometrika, 51, 177–195. https://doi.org/10.1007/BF02293979
  50. Mislevy, R. J., & Bock, R. D. (1982). Biweight estimates of latent ability. Educational and Psychological Measurement, 42, 725–737. https://doi.org/10.1177/001316448204200302
  51. Mosteller, F., & Tukey, J. (1977). Exploratory data analysis and regression. Reading, MA: Addison-Wesley.
  52. Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14, 59–71. https://doi.org/10.1177/014662169001400106
  53. Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176. https://doi.org/10.1177/014662169201600206
  54. Muraki, E., & Bock, R. D. (2003). PARSCALE 4.0 [Computer software manual]. Lincolnwood, IL: Scientific Software International.
  55. Muraki, E., & Carlson, J. E. (1993). Full-information factor analysis for polytomous item responses. Paper presented at the annual meeting of the American Educational Research Association, Atlanta.
  56. Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd ed.). Thousand Oaks, CA: Sage.
  57. Ostini, R., & Nering, M. L. (2006). Polytomous item response theory models. Thousand Oaks, CA: Sage.
  58. Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, Vol. 26. Psychometrics (pp. 125–167). Amsterdam: Elsevier.
  59. Rao, C. R., & Sinharay, S. (2007). Handbook of statistics, Vol. 26. Psychometrics. Amsterdam: Elsevier.
  60. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
  61. Reckase, M. D. (1979). Unifactor latent trait models applied to multifactor tests: Results and implications. Journal of Educational Statistics, 4, 207–230. https://doi.org/10.2307/1164671
  62. Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
  63. Roskam, E. E. (1987). Toward a psychometric theory of intelligence. In E. E. Roskam & R. Suck (Eds.), Progress in mathematical psychology (pp. 151–171). Amsterdam: North-Holland.
  64. Roskam, E. E. (1997). Models for speed and time-limit tests. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 187–208). New York: Springer.
  65. Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, Vol. 34 (Monograph No. 17). Richmond: Byrd Press.
  66. Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimensional space. Psychometrika, 39, 111–121. https://doi.org/10.1007/BF02291580
  67. Samejima, F. (1994). Some critical observations of the test information function as a measure of local accuracy in ability estimation. Psychometrika, 59, 307–329. https://doi.org/10.1007/BF02296127
  68. Samejima, F. (1998). Expansion of Warm’s weighted likelihood estimator of ability for the three-parameter logistic model to general discrete responses. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
  69. Schuster, C., & Yuan, K.-H. (2011). Robust estimation of latent ability in item response models. Journal of Educational and Behavioral Statistics, 36, 720–735. https://doi.org/10.3102/1076998610396890
  70. Snijders, T. A. B. (2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66, 331–342. https://doi.org/10.1007/BF02294437
  71. Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210. https://doi.org/10.1177/014662168300700208
  72. Stout, W. (1987). A nonparametric approach for assessing latent trait unidimensionality. Psychometrika, 52, 589–617. https://doi.org/10.1007/BF02294821
  73. Stout, W. (2005). DIMTEST (Version 2.0) [Computer software manual]. Champaign, IL: The William Stout Institute for Measurement.
  74. Swaminathan, H., Hambleton, R. K., & Rogers, H. J. (2007). Assessing the fit of item response theory models. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, Vol. 26. Psychometrics (pp. 683–718). Amsterdam: Elsevier.
  75. Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
  76. Sympson, J. B. (1978). A model with testing for multidimensional items. In D. J. Weiss (Ed.), Proceedings of the 1977 computerized adaptive testing conference. Minneapolis, MN: University of Minnesota.
  77. Tate, R. (2003). A comparison of selected empirical methods for assessing the structure of responses to test items. Applied Psychological Measurement, 27, 159–203. https://doi.org/10.1177/0146621603027003001
  78. Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51, 567–577. https://doi.org/10.1007/BF02295596
  79. Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group difference in trace lines. In H. Wainer & H. Braun (Eds.), Test validity (pp. 147–170). Hillsdale, NJ: Erlbaum.
  80. van der Linden, W. J., & Glas, C. A. W. (2010). Elements of adaptive testing. New York: Springer.
  81. van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of modern item response theory. New York: Springer.
  82. van der Linden, W. J., Klein Entink, R., & Fox, J.-P. (2010). IRT parameter estimation with response times as collateral information. Applied Psychological Measurement, 34, 327–347. https://doi.org/10.1177/0146621609349800
  83. Verhelst, N. D., Verstralen, H. H. F. M., & Jansen, M. G. (1997). A logistic model for time limit tests. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 169–185). New York: Springer.
  84. Wainer, H. (2000). Computerized adaptive testing: A primer (2nd ed.). New York: Routledge/Taylor and Francis.
  85. Wainer, H., Bradlow, E. T., & Du, Z. (2000). Testlet response theory: An analog for the 3-PL useful in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 246–270). Boston, MA: Kluwer-Nijhoff.
  86. Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. Cambridge: Cambridge University Press.
  87. Wang, T., & Hanson, B. A. (2005). Development and calibration of an item response model that incorporates response time. Applied Psychological Measurement, 29, 323–339. https://doi.org/10.1177/0146621605275984
  88. Warm, T. (1989). Weighted likelihood estimation of ability in item response models. Psychometrika, 54, 427–450. https://doi.org/10.1007/BF02294627
  89. Weiss, D. J. (1983). New horizons in testing: Latent trait theory and computerized adaptive testing. New York: Academic Press.
  90. Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago, IL: MESA Press.
  91. Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago, IL: MESA Press.
  92. Yao, L., & Schwarz, R. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30, 469–492. https://doi.org/10.1177/0146621605284537
  93. Yen, W. M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245–262. https://doi.org/10.1177/014662168100500212
  94. Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145. https://doi.org/10.1177/014662168400800201
  95. Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187–213. https://doi.org/10.1111/j.1745-3984.1993.tb00423.x
  96. Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed.) (pp. 111–153). Westport, CT: Praeger.
  97. Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432–442. https://doi.org/10.1037/0033-2909.99.3.432

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • David Magis (1)
  • Duanli Yan (2)
  • Alina A. von Davier (3)
  1. Department of Education, University of Liege, Liege, Belgium
  2. Educational Testing Service, Princeton, USA
  3. ACTNext by ACT, Iowa City, USA
