The extension of item response theory to data from more than one group of persons offers a unified approach to such problems as differential item functioning, item parameter drift, nonequivalent groups equating, vertical equating, two-stage testing, and matrix-sampled educational assessment. The common element in these problems is the existence of persons from different populations responding to the same test or to tests containing common items. In differential item functioning, the populations typically correspond to sex or demographic groups; in item parameter drift, to annual cohorts of students; in vertical equating, to children grouped by age or grade; in nonequivalent groups equating, to normative samples from different places or times; in two-stage testing, to examinees classified by levels of performance on a pretest; and in matrix-sampled educational assessment, to students from different schools or programs who are administered matrix-sampled assessment instruments. In all these settings, the objective of the multiple-group analysis is to estimate jointly the item parameters and the latent distribution of a common attribute or ability of the persons in each of the populations.
Keywords: Differential Item Functioning; Latent Distribution; Item Response Theory; Item Parameter; Item Response Model