Optimal Sampling Design for IRT Linking with Bimodal Data

  • Jiahe Qian
  • Alina A. von Davier
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 89)


Optimal sampling designs that improve the efficiency of IRT linking are often sought in the analysis of assessment data. In practice, the skill distribution of an assessment sample may be bimodal, which warrants special consideration when constructing such designs. In this study we explore optimal sampling designs for IRT linking of bimodal data. We model the design paradigm and present a formal setup for optimal IRT linking. In an optimal sampling design, a bimodal sample is treated as having been drawn from a stratified population. The proposed optimum search algorithm adjusts the stratum weights to form a weighted compound sample that minimizes linking errors. The current study focuses initially on the robust mean–mean transformation method, although the IRT linking model under consideration can be adapted to generic methods.
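
As an illustration only, the idea of searching over stratum weights can be sketched for a two-stratum case. The sketch below is not the authors' algorithm: it assumes a golden-section search as the single-variable optimum search, and it replaces the linking error with a hypothetical toy objective, the variance of a weighted combination of two independent stratum estimates. The names `golden_section_min` and `toy_linking_error` and the values `var1` and `var2` are invented for this example.

```python
import math


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)  # left interior point
    d = a + inv_phi * (b - a)  # right interior point
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new right point.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new left point.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def toy_linking_error(w, var1=0.04, var2=0.01):
    """Hypothetical stand-in for a linking error (an assumption, not from the paper).

    w is the weight on stratum 1 and (1 - w) the weight on stratum 2;
    var1 and var2 are made-up error variances of the two stratum estimates.
    """
    return w ** 2 * var1 + (1 - w) ** 2 * var2


# Search the stratum-1 weight over [0, 1].
w_opt = golden_section_min(toy_linking_error, 0.0, 1.0)
print(f"stratum weights: ({w_opt:.3f}, {1 - w_opt:.3f})")
```

For this toy objective the minimizer has the closed form var2 / (var1 + var2), so the search result can be checked directly. With a real linking-error criterion, such as a jackknife estimate of the error of the transformation coefficients, the objective would have to be evaluated numerically at each candidate weight.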


Optimal sampling design; Stratified population; Complete grouped jackknifing; Optimum search



The authors thank Jim Carlson, Shelby Haberman, Yi-Hsuan Lee, Ying Lu, and Daniel Bolt for their suggestions and comments. The authors also thank Shuhong Li and Jill Carey for their assistance in assembling data and Kim Fryer for editorial help. Any opinions expressed in this paper are solely those of the authors and not necessarily those of ETS.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Educational Testing Service, Research and Development, Princeton, USA
