Optimal Online Calibration Designs for Item Replenishment in Adaptive Testing

  • Yinhong He
  • Ping Chen


The maintenance of the item bank is essential for continuously implementing adaptive tests. Calibrating new items online provides an opportunity to replenish the operational item bank efficiently. In this study, a new optimal design for online calibration (referred to as D-c) is proposed by incorporating the idea of the original D-optimal design into the reformed D-optimal design proposed by van der Linden and Ren (Psychometrika 80:263–288, 2015) (denoted as the D-VR design). To deal with the dependence of the design criteria on the unknown item parameters of new items, Bayesian versions of the locally optimal designs (e.g., D-c and D-VR) are put forward by adding prior information on the new items. In the simulation implementation of the locally optimal designs, five calibration sample sizes were used to obtain different levels of estimation precision for the initial item parameters, and two approaches were used to obtain the prior distributions in the Bayesian optimal designs. Results showed that the D-c design performed well and retired fewer new items than the D-VR design at almost all levels of examinee sample size; the Bayesian version of D-c using a prior obtained from the operational items worked better than that using the default priors in BILOG-MG and PARSCALE; and the Bayesian optimal designs generally outperformed the locally optimal designs when the initial item parameters of the new items were poorly estimated.
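To illustrate the general idea behind the D-optimal criteria discussed in the abstract, the sketch below shows how an online calibration routine might assign a new item to the examinee whose ability most increases the determinant of the accumulated Fisher information matrix of the item's parameters. This is a minimal sketch under the 2PL model only; the function names, the ridge constant, and the candidate ability values are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information matrix of (a, b) for one response at ability theta (2PL).

    The per-response matrix is p(1-p) * g g^T, where g is the gradient of the
    logit a*(theta - b) with respect to (a, b)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    g = np.array([theta - b, -a])
    return p * (1.0 - p) * np.outer(g, g)

def pick_examinee(thetas, a_hat, b_hat, acc_info):
    """Index of the candidate ability that maximizes det(accumulated information)."""
    dets = [np.linalg.det(acc_info + info_2pl(t, a_hat, b_hat)) for t in thetas]
    return int(np.argmax(dets))

# Usage: information accumulates as examinees are assigned to the new item.
acc = 1e-3 * np.eye(2)                 # small ridge so the determinant is defined early on
candidates = np.array([-2.0, -0.5, 0.0, 0.8, 2.0])
idx = pick_examinee(candidates, a_hat=1.2, b_hat=0.3, acc_info=acc)
acc = acc + info_2pl(candidates[idx], 1.2, 0.3)
```

Because a single response contributes a rank-one matrix, the determinant only becomes well conditioned after several assignments; the small ridge stands in for whatever prior information a Bayesian variant would supply.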


Keywords: computerized adaptive testing · online calibration · locally optimal design · Bayesian optimal design · item replenishment · item bank maintenance



This study was partially supported by the National Natural Science Foundation of China (Grant No. 31300862), KLAS (Grant No. 130028732), the Research Program Funds of the Collaborative Innovation Center of Assessment toward Basic Education Quality (Grant Nos. 2019-01-082-BZK01 and 2019-01-082-BZK02), and the Startup Foundation for Introducing Talent of NUIST (Grant No. 2018r041). The authors are indebted to the editor, associate editor and two anonymous reviewers for their suggestions and comments on the earlier manuscript.


  1. Ali, U. S., & Chang, H.-H. (2014). An item-driven adaptive design for calibrating pretest items (Research Report No. RR-14-38). Princeton, NJ: ETS.
  2. Ban, J. C., Hanson, B. A., Wang, T. Y., Yi, Q., & Harris, D. J. (2001). A comparative study of on-line pretest item calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38, 191–212.
  3. Berger, M. P. F. (1992). Sequential sampling designs for the two-parameter item response theory model. Psychometrika, 57, 521–538.
  4. Berger, M. P. F. (1994). D-optimal sequential sampling designs for item response theory models. Journal of Educational Statistics, 19, 43–56.
  5. Berger, M. P. F., King, C. Y. J., & Wong, W. K. (2000). Minimax D-optimal designs for item response theory models. Psychometrika, 65, 377–390.
  6. Birnbaum, A. (1968). Some latent ability models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Boston: Addison-Wesley.
  7. Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431–444.
  8. Buyske, S. (1998). Optimal design for item calibration in computerized adaptive testing: The 2PL case. In N. Flournoy, et al. (Eds.), New developments and applications in experimental design. Lecture notes-monograph series (Vol. 34). Hayward, CA: Institute of Mathematical Statistics.
  9. Buyske, S. (2005). Optimal design in educational testing. In M. P. F. Berger & W. K. Wong (Eds.), Applied optimal designs. West Sussex: Wiley.
  10. Chang, Y. C. I., & Lu, H. Y. (2010). Online calibration via variable length computerized adaptive testing. Psychometrika, 75, 140–157.
  11. Chen, P. (2017). A comparative study of online item calibration methods in multidimensional computerized adaptive testing. Journal of Educational and Behavioral Statistics, 42, 559–590.
  12. Chen, P., & Wang, C. (2016). A new online calibration method for multidimensional computerized adaptive testing. Psychometrika, 81, 674–701.
  13. Chen, P., Wang, C., Xin, T., & Chang, H.-H. (2017). Developing new online calibration methods for multidimensional computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 70, 81–117.
  14. Chen, P., Xin, T., Wang, C., & Chang, H.-H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77, 201–222.
  15. Cheng, Y., Patton, J. M., & Shao, C. (2015). A-stratified computerized adaptive testing in the presence of calibration error. Educational and Psychological Measurement, 75, 260–283.
  16. Cheng, Y., & Yuan, K. H. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75, 280–291.
  17. He, Y., Chen, P., Li, Y., & Zhang, S. (2017). A new online calibration method based on Lord's bias-correction. Applied Psychological Measurement, 41, 456–471.
  18. He, Y., Chen, P., & Li, Y. (2019). New efficient and practicable adaptive designs for calibrating items online. Applied Psychological Measurement.
  19. Jones, D. H., & Jin, Z. (1994). Optimal sequential designs for on-line item estimation. Psychometrika, 59, 59–75.
  20. Kang, H. A. (2016). Likelihood estimation for jointly analyzing item responses and response times (unpublished doctoral dissertation). University of Illinois at Urbana-Champaign, Champaign, IL.
  21. Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43, 355–381.
  22. Kingsbury, G. G. (2009). Adaptive item calibration: A process for estimating item parameters within a computerized adaptive test. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC conference on computerized adaptive testing.
  23. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
  24. Lu, H. Y. (2014). Application of optimal designs to item calibration. PLoS ONE, 9(9), e106747.
  25. Mathew, T., & Sinha, B. K. (2001). Optimal designs for binary data under logistic regression. Journal of Statistical Planning and Inference, 93, 295–307.
  26. Minkin, S. (1987). Optimal designs for binary data. Journal of the American Statistical Association, 82, 1098–1103.
  27. Ren, H., van der Linden, W. J., & Diao, Q. (2017). Continuous online item calibration: Parameter recovery and item utilization. Psychometrika, 82, 498–522.
  28. Stocking, M. L. (1988). Scale drift in on-line calibration (Research Report No. 88-28). Princeton, NJ: ETS.
  29. Stocking, M. L. (1990). Specifying optimum examinees for item parameter estimation in item response theory. Psychometrika, 55, 461–475.
  30. Tsutakawa, R. K., & Johnson, J. C. (1990). The effect of uncertainty of item parameter estimation on ability estimates. Psychometrika, 55, 371–390.
  31. van der Linden, W. J., & Ren, H. (2015). Optimal Bayesian adaptive design for test item calibration. Psychometrika, 80, 263–288.
  32. Wainer, H., & Mislevy, R. J. (1990). Item response theory, item calibration, and proficiency estimation (Chap. 4). In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 65–102). Hillsdale, NJ: Erlbaum.
  33. Wingersky, M., & Lord, F. M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8, 347–364.
  34. Zheng, Y. (2014). New methods of online calibration for item bank replenishment (unpublished doctoral dissertation). University of Illinois at Urbana-Champaign, Champaign, IL.
  35. Zheng, Y. (2016). Online calibration of polytomous items under the generalized partial credit model. Applied Psychological Measurement, 40, 434–450.
  36. Zheng, Y., & Chang, H. H. (2017). A comparison of five methods for pretest item selection in online calibration. International Journal of Quantitative Research in Education, 4, 133–158.

Copyright information

© The Psychometric Society 2019

Authors and Affiliations

  1. School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China
  2. School of Mathematical Sciences, Beijing Normal University, Hai Dian District, China
  3. Collaborative Innovation Center of Assessment Toward Basic Education Quality, Beijing Normal University, Beijing, China
