New IRT Models for Examinee-Selected Items

  • Wen-Chung Wang
  • Chen-Wei Liu
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 140)


Examinee-selected-item (ESI) design, in which examinees are required to respond to a fixed number of items from a given set (e.g., responding to two of five given items, which yields ten selection patterns), has the advantages of enhancing students’ learning motivation and reducing their test anxiety. The ESI design yields incomplete data: only the selected items are answered, and the others have missing data. It has been argued that missing data in the ESI design are missing not at random, making standard item response theory (IRT) models inappropriate. Recently, Wang et al. (Journal of Educational Measurement 49(4):419–445, 2012) proposed an IRT model for examinee-selected items that adds a latent trait to standard IRT models to account for the selection effect. This latent trait may correlate with the latent trait intended to be measured, and the correlation quantifies how strong the selection effect is and how seriously the missing-at-random assumption is violated. In this study, we developed a framework that includes this model as a special case and generates several new models. We conducted an experiment to collect real data, in which 501 fifth graders took two mandatory items and four pairs of mathematics (dichotomous) items. In each pair, students were first asked to indicate which item they preferred to answer and then answered both items. This is referred to as the “Choose one, Answer all” approach. The new IRT models were fitted to the real data, and the results are discussed.
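To make the selection effect concrete, below is a minimal Python simulation sketch. It assumes a Rasch model for the item responses and a hypothetical logistic choice rule driven by a selection trait γ that correlates with the intended trait θ at ρ. This choice rule is an illustrative stand-in, not the parameterization of Wang et al. (2012), and the function names (`selection_patterns`, `simulate_choose_one_answer_all`, `selection_gap`) and item difficulties are hypothetical. Because every examinee answers both items in a pair ("Choose one, Answer all"), the sketch can compare the success rate among examinees who preferred an item (the only rate an ESI design would observe) with the rate in the whole sample. It also enumerates the two-of-five selection patterns.

```python
import itertools
import math
import random


def selection_patterns(n_items, n_choose):
    """List every set of n_choose items an examinee could pick from n_items."""
    return list(itertools.combinations(range(1, n_items + 1), n_choose))


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_choose_one_answer_all(n_persons, pair_difficulties, rho, seed=0):
    """Simulate hypothetical 'Choose one, Answer all' data.

    Illustrative assumptions (not the Wang et al. parameterization):
    - theta (intended trait) and gamma (selection trait) are standard
      normal with correlation rho;
    - in every pair, P(prefer the first item) = logistic(gamma);
    - both items in a pair are answered and scored with the Rasch model,
      P(correct) = logistic(theta - b).
    Returns one list per examinee of (choice, x_first, x_second) tuples,
    where choice is 1 or 2.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta = z1
        gamma = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr(theta, gamma) = rho
        record = []
        for b1, b2 in pair_difficulties:
            choice = 1 if rng.random() < logistic(gamma) else 2
            x1 = int(rng.random() < logistic(theta - b1))
            x2 = int(rng.random() < logistic(theta - b2))
            record.append((choice, x1, x2))
        data.append(record)
    return data


def selection_gap(data, pair_index=0):
    """Proportion correct on the first item of a pair among examinees who
    preferred it, minus the proportion correct among all examinees.
    Under the ESI design only the first proportion would be observable."""
    chosen = [rec[pair_index][1] for rec in data if rec[pair_index][0] == 1]
    everyone = [rec[pair_index][1] for rec in data]
    return sum(chosen) / len(chosen) - sum(everyone) / len(everyone)


if __name__ == "__main__":
    print(len(selection_patterns(5, 2)))  # 10, as in the two-of-five example
    pairs = [(0.0, 0.5), (-0.5, 0.0), (0.5, 1.0), (0.0, 0.0)]  # hypothetical difficulties
    for rho in (0.0, 0.8):
        data = simulate_choose_one_answer_all(20000, pairs, rho, seed=1)
        print(rho, round(selection_gap(data), 3))
```

With ρ = 0, the gap should stay near zero, because preferences carry no information about θ. With ρ = 0.8, examinees who prefer the first item tend to have higher θ, so their observed success rate overstates the population rate. This is the missing-not-at-random problem that the correlation between the two latent traits is meant to capture. A large simulated sample is used only to make the pattern visible above sampling noise.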


Keywords: Item response theory, Examinee-selected items, Selection effect, Missing data



This study was supported by the General Research Fund, Hong Kong (No. 844112).


References

  1. Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
  2. Allen, N. L., Holland, P. W., & Thayer, D. T. (2005). Measuring the benefits of examinee-selected questions. Journal of Educational Measurement, 42(1), 27–51.
  3. Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561–573.
  4. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.
  5. Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.
  6. Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.
  7. Bradlow, E. T., & Thomas, N. (1998). Item response theory models applied to data allowing examinee choice. Journal of Educational and Behavioral Statistics, 23(3), 236–243.
  8. Bridgeman, B., Morgan, R., & Wang, M.-m. (1997). Choice among essay topics: Impact on performance and validity. Journal of Educational Measurement, 34(3), 273–286.
  9. Fitzpatrick, A. R., & Yen, W. M. (1995). The psychometric characteristics of choice items. Journal of Educational Measurement, 32(3), 243–259.
  10. Fox, J.-P. (2005). Multilevel IRT using dichotomous and polytomous response data. British Journal of Mathematical and Statistical Psychology, 58(1), 145–172.
  11. Fox, J.-P., & Glas, C. A. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66(2), 271–288.
  12. Kendall, M. G. (1975). Rank correlation methods (4th ed.). London: Charles Griffin.
  13. Li, Y., Bolt, D. M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30(1), 3–21.
  14. Lukhele, R., Thissen, D., & Wainer, H. (1994). On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests. Journal of Educational Measurement, 31(3), 234–250.
  15. Mann, H. B. (1945). Nonparametric tests against trend. Econometrica: Journal of the Econometric Society, 13, 245–259.
  16. Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.
  17. Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user’s guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
  18. Patz, R. J., & Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2), 146–178.
  19. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Institute of Educational Research. (Expanded edition, 1980, Chicago: The University of Chicago Press.)
  20. Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47(3), 361–372.
  21. Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.
  22. Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 17, 1–100.
  23. Wainer, H., & Thissen, D. (1994). On examinee choice in educational testing. Review of Educational Research, 64(1), 159–195.
  24. Wainer, H., Wang, X.-B., & Thissen, D. (1994). How well can we compare scores on test forms that are constructed by examinees’ choice? Journal of Educational Measurement, 31(3), 183–199.
  25. Wainer, H., Wang, X.-B., & Thissen, D. (1991). How well can we equate test forms constructed by examinees? (Program Statistics Report 91-55). Princeton, NJ: Educational Testing Service.
  26. Wang, W.-C., Jin, K.-Y., Qiu, X.-L., & Wang, L. (2012). Item response models for examinee-selected items. Journal of Educational Measurement, 49(4), 419–445.
  27. Wang, W.-C., & Qiu, X.-L. (2013). A multidimensional and multilevel extension of a random-effect approach to subjective judgment in rating scales. Multivariate Behavioral Research, 48(3), 398–427.
  28. Wang, X. B. (1999). Understanding psychological processes that underlie test takers’ choices of constructed response items. Newtown, PA: Law School Admission Council.
  29. Wang, X.-b., Wainer, H., & Thissen, D. (1995). On the viability of some untestable assumptions in equating exams that allow examinee choice. Applied Measurement in Education, 8(3), 211–225.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. The Hong Kong Institute of Education, Hong Kong, China
