Item Selection and Ability Estimation in Adaptive Testing

  • Wim J. van der Linden
  • Peter J. Pashley
Chapter
Part of the Statistics for Social and Behavioral Sciences book series (SSBS)

Abstract

The last century saw tremendous progress in the refinement and use of standardized linear tests. The first College Board exam was administered in 1901, and the first Scholastic Assessment Test (SAT) was given in 1926. Since then, increasingly sophisticated standardized linear tests have been developed for a multitude of assessment purposes, such as college placement, professional licensure, higher-education admissions, and tracking educational standing or progress. Standardized linear tests are now administered around the world. For example, the Test of English as a Foreign Language (TOEFL) has been administered in approximately 88 countries.

Keywords

Posterior Distribution, Item Response Theory, Item Parameter, Item Pool, Computerized Adaptive Testing

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Wim J. van der Linden (CTB/McGraw-Hill, Monterey, USA)
  • Peter J. Pashley (Law School Admission Council, Newtown, USA)