Testlet-Based Adaptive Mastery Testing

  • Hans J. Vos
  • Cees A. W. Glas
Part of the Statistics for Social and Behavioral Sciences book series (SSBS)


In mastery testing, the problem is to decide whether a test taker should be classified as a master or a nonmaster. The decision is based on the test taker's observed test score. Well-known examples of mastery testing include testing for pass-fail decisions, licensure, and certification. A mastery test can take either a fixed-length or a variable-length form. In a fixed-length mastery test, performance on a fixed number of items is used to decide on mastery or nonmastery. Over the last few decades, the fixed-length mastery problem has been studied extensively by many researchers (e.g., De Gruijter & Hambleton, 1984; van der Linden, 1990). Most of these authors derived optimal rules, analytically or numerically, by applying (empirical) Bayesian decision theory (e.g., DeGroot, 1970; Lehmann, 1986) to the problem. In the variable-length form, in addition to the actions of declaring mastery or nonmastery, the action of continuing to administer items is also available (e.g., Kingsbury & Weiss, 1983; Lewis & Sheehan, 1990; Sheehan & Lewis, 1992; Spray & Reckase, 1996).
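To make the fixed-length/variable-length distinction concrete, the Python sketch below implements a variable-length rule for the simplest possible setting: a beta-binomial model with threshold loss. All prior, loss, and cost values, and the simulate_response helper, are hypothetical illustrations; the stopping rule shown is a myopic one-step approximation, not the backward-induction or testlet-based IRT procedures developed in this chapter and the sources cited above.

```python
# A minimal sketch of a variable-length Bayesian mastery test under a
# beta-binomial model with threshold loss. All numeric values and the
# simulate_response() helper are hypothetical; the chapter's testlet-based
# IRT treatment is more elaborate.
import random

from scipy.stats import beta

THETA_C = 0.70                # cut score separating masters from nonmasters
A0, B0 = 1.0, 1.0             # uniform Beta prior on the success probability
LOSS_FALSE_MASTER = 10.0      # loss for declaring a nonmaster a master
LOSS_FALSE_NONMASTER = 10.0   # loss for declaring a master a nonmaster
COST_PER_ITEM = 0.05          # cost of administering one more item
MAX_ITEMS = 50                # ceiling on test length

def simulate_response(true_theta):
    """Hypothetical dichotomous item response: 1 with probability true_theta."""
    return 1 if random.random() < true_theta else 0

def sequential_mastery_test(true_theta):
    """Administer items until a terminal action is cheaper than continuing."""
    successes = 0
    for n in range(1, MAX_ITEMS + 1):
        successes += simulate_response(true_theta)
        failures = n - successes
        # Posterior probability of mastery, P(theta >= theta_c | data).
        p_master = beta.sf(THETA_C, A0 + successes, B0 + failures)
        # Expected loss of each terminal action under threshold loss.
        risk_master = LOSS_FALSE_MASTER * (1.0 - p_master)
        risk_nonmaster = LOSS_FALSE_NONMASTER * p_master
        # Myopic stopping rule: classify once the cheaper terminal action
        # costs no more than one additional item (the literature cited in
        # the abstract uses full backward induction instead).
        if min(risk_master, risk_nonmaster) <= COST_PER_ITEM or n == MAX_ITEMS:
            return ("master" if risk_master <= risk_nonmaster else "nonmaster", n)

random.seed(1)
print(sequential_mastery_test(true_theta=0.85))  # (decision, items administered)
```

Replacing the beta-binomial posterior with an IRT-based posterior over the latent trait, and administering testlets rather than single items, yields the setting treated in this chapter.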


Keywords: item parameters, test takers, computerized adaptive testing, Bayesian decision theory, mastery problem


References

  1. Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508–600). Washington, DC: American Council on Education.
  2. Birnbaum, A. (1968). Some latent trait models. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
  3. Bradlow, E. T., Wainer, H. & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
  4. Chang, H.-H. & Stout, W. F. (1993). The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58, 37–52.
  5. Coombs, C. H., Dawes, R. M. & Tversky, A. (1970). Mathematical psychology: An elementary introduction. Englewood Cliffs, NJ: Prentice-Hall.
  6. DeGroot, M. H. (1970). Optimal statistical decisions. New York: McGraw-Hill.
  7. De Gruijter, D. N. M. & Hambleton, R. K. (1984). On problems encountered using decision theory to set cutoff scores. Applied Psychological Measurement, 8, 1–8.
  8. Ferguson, R. L. (1969). The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction. Unpublished doctoral dissertation, University of Pittsburgh, Pittsburgh, PA.
  9. Glas, C. A. W., Wainer, H. & Bradlow, E. T. (2000). MML and EAP estimates for the testlet response model. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 271–287). Boston: Kluwer-Nijhoff Publishing.
  10. Haladyna, T. M. (1994). Developing and validating multiple-choice test items. Hillsdale, NJ: Lawrence Erlbaum Associates.
  11. Huynh, H. (1980). A nonrandomized minimax solution for passing scores in the binomial error model. Psychometrika, 45, 167–182.
  12. Keeney, R. L. & Raiffa, H. (1976). Decisions with multiple objectives: Preferences and value trade-offs. New York: John Wiley and Sons.
  13. Kingsbury, G. G. & Weiss, D. J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257–283). New York: Academic Press.
  14. Kolen, M. J. & Brennan, R. L. (1995). Test equating. New York: Springer-Verlag.
  15. Lehmann, E. L. (1986). Testing statistical hypotheses (2nd ed.). New York: Wiley.
  16. Lewis, C. & Sheehan, K. (1990). Using Bayesian decision theory to design a computerized mastery test. Applied Psychological Measurement, 14, 367–386.
  17. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
  18. Luce, R. D. & Raiffa, H. (1957). Games and decisions. New York: John Wiley and Sons.
  19. McDonald, R. P. (1997). Normal-ogive multidimensional model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 257–269). New York: Springer-Verlag.
  20. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
  21. Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 237–255). New York: Academic Press.
  22. Reckase, M. D. (1997). A linear logistic multidimensional model for dichotomous item response data. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 271–286). New York: Springer-Verlag.
  23. Sheehan, K. & Lewis, C. (1992). Computerized mastery testing with non-equivalent testlets. Applied Psychological Measurement, 16, 65–76.
  24. Sireci, S. G., Wainer, H. & Thissen, D. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247.
  25. Smith, R. L. & Lewis, C. (1995). A Bayesian computerized mastery model with multiple cut scores. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco.
  26. Spray, J. A. & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test. Journal of Educational and Behavioral Statistics, 21, 405–414.
  27. van der Linden, W. J. (1981). Decision models for use with criterion-referenced tests. Applied Psychological Measurement, 4, 469–492.
  28. van der Linden, W. J. (1990). Applications of decision theory to test-based decision making. In R. K. Hambleton & J. N. Zaal (Eds.), New developments in testing: Theory and applications (pp. 129–155). Boston: Kluwer-Nijhoff Publishing.
  29. van der Linden, W. J. (1998). Bayesian item selection criteria for adaptive testing. Psychometrika, 63, 201–216.
  30. van der Linden, W. J. & Mellenbergh, G. J. (1977). Optimal cutting scores using a linear loss function. Applied Psychological Measurement, 1, 593–599.
  31. van der Linden, W. J. & Vos, H. J. (1996). A compensatory approach to optimal selection with mastery scores. Psychometrika, 61, 155–172.
  32. Verhelst, N. D., Glas, C. A. W. & van der Sluis, A. (1984). Estimation problems in the Rasch model: The basic symmetric functions. Computational Statistics Quarterly, 1, 245–262.
  33. Vos, H. J. (1997a). Simultaneous optimization of quota-restricted selection decisions with mastery scores. British Journal of Mathematical and Statistical Psychology, 50, 105–125.
  34. Vos, H. J. (1997b). A simultaneous approach to optimizing treatment assignments with mastery scores. Multivariate Behavioral Research, 32, 403–433.
  35. Vos, H. J. (1999). Applications of Bayesian decision theory to sequential mastery testing. Journal of Educational and Behavioral Statistics, 24, 271–292.
  36. Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8, 157–187.
  37. Wainer, H., Bradlow, E. T. & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–270). Boston: Kluwer-Nijhoff Publishing.
  38. Wainer, H. & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15, 22–29.
  39. Wald, A. (1947). Sequential analysis. New York: Wiley.
  40. Weiss, D. J. & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.
  41. Yen, W. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187–213.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Hans J. Vos¹
  • Cees A. W. Glas¹

  1. Department of Research Methodology, Measurement, and Data Analysis, University of Twente, Enschede, The Netherlands