Multistage Testing: Issues, Designs, and Research

  • April Zenisky
  • Ronald K. Hambleton
  • Richard M. Luecht
Chapter
Part of the Statistics for Social and Behavioral Sciences book series (SSBS)

Abstract

Just as traditional computerized adaptive testing (CAT) adaptively selects individual items for sequential administration while a test is in progress, multistage testing (MST) is an analogous approach that uses sets of items as the building blocks of a test. In MST terminology, these sets of items have come to be called modules (Luecht & Nungester, 1998) or testlets (Wainer & Kiely, 1987). Each module can be characterized as a short linear test form in which a specified number of items are administered together to meet particular test specifications and to supply a designated share of the total test information. The items within a module may all relate to one or more common stems (such as passages or graphics) or may be discrete from one another, depending on the content specifications of the testing program. Each module is a self-contained, carefully constructed, fixed set of items that is identical for every examinee to whom it is administered; two examinees, however, will not necessarily receive the same modules, nor the same modules in the same sequence.
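To make the module-and-routing idea concrete, the sketch below simulates a hypothetical two-stage (1-3) MST under a Rasch model: every examinee takes the same routing module, a provisional EAP ability estimate is computed from those responses, and the second-stage module offering the most information at that estimate is administered next. The design, item difficulties, prior, and scoring choices are invented for illustration and are not the designs or procedures studied in this chapter.

```python
import numpy as np

rng = np.random.default_rng(42)


def rasch_prob(theta, b):
    """P(correct) under the Rasch model for ability theta and difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def eap_estimate(responses, difficulties, grid=np.linspace(-4, 4, 81)):
    """Provisional EAP ability estimate with an assumed N(0, 1) prior."""
    posterior = np.exp(-0.5 * grid ** 2)  # prior weights on the theta grid
    for u, b in zip(responses, difficulties):
        p = rasch_prob(grid, b)
        posterior *= p ** u * (1.0 - p) ** (1 - u)
    return float(np.sum(grid * posterior) / np.sum(posterior))


# Hypothetical 1-3 design: one routing module plus three second-stage modules
# targeted at low, medium, and high ability. All difficulties are invented.
routing_module = np.array([-1.0, -0.3, 0.0, 0.3, 1.0])
stage2_modules = {
    "easy":   np.array([-2.0, -1.5, -1.0, -0.8, -0.5]),
    "medium": np.array([-0.5, -0.2,  0.0,  0.2,  0.5]),
    "hard":   np.array([ 0.5,  0.8,  1.0,  1.5,  2.0]),
}


def administer(theta_true, difficulties):
    """Simulate dichotomous responses to a fixed module."""
    return (rng.random(len(difficulties)) < rasch_prob(theta_true, difficulties)).astype(int)


def run_mst(theta_true):
    # Stage 1: every examinee sees the same routing module.
    resp1 = administer(theta_true, routing_module)
    theta_hat = eap_estimate(resp1, routing_module)

    # Routing: pick the stage-2 module with maximum total information
    # at the provisional ability estimate.
    chosen = max(stage2_modules,
                 key=lambda m: item_information(theta_hat, stage2_modules[m]).sum())

    # Stage 2: administer the chosen module, then rescore on all items taken.
    resp2 = administer(theta_true, stage2_modules[chosen])
    all_resp = np.concatenate([resp1, resp2])
    all_b = np.concatenate([routing_module, stage2_modules[chosen]])
    return chosen, eap_estimate(all_resp, all_b)


module, theta_final = run_mst(theta_true=0.8)
print(f"Routed to the '{module}' module; final ability estimate: {theta_final:.2f}")
```

In an operational MST, module assembly, routing rules, and exposure controls would be set through automated test assembly against a program's content and information targets; the sketch only illustrates the two-level logic of fixed modules plus adaptive routing.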

Keywords

Item bank; Computerized adaptive testing; Adaptive testing; Ability estimation; Item quality

References

  1. Adema, J. J. (1990). The construction of customized two-stage tests. Journal of Educational Measurement, 27, 241–253.
  2. Angoff, W. & Huddleston, E. (1958). The multi-level experiment: A study of a two-level testing system for the College Board Scholastic Aptitude Test (Statistical Report No. SR-58-21). Princeton, NJ: Educational Testing Service.
  3. Armstrong, R., Jones, D., Koppel, N. & Pashley, P. (2000, April). Computerized adaptive testing with multiple forms structures. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
  4. Berger, M. P. F. (1994). A general approach to algorithmic design of fixed-form tests, adaptive tests, and testlets. Applied Psychological Measurement, 18, 141–153.
  5. Bradlow, E. T., Wainer, H. & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
  6. Breithaupt, K., Ariel, A. & Veldkamp, B. P. (2005). Automated simultaneous assembly for multistage testing. International Journal of Testing, 5, 319–330.
  7. Breithaupt, K. & Hare, D. R. (2007). Automated simultaneous assembly of multistage testlets for a high-stakes licensing examination. Educational and Psychological Measurement, 67, 5–20.
  8. Cronbach, L. J. & Gleser, G. C. (1965). Psychological tests and personnel decisions (2nd ed.). Urbana, IL: University of Illinois Press.
  9. Dodd, B. G. & Fitzpatrick, S. J. (2002). Alternatives for scoring CBTs. In C. N. Mills, M. T. Potenza, J. J. Fremer & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 215–236). Mahwah, NJ: Lawrence Erlbaum Associates.
  10. Folk, V. G. & Smith, R. L. (2002). Models for delivery of computer-based tests. In C. N. Mills, M. T. Potenza, J. J. Fremer & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 41–66). Mahwah, NJ: Lawrence Erlbaum Associates.
  11. Glas, C. A. W., Wainer, H. & Bradlow, E. T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Boston: Kluwer-Nijhof Publishing.
  12. Green, B. F., Jr., Bock, R. D., Humphreys, L. G., Linn, R. L. & Reckase, M. D. (1984). Technical guidelines for assessing computerized adaptive tests. Journal of Educational Measurement, 21, 347–360.
  13. Hadadi, A., Luecht, R. M., Swanson, D. B. & Case, S. M. (1998, April). Study 1: Effects of modular subtest structure and item review on examinee performance, perceptions and pacing. Paper presented at the meeting of the National Council on Measurement in Education, San Diego.
  14. Hambleton, R. K. & Xing, D. (2006). Optimal and non-optimal computer-based test designs for making pass-fail decisions. Applied Measurement in Education, 19, 221–239.
  15. Jodoin, M. G., Zenisky, A. L. & Hambleton, R. K. (2006). Comparison of the psychometric properties of several computer-based test designs for credentialing exams. Applied Measurement in Education, 19, 203–220.
  16. Kim, H. (1993). Monte Carlo simulation comparison of two-stage testing and computer adaptive testing. Unpublished doctoral dissertation, University of Nebraska, Lincoln.
  17. Kim, H. & Plake, B. (1993, April). Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing. Paper presented at the meeting of the National Council on Measurement in Education, Atlanta.
  18. Lee, G. & Frisbie, D. A. (1999). Estimating reliability under a generalizability theory model for test scores composed of testlets. Applied Measurement in Education, 12, 237–255.
  19. Linn, R., Rock, D. & Cleary, T. (1969). The development and evaluation of several programmed testing methods. Educational and Psychological Measurement, 29, 129–146.
  20. Lord, F. M. (1971). A theoretical study of two-stage testing. Psychometrika, 36, 227–242.
  21. Lord, F. M. (1977). Practical applications of item characteristic curve theory. Journal of Educational Measurement, 14, 227–238.
  22. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Lawrence Erlbaum Associates.
  23. Lord, F. M. & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
  24. Luecht, R. M. (1997, March). An adaptive sequential paradigm for managing multidimensional content. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
  25. Luecht, R. M. (1998). Computer-assisted test assembly using optimization heuristics. Applied Psychological Measurement, 22, 224–236.
  26. Luecht, R. M. (2000, April). Implementing the computer-adaptive sequential testing (CAST) framework to mass produce high quality computer-adaptive and mastery tests. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
  27. Luecht, R. M. (2003, April). Exposure control using adaptive multistage item bundles. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
  28. Luecht, R. M. (2006). Designing tests for pass-fail decisions using item response theory. In S. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 575–596). Mahwah, NJ: Lawrence Erlbaum Associates.
  29. Luecht, R. M., Brumfield, T. & Breithaupt, K. (2006). A testlet-assembly design for adaptive multistage tests. Applied Measurement in Education, 19, 189–202.
  30. Luecht, R. M. & Burgin, W. (2003, April). Test information targeting strategies for adaptive multistage testing designs. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
  31. Luecht, R. M. & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229–249.
  32. Luecht, R. M., Nungester, R. J. & Hadadi, A. (1996, April). Heuristic-based CAT: Balancing item information, content and exposure. Paper presented at the meeting of the National Council on Measurement in Education, New York.
  33. Mead, A. (2006). An introduction to multistage testing [Special issue]. Applied Measurement in Education, 19, 185–260.
  34. Mills, C. N. & Stocking, M. L. (1996). Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9, 287–304.
  35. Patsula, L. N. (1999). A comparison of computerized-adaptive testing and multi-stage testing. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
  36. Reese, L. M. & Schnipke, D. L. (1999). An evaluation of a two-stage testlet design for computerized testing (Computerized Testing Report 96-04). Newtown, PA: Law School Admission Council.
  37. Reese, L. M., Schnipke, D. L. & Luebke, S. W. (1999). Incorporating content constraints into a multi-stage adaptive testlet design (Law School Admission Council Computerized Testing Report 97-02). Newtown, PA: Law School Admission Council.
  38. Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, No. 17.
  39. Schnipke, D. L. & Reese, L. M. (1999). A comparison of testlet-based test designs for computerized adaptive testing (Law School Admission Council Computerized Testing Report 97-01). Newtown, PA: Law School Admission Council.
  40. Sireci, S. G., Thissen, D. & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247.
  41. Thissen, D. (1998, April). Scaled scores for CATs based on linear combinations of testlet scores. Paper presented at the meeting of the National Council on Measurement in Education, San Diego.
  42. Thissen, D., Steinberg, L. & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247–260.
  43. van der Linden, W. J. (2000). Optimal assembly of tests with item sets. Applied Psychological Measurement, 24, 225–240.
  44. van der Linden, W. J. (2005). Models for optimal test design. New York: Springer-Verlag.
  45. van der Linden, W. J. & Adema, J. J. (1998). Simultaneous assembly of multiple test forms. Journal of Educational Measurement, 35, 185–198.
  46. van der Linden, W. J., Ariel, A. & Veldkamp, B. P. (2006). Assembling a computerized adaptive testing item pool as a set of linear tests. Journal of Educational and Behavioral Statistics, 31, 81–99.
  47. Vos, H. J. (2000, April). Adaptive mastery testing using a multidimensional IRT model and Bayesian sequential decision theory. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans.
  48. Vos, H. J. & Glas, C. A. W. (2001, April). Multidimensional IRT-based adaptive sequential mastery testing. Paper presented at the meeting of the National Council on Measurement in Education, Seattle.
  49. Wainer, H. (1993). Some practical considerations when converting a linearly administered test to an adaptive format. Educational Measurement: Issues and Practice, 12(1), 15–20.
  50. Wainer, H., Bradlow, E. T. & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–270). Boston: Kluwer-Nijhof Publishing.
  51. Wainer, H. & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185–201.
  52. Wainer, H. & Lewis, C. (1990). Toward a psychometrics for testlets. Journal of Educational Measurement, 27, 1–14.
  53. Wainer, H., Sireci, S. & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197–219.
  54. Wald, A. (1947). Sequential analysis. New York: Wiley.
  55. Wise, S. L. & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135–155.
  56. Xing, D. (2001). Impact of several computer-based testing variables on the psychometric properties of credentialing examinations. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
  57. Xing, D. & Hambleton, R. K. (2004). Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing exams. Educational and Psychological Measurement, 64, 5–21.
  58. Zenisky, A. L. (2004). Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
  59. Zenisky, A. L., Hambleton, R. K. & Sireci, S. G. (2002). Identification and evaluation of local item dependencies in the Medical College Admissions Test. Journal of Educational Measurement, 39, 1–16.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • April Zenisky (1)
  • Ronald K. Hambleton (1)
  • Richard M. Luecht (2)
  1. Center for Educational Assessment, University of Massachusetts, Amherst, USA
  2. ERM Department, University of North Carolina at Greensboro, Greensboro, USA
