Summary
In testing, the area of standards and standard-setting remains relatively unsettled. The chapter begins with definitions of scores and standards, and describes norm-referenced score interpretation, domain-referenced score interpretation, relative standards, and absolute standards. It then reviews the work related to the credibility of standards and outlines some of the more common standard-setting techniques used with MCQ-based tests and clinical examinations. Finally, because it is sometimes useful to combine scores from several related assessments, information is presented on when and how to do so.
Copyright information
© 2002 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Norcini, J., Guille, R. (2002). Combining Tests and Setting Standards. In: Norman, G.R., et al. International Handbook of Research in Medical Education. Springer International Handbooks of Education, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0462-6_30
DOI: https://doi.org/10.1007/978-94-010-0462-6_30
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-3904-8
Online ISBN: 978-94-010-0462-6
eBook Packages: Springer Book Archive