Clinical Performance Assessments

  • Emil R. Petrusa
Part of the Springer International Handbooks of Education book series (SIHE, volume 7)


Evaluation of clinical performance for physicians in training is central to assuring qualified practitioners. The time-honored method of oral examination after a single patient encounter suffers from several measurement shortcomings: too little sampling, low reliability, partial validity and potential for evaluator bias all undermine the oral examination. Since 1975, standardized clinical examinations have been developed to provide broader sampling, more objective evaluation criteria and more efficient administration. Research supports the reliability of portrayal and data capture by standardized patients as well as the predictability of future trainee performance. Methods for setting pass marks for cases and the whole test have evolved from those for written examinations. Without additional adjustments, pass marks from all methods continue to fail an unacceptably high number of learners. Studies show a positive impact of these examinations on learner study behaviors and on the number of direct observations of learners’ patient encounters. Standardized clinical performance examinations are sensitive and specific for the benefits of a structured clinical curriculum. Improvements must include better alignment of a test’s purpose, measurement framework and scoring. Data capture methods for clinical performance at advanced levels need development. Checklists completed by standardized patients do not capture the organization or approach a learner takes in the encounter. Global ratings completed by faculty hold promise, but more work is needed. Future studies should investigate the validity of case and test-wise pass marks. Finally, research on the development of expertise should guide the next generation of assessment tasks, encounters and scoring in standardized clinical examinations.


Keywords: Clinical Performance; Standardize Patient; Objective Structure Clinical Examination; Test Taker; Academic Medicine
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media Dordrecht 2002

Authors and Affiliations

  • Emil R. Petrusa, Duke University School of Medicine, USA
