Academic Psychiatry, Volume 43, Issue 2, pp 151–156

Inflated Clinical Evaluations: a Comparison of Faculty-Selected and Mathematically Calculated Overall Evaluations Based on Behaviorally Anchored Assessment Data

  • Eric G. Meyer
  • Kelly L. Cozza
  • Riley M. R. Konara
  • Derrick Hamaoka
  • James C. West
Empirical Report



Objective

This retrospective study compared faculty-selected evaluation scores with those mathematically calculated from behaviorally anchored assessments.


Methods

Data from 1036 psychiatry clerkship clinical evaluations (2012–2015) were reviewed. These clinical evaluations required faculty to assess clinical performance using 14 behaviorally anchored questions followed by a faculty-selected overall evaluation. An explicit rubric was included in the overall evaluation to assist the faculty in interpreting their 14 assessment responses. Using the same rubric, mathematically calculated evaluations of the same assessment responses were generated and compared to the faculty-selected evaluations.
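The rubric-based calculation can be sketched as follows. The actual rubric is not reproduced here, so the mapping below (averaging the 14 item scores and rounding to the nearest overall category, on an assumed 1 = outstanding scale) is a hypothetical illustration, not the study's rubric:

```python
# Hypothetical sketch: derive an overall evaluation from 14 behaviorally
# anchored item scores. The study's actual rubric is not given in the
# abstract; here we simply average the items and round to the nearest
# category (1 = outstanding ... 5 = unsatisfactory, an assumed scale).

def calculated_overall(item_scores):
    """Map 14 item scores to a single overall evaluation category."""
    if len(item_scores) != 14:
        raise ValueError("expected 14 behaviorally anchored item scores")
    mean = sum(item_scores) / len(item_scores)
    return round(mean)  # nearest whole category

# Example (invented data): mostly "satisfactory" (3) with a few "2"s
scores = [3] * 10 + [2] * 4
print(calculated_overall(scores))  # → 3
```

Whatever the real rubric's mapping, the key property exploited by the study is that it is deterministic: the same 14 responses always yield the same calculated overall evaluation, giving a fixed reference point against which faculty-selected evaluations can be compared.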


Results

Comparison of faculty-selected to mathematically calculated evaluations revealed that while the two methods were reliably correlated (Cohen’s kappa = 0.314, Pearson’s coefficient = 0.658, p < 0.001), there was a notable difference in the results (t = 24.5, p < 0.0001). The average faculty-selected evaluation was 1.58 (SD = 0.61) with a mode of “1” or “outstanding,” while the mathematically calculated evaluation had an average of 2.10 (SD = 0.90) with a mode of “3” or “satisfactory.” Overall, 51.0% of the faculty-selected evaluations matched the mathematically calculated results; 46.1% were higher and 2.9% were lower.
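Agreement statistics of this kind can be computed on paired rating data along the following lines. The ratings below are invented for illustration, and unweighted Cohen's kappa is implemented directly from its definition rather than taken from a statistics library:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two lists of categorical ratings."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: sum over categories of the product of marginals.
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented faculty-selected vs. calculated evaluations (1 = outstanding,
# so a numerically *lower* faculty score is a *higher* evaluation).
faculty    = [1, 1, 2, 1, 3, 2, 1, 2, 3, 1]
calculated = [1, 2, 2, 3, 3, 2, 2, 3, 3, 2]
match  = sum(f == c for f, c in zip(faculty, calculated)) / len(faculty)
higher = sum(f < c for f, c in zip(faculty, calculated)) / len(faculty)
print(f"matched {match:.0%}, faculty higher {higher:.0%}")
# prints: matched 50%, faculty higher 50%
```

Pearson's coefficient and the paired t-test reported above would in practice come from a statistics package (e.g., `scipy.stats.pearsonr` and `scipy.stats.ttest_rel`); the sketch keeps to the standard library for clarity.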


Conclusions

Clerkship clinical evaluation forms that require faculty to make an overall evaluation generate results that are significantly higher than what would have been assigned solely using behaviorally anchored assessment questions. Focusing faculty attention on assessing specific behaviors rather than overall evaluations may reduce this inflation and improve validity. Clerkships may want to consider removing overall evaluation questions from their clinical evaluation tools.


Keywords: Competency · Evaluation · Assessment · UME · Medical student



We wish to thank the faculty and students of the USUHS Psychiatry Clerkship.

Compliance with Ethical Standards

This research protocol was reviewed and declared exempt by the USUHS institutional review board (IRB) in accordance with all applicable Federal regulations governing the protection of human subjects in research (PSY-88-9072).


The opinions and assertions expressed herein are those of the author(s) and do not necessarily reflect the official policy or position of the Uniformed Services University or the Department of Defense.

None of the authors have a financial relationship with any entity producing, marketing, re-selling, or distributing healthcare goods or services consumed by, or used on, patients.

This work was prepared by military employees of the US Government as part of the individual’s official duties and therefore is in the public domain and does not possess copyright protection (public domain information may be freely distributed and copied; however, as a courtesy it is requested that the Uniformed Services University and the author be given an appropriate acknowledgement).



Copyright information

© 2018. This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.

Authors and Affiliations

  1. Uniformed Services University of the Health Sciences, Bethesda, USA
  2. Reynolds Army Community Hospital, Fort Sill, USA
