European Archives of Oto-Rhino-Laryngology

Volume 275, Issue 6, pp 1649–1655

Procedure-specific assessment tool for flexible pharyngo-laryngoscopy: gathering validity evidence and setting pass–fail standards

  • Jacob Melchiors
  • K. Petersen
  • T. Todsen
  • A. Bohr
  • Lars Konge
  • Christian von Buchwald



The attainment of specific, identifiable competencies is the primary measure of progress in the modern medical education system. For such a system to be feasible, it requires a method for accurately assessing competence. Evidence of validity needs to be gathered before an assessment tool can be implemented in the training and assessment of physicians. According to contemporary validity theory, this evidence must be gathered from specific sources in a structured and rigorous manner. Flexible pharyngo-laryngoscopy (FPL) is a central procedure for the otorhinolaryngologist. We aimed to evaluate the flexible pharyngo-laryngoscopy assessment tool (FLEXPAT) created in a previous study and to establish a pass–fail level for proficiency.


Eighteen physicians with different levels of experience (novices, intermediates, and experienced) were recruited to the study. Each performed an FPL on two patients. The procedures were video recorded, blinded, and assessed by two specialists. Each score was expressed as a percentage of the maximum possible score. Cronbach’s α was used to analyze the internal consistency of the data, and a generalizability analysis was performed. The scores of the three groups were compared, and a pass–fail level was determined using the contrasting groups standard-setting method.
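As an illustration of the internal-consistency statistic named above, the following is a minimal sketch of Cronbach's α together with the percentage-of-maximum score. The function names, the item scores, and the per-item maximum of 5 are hypothetical and are not taken from the study; the sketch only shows the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rated performances.

    ratings: list of rows, one row per rated performance,
             each row holding the scores of the k items.
    """
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two performances")
    # Variance of each item (column) across performances.
    item_variance_sum = sum(variance(column) for column in zip(*ratings))
    # Variance of the total score of each performance.
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def percent_of_max(row, max_per_item):
    """Express a summed score as a percentage of the maximum possible score."""
    return 100 * sum(row) / (len(row) * max_per_item)


# Hypothetical item scores (not study data): 4 performances, 3 items each.
example = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5]]
alpha = cronbach_alpha(example)
scores = [percent_of_max(row, max_per_item=5) for row in example]
```

The sample variance is used for both the items and the totals; because α depends only on their ratio, using the population variance for both would give the same value.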


Internal consistency was strong, with a Cronbach’s α of 0.86. We found a generalizability coefficient of 0.72, which is sufficient for moderate-stakes assessment. We found a significant difference between the novice and experienced groups (p < 0.001) and a strong correlation between experience and score (Pearson’s r = 0.75). The pass–fail level was established at 72% of the maximum score. Applying this pass–fail level in the test population resulted in half of the intermediary group receiving a failing score.
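The contrasting groups method is commonly implemented by fitting a normal distribution to the scores of a non-competent and a competent group and placing the pass–fail level where the two curves intersect. The sketch below follows that common formulation; whether the study used exactly this variant is an assumption, and the function name and all scores are hypothetical rather than study data.

```python
import math
from statistics import NormalDist


def contrasting_groups_cutoff(low_group, high_group):
    """Score at which the fitted normal curves of two groups intersect."""
    low = NormalDist.from_samples(low_group)
    high = NormalDist.from_samples(high_group)
    m1, s1, m2, s2 = low.mean, low.stdev, high.mean, high.stdev
    if m1 == m2:
        raise ValueError("group means must differ")
    # Equating the two log-densities gives a*x^2 + b*x + c = 0.
    a = 1 / (2 * s1**2) - 1 / (2 * s2**2)
    b = m2 / s2**2 - m1 / s1**2
    c = m1**2 / (2 * s1**2) - m2**2 / (2 * s2**2) + math.log(s1 / s2)
    if math.isclose(a, 0.0, abs_tol=1e-12):
        # Equal spread: the curves cross once, midway between the means.
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("the fitted curves do not intersect")
        root = math.sqrt(disc)
        roots = [(-b + root) / (2 * a), (-b - root) / (2 * a)]
    # Keep the intersection that separates the two groups.
    between = [x for x in roots if min(m1, m2) < x < max(m1, m2)]
    if not between:
        raise ValueError("no intersection between the group means")
    return between[0]


# Hypothetical percentage scores (not study data).
novices = [30, 45, 50, 55, 70]
experienced = [80, 85, 90, 92, 98]
cutoff = contrasting_groups_cutoff(novices, experienced)
```

Scores at or above the returned cutoff would be treated as passing. With small groups the fitted curves are sensitive to individual scores, so the cutoff from a sketch like this should be read as approximate.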


We gathered validity evidence for the FLEXPAT according to the contemporary framework as described by Messick. Our results support a claim of validity and are comparable to other studies exploring clinical assessment tools. The high rate of physicians underperforming in the intermediary group demonstrates the need for continued educational intervention.


Based on our work, we recommend the use of the FLEXPAT in clinical assessment of FPL and the application of a pass–fail level of 72% for proficiency.


Keywords: Flexible laryngoscopy, Assessment tool, Medical education, Validity, Technical skills, Mastery learning



The authors would like to thank the Olympus company (Tokyo, Japan) for the generous lending of a flexible video laryngoscope for the duration of the study.


The endoscope used in the gathering of data was generously supplied by Olympus (Tokyo, Japan). No funding was provided for the completion of this study.

Compliance with ethical standards

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Conflict of interest

All authors declare that no conflicts of interest exist.

Supplementary material

Supplementary material 1: 405_2018_4971_MOESM1_ESM.jpg (JPG, 11740 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Otorhinolaryngology-Head and Neck Surgery and Audiology, Rigshospitalet, Copenhagen E, Denmark
  2. Copenhagen Academy for Medical Education and Simulation, Copenhagen, Denmark
  3. Department of Otorhinolaryngology-Head and Neck Surgery, Aarhus University Hospital, Aarhus, Denmark
