Crowdsourcing Annotation of Surgical Instruments in Videos of Cataract Surgery

  • Tae Soo Kim
  • Anand Malpani
  • Austin Reiter
  • Gregory D. Hager
  • Shameema Sikder
  • S. Swaroop Vedula
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11043)


Automating objective assessment of surgical technical skill is necessary to support training and professional certification at scale, even in settings with limited access to an expert surgeon. Likewise, automated surgical activity recognition can improve operating room workflow efficiency, support teaching and self-review, and aid clinical decision support systems. However, current supervised learning methods for these tasks rely on large training datasets. Crowdsourcing has become a standard way to curate such datasets at scale, but its use and effectiveness in surgical data annotation have been studied in only a few settings. In this study, we evaluated the reliability and validity of crowdsourced annotations of surgical instruments (the name of each instrument and the pixel location of key points on it). For 200 images sampled from videos of two cataract surgery procedures, we collected 9 independent annotations per image. We observed an inter-rater agreement of 0.63 (Fleiss’ kappa) and an accuracy of 0.88 for identification of instruments compared against an expert annotation. We obtained a mean error of 5.77 pixels for annotation of instrument tip key points. Our study shows that crowdsourcing is a reliable and accurate alternative to expert annotation for identifying instruments and instrument tip key points in videos of cataract surgery.
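The three quantities reported in the abstract (Fleiss' kappa, accuracy against an expert, and mean pixel error) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names and data layout are hypothetical, and two choices are assumptions that the abstract does not confirm: crowd labels are aggregated by majority vote before comparison with the expert, and pixel error is the Euclidean distance between a crowd key point and the expert key point.

```python
from collections import Counter
import math


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists of labels, one inner list per image.
    categories: list of all possible labels.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who assigned category j to item i
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each category
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)


def majority_vote_accuracy(ratings, expert_labels):
    """Fraction of items whose most common crowd label equals the expert label.

    Majority voting is an assumption of this sketch, not a detail from the paper.
    """
    correct = 0
    for item, expert in zip(ratings, expert_labels):
        majority_label = Counter(item).most_common(1)[0][0]
        correct += int(majority_label == expert)
    return correct / len(expert_labels)


def mean_pixel_error(crowd_points, expert_points):
    """Mean Euclidean distance (in pixels) between paired (x, y) key points."""
    distances = [
        math.hypot(cx - ex, cy - ey)
        for (cx, cy), (ex, ey) in zip(crowd_points, expert_points)
    ]
    return sum(distances) / len(distances)
```

In a setup like the one described, `ratings` would hold 200 inner lists of 9 instrument names each, and `expert_labels` would hold the 200 expert-assigned names. The data used to call these functions must come from the actual annotations; no values are supplied here.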



Funding: Wilmer Eye Institute Pooled Professor’s Fund and a grant to the Wilmer Eye Institute from Research to Prevent Blindness.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Tae Soo Kim (1)
  • Anand Malpani (2)
  • Austin Reiter (1)
  • Gregory D. Hager (1, 2)
  • Shameema Sikder (3, corresponding author)
  • S. Swaroop Vedula (2)

  1. Department of Computer Science, Johns Hopkins University, Baltimore, USA
  2. The Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, USA
  3. Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, USA
