Automated Surgical Activity Recognition with One Labeled Sequence

  • Robert DiPietro
  • Gregory D. Hager
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11768)


Prior work has demonstrated the feasibility of automated activity recognition in robot-assisted surgery from motion data. However, these efforts have assumed the availability of a large number of densely annotated sequences, which must be provided manually by experts. This process is tedious, expensive, and error-prone. In this paper, we present the first analysis under the assumption of scarce annotations, where as few as one annotated sequence is available for training. We demonstrate the feasibility of automated recognition in this challenging setting, and we show that learning representations in an unsupervised fashion, before the recognition phase, leads to significant gains in performance. In addition, our paper poses a new challenge to the community: how much further can we push performance in this important yet relatively unexplored regime?


Surgical activity recognition · Gesture recognition · Maneuver recognition · Semi-supervised learning



This work was supported by a fellowship for modeling, simulation, and training from the Link Foundation. We also thank Anand Malpani, Madeleine Waldram, Swaroop Vedula, Gyusung I. Lee, and Mija R. Lee for procuring the MISTIC-SL dataset. The procurement of MISTIC-SL was supported by the Johns Hopkins Science of Learning Institute.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Johns Hopkins University, Baltimore, USA