Weakly-Supervised Learning for Tool Localization in Laparoscopic Videos

  • Armine Vardazaryan
  • Didier Mutter
  • Jacques Marescaux
  • Nicolas Padoy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11043)


Surgical tool localization is an essential task for the automatic analysis of endoscopic videos. Existing methods for tool localization, tracking, and segmentation require fully annotated training data, which limits both the size of the datasets that can be used and the generalization of the resulting approaches. In this work, we propose to circumvent the lack of spatially annotated data with weak supervision. We introduce a deep architecture, trained solely on image-level annotations, that can be used for both tool presence detection and tool localization in surgical videos. The architecture relies on a fully convolutional neural network trained end-to-end, enabling it to localize surgical tools without explicit spatial annotations. We demonstrate the benefits of our approach on Cholec80, a large public dataset fully annotated with binary tool presence labels, of which 5 videos have additionally been annotated with bounding boxes and tool centers for evaluation.
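The idea described above — a fully convolutional network whose per-class activation maps are spatially pooled into image-level presence scores, so that only binary tool labels are needed for training — can be sketched in a few lines. The NumPy sketch below illustrates only the final localization head, assuming a pretrained backbone has already produced a spatial feature map; the 1x1 convolution weights, the spatial max pooling, and the argmax-based center estimate are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def localization_head(features, weights, bias):
    """Map a C_in x H x W feature tensor to per-class activation maps
    via a 1x1 convolution (equivalent to a per-pixel linear layer)."""
    c_in, h, w = features.shape
    flat = features.reshape(c_in, h * w)            # C_in x (H*W)
    maps = weights @ flat + bias[:, None]           # C_out x (H*W)
    return maps.reshape(-1, h, w)                   # one heatmap per tool class

def presence_probabilities(heatmaps):
    """Spatial max pooling collapses each heatmap into one image-level
    logit, so the head can be supervised with image-level labels only."""
    logits = heatmaps.max(axis=(1, 2))
    return 1.0 / (1.0 + np.exp(-logits))            # per-tool sigmoid

def localize(heatmaps):
    """Tool center estimate: location of the strongest activation."""
    return [np.unravel_index(np.argmax(m), m.shape) for m in heatmaps]

# Toy example: 4 backbone channels, 8x8 spatial map, 7 tool classes
# (Cholec80 annotates 7 tools); all sizes here are illustrative.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8, 8))
w, b = rng.normal(size=(7, 4)), np.zeros(7)
maps = localization_head(feats, w, b)
probs = presence_probabilities(maps)
centres = localize(maps)
```

At training time, the per-tool probabilities would be compared against the binary presence labels (e.g., with a binary cross-entropy loss); the heatmaps themselves receive no direct supervision, which is what makes the localization weakly supervised.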


Keywords: Surgical tool localization · Endoscopic videos · Weakly-supervised learning · Cholec80



This work was supported by French state funds managed within the Investissements d’Avenir program by BPI France (project CONDOR) and by the ANR (references ANR-11-LABX-0004 and ANR-10-IAHU-02). The authors would also like to acknowledge the support of NVIDIA with the donation of a GPU used in this research.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Armine Vardazaryan (1)
  • Didier Mutter (2)
  • Jacques Marescaux (2)
  • Nicolas Padoy (1)
  1. ICube, University of Strasbourg, CNRS, IHU Strasbourg, Strasbourg, France
  2. University Hospital of Strasbourg, IRCAD, IHU Strasbourg, Strasbourg, France
