Abstract
We propose a novel approach to automatically recognize the presence of surgical tools in surgical videos, a task that is quite challenging due to the large variations and partial appearance of surgical tools, the complexity of surgical scenes, and the co-occurrence of several tools in the same frame. Inspired by the human visual attention mechanism, which first orients toward and selects important visual cues and then carefully analyzes these foci of attention, we propose to first leverage a global prediction network to obtain a set of visual attention maps and a global prediction for each tool, and then harness a local prediction network to predict the presence of tools based on these attention maps. We apply a gate function to obtain the final prediction by balancing the global and local predictions. The proposed attention-guided network (AGNet) achieves state-of-the-art performance on the m2cai16-tool dataset and surpasses the 2016 challenge winner by a significant margin.
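The global/local fusion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: both networks are stubbed out with random maps, the attention-map resolution (8×8), the seven tool classes of m2cai16-tool, and the names `global_branch`, `local_branch`, and `fuse` are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_TOOLS = 7  # m2cai16-tool defines seven tool classes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def global_branch(frame_feat):
    """Stub for the global prediction network: returns a per-tool
    attention map and a global presence logit for each tool."""
    attn_maps = sigmoid(rng.standard_normal((NUM_TOOLS, 8, 8)))
    global_logits = rng.standard_normal(NUM_TOOLS)
    return attn_maps, global_logits

def local_branch(frame_feat, attn_maps):
    """Stub for the local prediction network: predicts presence from
    the attended regions (here, attention-weighted frame features)."""
    attended = attn_maps.mean(axis=(1, 2)) * frame_feat.mean()
    return attended + rng.standard_normal(NUM_TOOLS)

def fuse(global_logits, local_logits, gate):
    """Gate function balancing the global and local predictions."""
    g = sigmoid(gate)  # per-tool weight in (0, 1)
    return g * sigmoid(global_logits) + (1.0 - g) * sigmoid(local_logits)

frame_feat = rng.standard_normal(64)
attn, g_log = global_branch(frame_feat)
l_log = local_branch(frame_feat, attn)
gate = np.zeros(NUM_TOOLS)  # gate = 0 -> equal weighting of both branches
probs = fuse(g_log, l_log, gate)
print(probs.shape)  # one presence probability per tool
```

With the gate at zero the two branches are weighted equally; in a trained model the gate would be learned so that each tool's final score leans on whichever branch is more reliable for it.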
Acknowledgements
The work described in this paper was supported by the following grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK 14202514 and CUHK 14203115).
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Hu, X., Yu, L., Chen, H., Qin, J., Heng, P.A. (2017). AGNet: Attention-Guided Network for Surgical Tool Presence Detection. In: Cardoso, M., et al. (eds) Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. DLMIA ML-CDS 2017. Lecture Notes in Computer Science, vol 10553. Springer, Cham. https://doi.org/10.1007/978-3-319-67558-9_22
DOI: https://doi.org/10.1007/978-3-319-67558-9_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67557-2
Online ISBN: 978-3-319-67558-9
eBook Packages: Computer Science; Computer Science (R0)