Tracking of Retinal Microsurgery Tools Using Late Fusion of Responses from Convolutional Neural Network over Pyramidally Decomposed Frames

  • Kaustuv Mishra
  • Rachana Sathish
  • Debdoot Sheet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10481)


Computer vision and robotic assistance are increasingly used to improve the quality of surgical interventions. Tool tracking becomes critical in interventions such as endoscopy, laparoscopy and retinal microsurgery (RM), where, unlike in open surgery, surgeons do not have direct visual and physical access to the surgical site. RM is performed using miniaturized tools and requires the surgeon to observe the site carefully through a surgical microscope. Tracking of surgical tools primarily supports robotic assistance during surgery and also serves as a means to assess the quality of surgery, which is extremely useful during surgical training. In this paper we propose a deep learning based method for visual tracking of surgical tools using late fusion of responses from a convolutional neural network (CNN). It comprises three steps: (i) training a CNN to localize the tool tip on a frame, (ii) coarsely estimating the tool tip region using the trained CNN, and (iii) a finer search around the estimated region to accurately localize the tool tip. Scale-invariant tracking is ensured by multi-scale late fusion, in which the CNN responses are obtained at each level of the Gaussian scale decomposition pyramid. The method is experimentally validated on the publicly available retinal microscopy instrument tracking (RMIT) dataset. It tracks tools with a maximum accuracy of \(99.13\%\), which substantiates its efficacy in comparison to existing approaches.
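The multi-scale late fusion and coarse-to-fine search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The trained CNN is replaced by a hypothetical stand-in (`toy_response`) that simply returns the pixel intensities as a response map. The function names, the nearest-neighbour upsampling, the averaging used for fusion, and the `stride` and `radius` parameters are all assumptions made for illustration.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with edge padding (output has the input's shape)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def gaussian_pyramid(img, levels=3):
    """Blur and downsample by 2 at each level; level 0 is the input frame."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(gaussian_blur(pyramid[-1])[::2, ::2])
    return pyramid

def toy_response(frame):
    """Hypothetical stand-in for the trained CNN: one score per pixel."""
    return frame

def fuse_responses(img, score_fn, levels=3):
    """Late fusion (assumed here to be a mean) of per-level response maps."""
    h, w = img.shape
    fused = np.zeros((h, w))
    for level, scaled in enumerate(gaussian_pyramid(img, levels)):
        resp = score_fn(scaled)
        factor = 2 ** level
        # Nearest-neighbour upsampling back to the full frame size.
        up = np.repeat(np.repeat(resp, factor, axis=0), factor, axis=1)[:h, :w]
        fused += up
    return fused / levels

def locate_tip(fused, stride=8, radius=8):
    """Coarse estimate on a strided grid, then a finer search around it."""
    coarse = fused[::stride, ::stride]
    cy, cx = np.unravel_index(np.argmax(coarse), coarse.shape)
    cy, cx = cy * stride, cx * stride
    y0, x0 = max(cy - radius, 0), max(cx - radius, 0)
    window = fused[y0:cy + radius + 1, x0:cx + radius + 1]
    dy, dx = np.unravel_index(np.argmax(window), window.shape)
    return int(y0 + dy), int(x0 + dx)

# Synthetic frame with a bright blob standing in for the tool tip.
yy, xx = np.mgrid[0:64, 0:64]
frame = np.exp(-((yy - 40) ** 2 + (xx - 22) ** 2) / (2 * 3.0 ** 2))
print(locate_tip(fuse_responses(frame, toy_response)))
```

In this sketch the coarse pass only evaluates every eighth pixel of the fused map, and the fine pass searches a small window around that estimate, which mirrors steps (ii) and (iii) in spirit. A real tracker would substitute the trained CNN for `toy_response`.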



We thank NVIDIA Inc. for donating the GTX TitanX GPU used in this work.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Kaustuv Mishra (1)
  • Rachana Sathish (1)
  • Debdoot Sheet (1)
  1. Indian Institute of Technology Kharagpur, Kharagpur, India
