Abstract
Localizing anatomical landmarks is an important task in medical image analysis. However, the landmarks to be localized often lack prominent visual features. Their locations are elusive and easily confused with the background, so precise localization depends heavily on the context formed by their surrounding areas. In addition, the required precision is usually higher than in segmentation and object detection tasks. Localization therefore poses challenges that differ from those of segmentation or detection. In this paper, we propose a zoom-in attentive network (ZIAN) for anatomical landmark localization in ocular images. First, a coarse-to-fine, or “zoom-in”, strategy is used to learn contextualized features at different scales. Then, an attentive fusion module is adopted to aggregate the multi-scale features. It consists of 1) a co-attention network with a multiple regions-of-interest (ROIs) scheme that learns complementary features from the multiple ROIs, and 2) an attention-based fusion module that integrates the multi-ROI features and non-ROI features. We evaluated ZIAN on two open challenge tasks: fovea localization in fundus images and scleral spur localization in AS-OCT images. Experiments show that ZIAN achieves promising performance and outperforms state-of-the-art localization methods. The source code and trained models of ZIAN are available at https://github.com/leixiaofeng-astar/OMIA9-ZIAN.
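The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the ZIAN implementation, which the abstract does not detail; it only shows the generic "zoom-in" pattern of locating a landmark roughly on a low-resolution view, cropping an ROI around that estimate, and refining within the crop. The function names, the `stride` and `roi_size` values, and the `predict_heatmap` callable (a stand-in for a trained network) are all assumptions made for illustration.

```python
import numpy as np


def argmax_2d(heatmap):
    """Return the (row, col) position of the largest value in a 2-D array."""
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(r), int(c)


def crop_roi(image, center, size):
    """Crop a size x size window around `center`, clamped to the image bounds.

    Assumes size <= both image dimensions.
    Returns the crop and the (row, col) of its top-left corner.
    """
    h, w = image.shape
    top = min(max(center[0] - size // 2, 0), h - size)
    left = min(max(center[1] - size // 2, 0), w - size)
    return image[top:top + size, left:left + size], (top, left)


def zoom_in_localize(image, predict_heatmap, stride=4, roi_size=32):
    """Two-stage localization: coarse estimate, then refinement inside an ROI.

    `predict_heatmap` is a hypothetical stand-in for a model that maps an
    image to a heatmap of the same shape.
    """
    # Coarse pass on a subsampled view of the whole image.
    coarse = image[::stride, ::stride]
    cr, cc = argmax_2d(predict_heatmap(coarse))
    coarse_center = (cr * stride, cc * stride)

    # Fine pass at full resolution, restricted to the ROI around the estimate.
    roi, (top, left) = crop_roi(image, coarse_center, roi_size)
    fr, fc = argmax_2d(predict_heatmap(roi))

    # Map the ROI-relative position back to full-image coordinates.
    return top + fr, left + fc


# Synthetic example: a Gaussian blob plays the role of the landmark response.
rows, cols = np.mgrid[0:128, 0:128]
image = np.exp(-((rows - 70) ** 2 + (cols - 45) ** 2) / (2 * 6.0 ** 2))

# The identity function stands in for a trained heatmap predictor.
landmark = zoom_in_localize(image, lambda x: x)
print(landmark)
```

In this toy setting the coarse pass can only resolve the landmark to within the subsampling stride, and the fine pass recovers the exact pixel. A multi-ROI scheme such as the one the abstract describes would crop several windows and fuse their features rather than use a single crop.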
Acknowledgements
This work was supported by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funds (Grant Number: A20H4b0141), and its RIE2020 Health and Biomedical Sciences (HBMS) Industry Alignment Fund Pre-Positioning (IAF-PP, Grant Number: H20c6a0031). Xinxing Xu is the corresponding author.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lei, X. et al. (2022). Localizing Anatomical Landmarks in Ocular Images Using Zoom-In Attentive Networks. In: Antony, B., Fu, H., Lee, C.S., MacGillivray, T., Xu, Y., Zheng, Y. (eds) Ophthalmic Medical Image Analysis. OMIA 2022. Lecture Notes in Computer Science, vol 13576. Springer, Cham. https://doi.org/10.1007/978-3-031-16525-2_10
DOI: https://doi.org/10.1007/978-3-031-16525-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16524-5
Online ISBN: 978-3-031-16525-2
eBook Packages: Computer Science, Computer Science (R0)