Deep Similarity Fusion Networks for One-Shot Semantic Segmentation

  • Shuchang LyuEmail author
  • Guangliang Cheng
  • Qimin Ding
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12046)


One-shot semantic segmentation is a new challenging task extended from traditional semantic segmentation, which aims to predict unseen object categories for each pixel given only one annotated sample. Previous works employ oversimplified operations to fuse the features from query image and support image, while neglecting to incorporate multi-scale information that is essential for the one-shot segmentation task. In this paper, we propose a novel one-shot based architecture, Deep Similarity Fusion Network (DSFN) to tackle this issue. Specifically, a new similarity feature generator is proposed to generate multi-scale similarity feature maps, which can provide both contextual and spatial information for the following modules. Then, a similarity feature aggregator is employed to fuse different scale feature maps in a coarse-to-fine manner. Finally, a simple yet effective convolutional module is introduced to create the final segmentation mask. Extensive experiments on \(\mathrm {PASCAL-5^{i}}\) demonstrate that DSFN outperforms the state-of-the-art methods by a large margin with mean IoU of 47.7%.


Deep Similarity Fusion Network One-shot learning Semantic segmentation 


  1. 1.
    Adam, S., Sergey, B., Matthew, B., Daan, W., Timothy, L.: Meta-learning with memory-augmented neural networks. In: ICML, pp. 1842–1850 (2016)Google Scholar
  2. 2.
    Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)CrossRefGoogle Scholar
  3. 3.
    Caelles, S., Maninis, K., Pont-Tuset, J., Leal-Taixé, L., Cremers, D., Gool, L.V.: One-shot video object segmentation. In: IEEE CVPR, pp. 5320–5329 (2017)Google Scholar
  4. 4.
    Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. CoRR abs/1412.7062 (2014).
  5. 5.
    Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018)CrossRefGoogle Scholar
  6. 6.
    Chen, L., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. CoRR abs/1706.05587 (2017).
  7. 7.
    Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018, Part VII. LNCS, vol. 11211, pp. 833–851. Springer, Cham (2018). Scholar
  8. 8.
    Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: Imagenet: a large-scale hierarchical image database. In: IEEE CVPR, pp. 248–255 (2009)Google Scholar
  9. 9.
    Ding, H., Jiang, X., Shuai, B., Liu, A.Q., Wang, G.: Context contrasted feature and gated multi-scale aggregation for scene segmentation. In: IEEE CVPR, pp. 2393–2402 (2018)Google Scholar
  10. 10.
    Dong, N., Xing, E.: Few-shot semantic segmentation with prototype learning. In: BMVC, p. 79 (2018)Google Scholar
  11. 11.
    Everingham, M., Eslami, S.M.A., Gool, L.J.V., Williams, C.K.I., Winn, J.M., Zisserman, A.: The pascal visual object classes challenge: a retrospective. IJCV 111(1), 98–136 (2015)CrossRefGoogle Scholar
  12. 12.
    Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML, pp. 1126–1135 (2017)Google Scholar
  13. 13.
    Fu, J., Liu, J., Tian, H., Fang, Z., Lu, H.: Dual attention network for scene segmentation. CoRR abs/1809.02983 (2018)Google Scholar
  14. 14.
    Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Simultaneous detection and segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VII. LNCS, vol. 8695, pp. 297–312. Springer, Cham (2014). Scholar
  15. 15.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE CVPR, pp. 770–778 (2016)Google Scholar
  16. 16.
    He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part IV. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). Scholar
  17. 17.
    Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: IEEE CVPR, pp. 7132–7141 (2018)Google Scholar
  18. 18.
    Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: IEEE CVPR, pp. 2261–2269 (2017)Google Scholar
  19. 19.
    Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, vol. 2 (2015)Google Scholar
  20. 20.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1106–1114 (2012)Google Scholar
  21. 21.
    Lin, G., Milan, A., Shen, C., Reid, I.D.: Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In: IEEE CVPR, pp. 5168–5177 (2017)Google Scholar
  22. 22.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE CVPR, pp. 3431–3440 (2015)Google Scholar
  23. 23.
    Oriol, V., Charles, B., Timothy, L., Daan, W., et al.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems, pp. 3630–3638 (2016)Google Scholar
  24. 24.
    Poudel, R.P.K., Bonde, U., Liwicki, S., Zach, C.: Contextnet: Exploring context and detail for semantic segmentation in real-time. In: BMVC, p. 146 (2018)Google Scholar
  25. 25.
    Poudel, R.P.K., Liwicki, S., Cipolla, R.: Fast-SCNN: Fast semantic segmentation network. CoRR abs/1902.04502 (2019).
  26. 26.
    Rakelly, K., Shelhamer, E., Darrell, T., Efros, A., Levine, S.: Conditional networks for few-shot semantic segmentation. In: ICLR Workshop (2018)Google Scholar
  27. 27.
    Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015, Part III. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). Scholar
  28. 28.
    Sachin, R., Hugo, L.: Optimization as a model for few-shot learning. In: ICLR (2016)Google Scholar
  29. 29.
    Shaban, A., Bansal, S., Liu, Z., Essa, I., Boots, B.: One-shot learning for semantic segmentation. In: BMVC (2017)Google Scholar
  30. 30.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014).
  31. 31.
    Szegedy, C., et al.: Going deeper with convolutions. In: IEEE CVPR, pp. 1–9 (2015)Google Scholar
  32. 32.
    Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: BiSeNet: bilateral segmentation network for real-time semantic segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018, Part XIII. LNCS, vol. 11217, pp. 334–349. Springer, Cham (2018). Scholar
  33. 33.
    Zhang, X., Wei, Y., Yang, Y., Huang, T.: Sg-one: Similarity guidance network for one-shot semantic segmentation. CoRR abs/1810.09091 (2018).
  34. 34.
    Zhao, H., Qi, X., Shen, X., Shi, J., Jia, J.: ICNet for real-time semantic segmentation on high-resolution images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018, Part III. LNCS, vol. 11207, pp. 418–434. Springer, Cham (2018). Scholar
  35. 35.
    Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: IEEE CVPR, pp. 6230–6239 (2017)Google Scholar
  36. 36.
    Zhao, H., et al.: PSANet: point-wise spatial attention network for scene parsing. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018, Part IX. LNCS, vol. 11213, pp. 270–286. Springer, Cham (2018). Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.SensetimeBeijingChina

Personalised recommendations