Adaptive Affinity Fields for Semantic Segmentation

  • Tsung-Wei KeEmail author
  • Jyh-Jing HwangEmail author
  • Ziwei LiuEmail author
  • Stella X. YuEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)


Semantic segmentation has made much progress with increasingly powerful pixel-wise classifiers and incorporating structural priors via Conditional Random Fields (CRF) or Generative Adversarial Networks (GAN). We propose a simpler alternative that learns to verify the spatial structure of segmentation during training only. Unlike existing approaches that enforce semantic labels on individual pixels and match labels between neighbouring pixels, we propose the concept of Adaptive Affinity Fields (AAF) to capture and match the semantic relations between neighbouring pixels in the label space. We use adversarial learning to select the optimal affinity field size for each semantic category. It is formulated as a minimax problem, optimizing our segmentation neural network in a best worst-case learning scenario. AAF is versatile for representing structures as a collection of pixel-centric relations, easier to train than GAN and more efficient than CRF without run-time inference. Our extensive evaluations on PASCAL VOC 2012, Cityscapes, and GTA5 datasets demonstrate its above-par segmentation performance and robust generalization across domains.


Semantic segmentation Affinity field Adversarial learning 


  1. 1.
    Amir, A., Lindenbaum, M.: Grouping-based nonadditive verification. IEEE Trans. Pattern Anal. Mach. Intell. 20(2), 186–192 (1998)CrossRefGoogle Scholar
  2. 2.
    Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 898–916 (2011)CrossRefGoogle Scholar
  3. 3.
    Bertasius, G., Shi, J., Torresani, L.: Semantic segmentation with boundary neural fields. In: CVPR (2016)Google Scholar
  4. 4.
    Bertasius, G., Torresani, L., Yu, S.X., Shi, J.: Convolutional random walk networks for semantic image segmentation. In: CVPR (2017)Google Scholar
  5. 5.
    Chen, L.C., Barron, J.T., Papandreou, G., Murphy, K., Yuille, A.L.: Semantic image segmentation with task-specific edge detection using cnns and a discriminatively trained domain transform. In: CVPR (2016)Google Scholar
  6. 6.
    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs (2016). arXiv preprint arXiv:1606.00915
  7. 7.
    Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation (2017). arXiv preprint arXiv:1706.05587
  8. 8.
    Chen, L.C., Schwing, A., Yuille, A., Urtasun, R.: Learning deep structured models. In: ICML (2015)Google Scholar
  9. 9.
    Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: CVPR (2005)Google Scholar
  10. 10.
    Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)Google Scholar
  11. 11.
    Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88, 303–338 (2010)CrossRefGoogle Scholar
  12. 12.
    Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS (2014)Google Scholar
  13. 13.
    Hariharan, B., Arbeláez, P., Bourdev, L., Maji, S., Malik, J.: Semantic contours from inverse detectors. In: ICCV (2011)Google Scholar
  14. 14.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  15. 15.
    Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFS with gaussian edge potentials. In: NIPS (2011)Google Scholar
  16. 16.
    Li, X., Liu, Z., Luo, P., Loy, C.C., Tang, X.: Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade. In: CVPR (2017)Google Scholar
  17. 17.
    Lin, G., Shen, C., van den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: CVPR (2016)Google Scholar
  18. 18.
    Liu, S., De Mello, S., Gu, J., Zhong, G., Yang, M.H., Kautz, J.: Learning affinity via spatial propagation networks. In: NIPS (2017)Google Scholar
  19. 19.
    Liu, Z., Li, X., Luo, P., Loy, C.C., Tang, X.: Semantic image segmentation via deep parsing network. In: CVPR (2015)Google Scholar
  20. 20.
    Liu, Z., Li, X., Luo, P., Loy, C.C., Tang, X.: Deep learning markov random field for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 40, 1814–1828 (2017)CrossRefGoogle Scholar
  21. 21.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)Google Scholar
  22. 22.
    Luc, P., Couprie, C., Chintala, S., Verbeek, J.: Semantic segmentation using adversarial networks. NIPS Workshop (2016)Google Scholar
  23. 23.
    Maire, M., Narihira, T., Yu, S.X.: Affinity CNN: learning pixel-centric pairwise relations for figure/ground embedding. In: CVPR (2016)Google Scholar
  24. 24.
    Mostajabi, M., Maire, M., Shakhnarovich, G.: Regularizing deep networks by modeling and predicting label structure. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5629–5638 (2018)Google Scholar
  25. 25.
    Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: CVPR (2015)Google Scholar
  26. 26.
    Poggio, T.: Early vision: from computational structure to algorithms and parallel hardware. Comput. Vis. Graph. Image Process. 31(2), 139–155 (1985)CrossRefGoogle Scholar
  27. 27.
    Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks (2015). arXiv preprint arXiv:1511.06434
  28. 28.
    Richter, Stephan R., Vineet, Vibhav, Roth, Stefan, Koltun, Vladlen: Playing for data: ground truth from computer games. In: Leibe, Bastian, Matas, Jiri, Sebe, Nicu, Welling, Max (eds.) ECCV 2016. LNCS, vol. 9906, pp. 102–118. Springer, Cham (2016). Scholar
  29. 29.
    Ronneberger, Olaf, Fischer, Philipp, Brox, Thomas: U-Net: convolutional networks for biomedical image segmentation. In: Navab, Nassir, Hornegger, Joachim, Wells, William M., Frangi, Alejandro F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). Scholar
  30. 30.
    Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)MathSciNetCrossRefGoogle Scholar
  31. 31.
    Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)CrossRefGoogle Scholar
  32. 32.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
  33. 33.
    Wu, Z., Shen, C., Hengel, A.V.D.: High-performance semantic segmentation using very deep fully convolutional networks (2016). arXiv preprint arXiv:1604.04339
  34. 34.
    Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: ICLR (2016)Google Scholar
  35. 35.
    Yu, S.X., Shi, J.: Multiclass spectral clustering. In: ICCV (2003)Google Scholar
  36. 36.
    Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)Google Scholar
  37. 37.
    Zheng, S., et al.: Conditional random fields as recurrent neural networks. In: ICCV (2015)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

Personalised recommendations