
A Projected Gradient Descent Method for CRF Inference Allowing End-to-End Training of Arbitrary Pairwise Potentials

  • Måns Larsson
  • Anurag Arnab
  • Fredrik Kahl
  • Shuai Zheng
  • Philip Torr
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10746)

Abstract

Are we using the right potential functions in the Conditional Random Field models that are popular in the Vision community? Semantic segmentation and other pixel-level labelling tasks have made significant progress recently due to the deep learning paradigm. However, most state-of-the-art structured prediction methods also include a random field model with a hand-crafted Gaussian potential to model spatial priors, label consistencies and feature-based image conditioning.

In this paper, we challenge this view by developing a new inference and learning framework that can learn pairwise CRF potentials restricted only by their dependence on the image pixel values and the size of their support. Both standard spatial and high-dimensional bilateral kernels are considered. Our framework is based on the observation that CRF inference can be achieved via projected gradient descent and can consequently be integrated easily into deep neural networks to allow end-to-end training. We demonstrate empirically that such learned potentials can improve segmentation accuracy and that certain label class interactions are indeed better modelled by a non-Gaussian potential. In addition, we compare our inference method to the commonly used mean-field algorithm. Our framework is evaluated on several public benchmarks for semantic segmentation and shows improved performance compared to previous state-of-the-art CNN+CRF models.
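
The idea of CRF inference by projected gradient descent can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a relaxed energy E(q) = Σ_i ⟨u_i, q_i⟩ + ½ Σ_{i≠j} w_ij q_iᵀ C q_j over per-pixel label distributions q_i, with fixed scalar weights w_ij and a Potts compatibility matrix C standing in for the learned filters described above. All function names, the step size and the iteration count are hypothetical choices made for illustration.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:          # holds for k = 1..rho, so the last hit is rho
            theta = t
    return [max(x - theta, 0.0) for x in v]


def energy_gradient(q, unary, weights, compat):
    """Gradient of the assumed relaxed energy with respect to q.

    Assumes weights and compat are symmetric, so the factor 1/2 in the
    pairwise term cancels when differentiating.
    """
    n, num_labels = len(q), len(q[0])
    grad = [row[:] for row in unary]
    for i in range(n):
        for j in range(n):
            if i == j or weights[i][j] == 0.0:
                continue
            for l in range(num_labels):
                grad[i][l] += weights[i][j] * sum(
                    compat[l][m] * q[j][m] for m in range(num_labels)
                )
    return grad


def pgd_inference(unary, weights, compat, step=0.5, iters=50):
    """Gradient step on the energy, then project each pixel back onto the simplex."""
    num_labels = len(unary[0])
    q = [[1.0 / num_labels] * num_labels for _ in unary]  # uniform start
    for _ in range(iters):
        grad = energy_gradient(q, unary, weights, compat)
        q = [
            project_simplex([qv - step * gv for qv, gv in zip(q_row, g_row)])
            for q_row, g_row in zip(q, grad)
        ]
    return q


# Toy example (made-up numbers): three pixels in a chain, two labels.
# The middle pixel weakly prefers label 1; its neighbours strongly prefer label 0.
unary = [[0.0, 1.0], [0.5, 0.4], [0.0, 1.0]]
weights = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
potts = [[0.0, 1.0], [1.0, 0.0]]  # cost 1 when neighbouring labels differ

q = pgd_inference(unary, weights, potts)
labels = [row.index(max(row)) for row in q]
print(labels)
```

Every operation in the loop (a linear filtering step followed by a simplex projection) is differentiable almost everywhere, which is what makes it plausible to unroll such iterations as layers of a network and train the pairwise terms end-to-end. In this toy case the pairwise term pulls the middle pixel towards the label of its neighbours, whereas the unary term alone would assign it label 1.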

Keywords

Conditional Random Fields, Segmentation, Convolutional Neural Networks

Notes

Acknowledgements

This work has been funded by the Swedish Research Council (grant no. 2016-04445), the Swedish Foundation for Strategic Research (Semantic Mapping and Visual Navigation for Smart Robots), Vinnova/FFI (Perceptron, grant no. 2017-01942), ERC (grant ERC-2012-AdG 321162-HELIOS) and EPSRC (grant Seebibyte EP/M013774/1 and EP/N019474/1).

Supplementary material

466071_1_En_37_MOESM1_ESM.pdf (1.5 mb)
Supplementary material 1 (pdf 1547 KB)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Chalmers University of Technology, Gothenburg, Sweden
  2. University of Oxford, Oxford, England
  3. Centre for Mathematical Sciences, Lund University, Lund, Sweden
