Joint Training of Generic CNN-CRF Models with Stochastic Optimization

  • Conference paper
Computer Vision – ACCV 2016 (ACCV 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10112)


Abstract

We propose a new CNN-CRF end-to-end learning framework, based on joint stochastic optimization with respect to both Convolutional Neural Network (CNN) and Conditional Random Field (CRF) parameters. While stochastic gradient descent is a standard technique for CNN training, it had not previously been applied to joint models. We show that our learning method is (i) general, i.e. it applies to arbitrary CNN and CRF architectures and potential functions; (ii) scalable, i.e. it has a low memory footprint and parallelizes straightforwardly on GPUs; and (iii) easy to implement. Additionally, the unified CNN-CRF optimization approach simplifies a potential hardware implementation. We empirically evaluate our method on the task of semantic labeling of body parts in depth images and show that it compares favorably to competing techniques.
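The abstract does not spell out the update rule, but the general idea of joint stochastic optimization of CNN and CRF parameters can be illustrated with a contrastive-divergence-style sketch: estimate the intractable model expectation in the likelihood gradient by Gibbs sampling, and take simultaneous SGD steps on both the unary network and the pairwise CRF weights. The sketch below is a toy illustration, not the paper's implementation: a linear layer stands in for the CNN, the CRF is a short chain over `T` sites with `L` labels, and all names (`theta`, `W`, `gibbs_sweep`, learning rate, step counts) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a chain CRF over T sites with L labels; a linear layer over
# F features plays the role of the CNN producing per-site unary scores.
T, L, F = 8, 3, 5
theta = rng.normal(scale=0.1, size=(F, L))   # "CNN" (unary) weights
W = np.zeros((L, L))                         # pairwise CRF weights

def unaries(x):
    # x: (T, F) site features -> (T, L) unary scores
    return x @ theta

def energy(x, y):
    # Negative log-potential of a labeling y under the current model.
    u = unaries(x)
    return -(u[np.arange(T), y].sum() + W[y[:-1], y[1:]].sum())

def gibbs_sweep(x, y):
    # One in-place Gibbs sweep: resample each site given its chain neighbors.
    u = unaries(x)
    for t in range(T):
        s = u[t].copy()
        if t > 0:
            s += W[y[t - 1], :]
        if t < T - 1:
            s += W[:, y[t + 1]]
        p = np.exp(s - s.max())
        p /= p.sum()
        y[t] = rng.choice(L, p=p)
    return y

def grads(x, y_data, y_model):
    # Gradient of the negative log-likelihood:
    # model-sample sufficient statistics minus data sufficient statistics.
    gu = np.zeros_like(theta)
    gW = np.zeros_like(W)
    for y, sign in ((y_data, -1.0), (y_model, +1.0)):
        for t in range(T):
            gu[:, y[t]] += sign * x[t]
        for t in range(T - 1):
            gW[y[t], y[t + 1]] += sign
    return gu, gW

# Joint SGD on one toy training example: the negative phase is approximated
# by a persistent Gibbs chain, and both theta and W are updated together.
x = rng.normal(size=(T, F))
y_data = rng.integers(L, size=T)
y_model = y_data.copy()
lr = 0.05
for step in range(200):
    for _ in range(2):
        gibbs_sweep(x, y_model)
    gu, gW = grads(x, y_data, y_model)
    theta -= lr * gu
    W -= lr * gW
```

The key point mirrored from the abstract is that a single stochastic gradient step touches both parameter sets at once: the data-minus-sample statistics flow into the unary network (here a single matrix, in the paper a CNN reached via backpropagation) and into the CRF pairwise weights, with only the current sample kept in memory.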


Notes

  1. http://host.robots.ox.ac.uk:8080/leaderboard.

  2. We use the commonly adopted terminology from the CNN literature for technical details, to allow reproducibility of our results.


Acknowledgements

This work was supported by: European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 647769); German Federal Ministry of Education and Research (BMBF, 01IS14014A-D); EPSRC EP/I001107/2; ERC grant ERC-2012-AdG 321162-HELIOS. The computations were performed on an HPC Cluster at the Center for Information Services and High Performance Computing (ZIH) at TU Dresden.

Author information


Corresponding author

Correspondence to D. Schlesinger.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kirillov, A., Schlesinger, D., Zheng, S., Savchynskyy, B., Torr, P.H.S., Rother, C. (2017). Joint Training of Generic CNN-CRF Models with Stochastic Optimization. In: Lai, S.H., Lepetit, V., Nishino, K., Sato, Y. (eds.) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol. 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_14


  • DOI: https://doi.org/10.1007/978-3-319-54184-6_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54183-9

  • Online ISBN: 978-3-319-54184-6

  • eBook Packages: Computer Science (R0)
