EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching

Song, Xiao; Zhao, Xu; Hu, Hanwen; Fang, Liangji

doi:10.1007/978-3-030-20873-8_2

Xiao Song¹⁸,
Xu Zhao¹⁸,
Hanwen Hu¹⁸ &
…
Liangji Fang¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11365))

Included in the following conference series:

Asian Conference on Computer Vision

3307 Accesses
57 Citations

Abstract

Recent convolutional neural networks, especially end-to-end disparity estimation models, achieve remarkable performance on stereo matching task. However, existed methods, even with the complicated cascade structure, may fail in the regions of non-textures, boundaries and tiny details. Focus on these problems, we propose a multi-task network EdgeStereo that is composed of a backbone disparity network and an edge sub-network. Given a binocular image pair, our model enables end-to-end prediction of both disparity map and edge map. Basically, we design a context pyramid to encode multi-scale context information in disparity branch, followed by a compact residual pyramid for cascaded refinement. To further preserve subtle details, our EdgeStereo model integrates edge cues by feature embedding and edge-aware smoothness loss regularization. Comparative results demonstrates that stereo matching and edge detection can help each other in the unified model. Furthermore, our method achieves state-of-art performance on both KITTI Stereo and Scene Flow benchmarks, which proves the effectiveness of our design.

This research is supported by the funding from NSFC programs (61673269, 61273285, U1764264).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Achtelik, M., Bachrach, A., He, R., Prentice, S., Roy, N.: Stereo vision and laser odometry for autonomous helicopters in GPS-denied indoor environments. In: Unmanned Systems Technology XI, vol. 7332, p. 733219 (2009)
Google Scholar
Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. TPAMI 33(5), 898–916 (2011)
Article Google Scholar
Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: ICML, pp. 41–48 (2009)
Google Scholar
Bleyer, M., Rother, C., Kohli, P., Scharstein, D., Sinha, S.: Object stereo—joint stereo matching and object segmentation. In: CVPR, pp. 3081–3088 (2011)
Google Scholar
Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: CVPR (2018)
Google Scholar
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI 40(4), 834–848 (2018)
Article Google Scholar
Cheng, J., Tsai, Y.H., Wang, S., Yang, M.H.: SegFlow: joint learning for video object segmentation and optical flow. In: ICCV, pp. 686–695 (2017)
Google Scholar
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C.: Flownet: learning optical flow with convolutional networks. In: ICCV, pp. 2758–2766 (2015)
Google Scholar
Dosovitskiy, A., Fischery, P., Ilg, E., HUsser, P.: Flownet: learning optical flow with convolutional networks. In: ICCV, pp. 2758–2766 (2015)
Google Scholar
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The Kitti vision benchmark suite. In: CVPR, pp. 3354–3361 (2012)
Google Scholar
Gidaris, S., Komodakis, N.: Detect, replace, refine: deep structured prediction for pixel wise labeling. In: CVPR, pp. 5248–5257 (2017)
Google Scholar
Godard, C., Aodha, O.M., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: CVPR, pp. 6602–6611 (2017)
Google Scholar
Guney, F., Geiger, A.: Displets: resolving stereo ambiguities using object knowledge. In: CVPR, pp. 4165–4175 (2015)
Google Scholar
Hirschmuller, H.: Accurate and efficient stereo processing by semi-global matching and mutual information. In: CVPR, pp. 807–814 (2005)
Google Scholar
Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: ACMMM, pp. 675–678 (2014)
Google Scholar
Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R.: End-to-end learning of geometry and context for deep stereo regression. In: ICCV (2017)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Liang, Z., Feng, Y., Guo, Y., Liu, H.: Learning deep correspondence through prior and posterior feature constancy. arXiv preprint arXiv:1712.01039 (2017)
Liu, W., Rabinovich, A., Berg, A.C.: ParseNet: looking wider to see better. In: ICLR (2016)
Google Scholar
Liu, Y., Lew, M.S.: Learning relaxed deep supervision for better edge detection. In: CVPR, pp. 231–240 (2016)
Google Scholar
Liu, Y., Cheng, M.M., Hu, X., Wang, K., Bai, X.: Richer convolutional features for edge detection. In: CVPR, pp. 5872–5881 (2017)
Google Scholar
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR, pp. 3431–3440 (2015)
Google Scholar
Luo, W., Schwing, A.G., Urtasun, R.: Efficient deep learning for stereo matching. In: CVPR, pp. 5695–5703 (2016)
Google Scholar
Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: CVPR, pp. 4040–4048 (2016)
Google Scholar
Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: CVPR, pp. 3061–3070 (2015)
Google Scholar
Mottaghi, R., Chen, X., Liu, X., Cho, N.G., Lee, S.W.: The role of context for object detection and semantic segmentation in the wild. In: CVPR, pp. 891–898 (2014)
Google Scholar
Pang, J., Sun, W., Ren, J., Yang, C., Yan, Q.: Cascade residual learning: a two-stage convolutional neural network for stereo matching. In: ICCV Workshop, vol. 3, pp. 1057–7149 (2017)
Google Scholar
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV 47(1–3), 7–42 (2002)
Article Google Scholar
Schmid, K., Tomic, T., Ruess, F., Hirschmüller, H., Suppa, M.: Stereo vision based indoor/outdoor navigation for flying robots. In: IROS, pp. 3955–3962 (2013)
Google Scholar
Seki, A., Pollefeys, M.: Patch based confidence prediction for dense disparity map. In: BMVC, vol. 2, p. 4 (2016)
Google Scholar
Seki, A., Pollefeys, M.: SGM-nets: semi-global matching with neural networks. In: CVPR, pp. 21–26 (2017)
Google Scholar
Shaked, A., Wolf, L.: Improved stereo matching with constant highway networks and reflective confidence learning. In: CVPR, pp. 4641–4650 (2017)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
Google Scholar
Xie, S., Tu, Z.: Holistically-nested edge detection. In: ICCV, pp. 1395–1403 (2015)
Google Scholar
Yamaguchi, K., McAllester, D., Urtasun, R.: Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 756–771. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_49
Chapter Google Scholar
Yang, G., Zhao, H., Shi, J., Deng, Z., Jia, J.: SegStereo: exploiting semantic information for disparity estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 660–676. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_39
Chapter Google Scholar
Yu, L., Wang, Y., Wu, Y., Jia, Y.: Deep stereo matching with explicit cost aggregation sub-architecture. In: AAAI (2018)
Google Scholar
Zbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. JMLR 17(1–32), 2 (2016)
MATH Google Scholar
Zhang, L., Seitz, S.M.: Estimating optimal parameters for mrf stereo from a single image pair. TPAMI 29(2), 331–342 (2007)
Article Google Scholar
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR, pp. 2881–2890 (2017)
Google Scholar
Zhong, Y., Dai, Y., Li, H.: Self-supervised learning for stereo matching with self-improving ability. In: CVPR (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Xiao Song, Xu Zhao, Hanwen Hu & Liangji Fang

Authors

Xiao Song
View author publications
You can also search for this author in PubMed Google Scholar
Xu Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Hanwen Hu
View author publications
You can also search for this author in PubMed Google Scholar
Liangji Fang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xu Zhao .

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
C.V. Jawahar
ANU, Canberra, ACT, Australia
Hongdong Li
Simon Fraser University, Burnaby, BC, Canada
Greg Mori
ETH Zurich, Zurich, Zürich, Switzerland
Konrad Schindler

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Song, X., Zhao, X., Hu, H., Fang, L. (2019). EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11365. Springer, Cham. https://doi.org/10.1007/978-3-030-20873-8_2

Download citation

DOI: https://doi.org/10.1007/978-3-030-20873-8_2
Published: 26 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20872-1
Online ISBN: 978-3-030-20873-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics