
Consistent Binocular Depth and Scene Flow with Chained Temporal Profiles

  • Published:
International Journal of Computer Vision

Abstract

We propose a method for estimating depth and image scene flow from binocular video. Its key component is the preservation of motion-depth temporal consistency, which makes computation over long sequences reliable. We tackle several fundamental technical issues, including establishing the connection between motion and depth, preserving structure consistency across multiple frames, and employing long-range temporal constraints for error correction, all within a unified depth and scene flow estimation framework. Our main contributions include motion trajectories that robustly link frame correspondences in a voting manner, rejection of depth/motion outliers through temporal robust regression, novel edge occurrence map estimation, and anisotropic smoothing priors for proper regularization.
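To make the temporal robust regression idea concrete, the sketch below fits a robust linear trend to a 1D temporal profile of depth (or motion) samples via iteratively reweighted least squares with Huber weights, flagging samples that disagree with the trend as outliers. This is an illustrative stand-in, not the paper's formulation; the function name, the linear trend model, and all parameters are our own assumptions.

```python
import numpy as np

def robust_temporal_fit(values, iters=10, k=1.345):
    """Fit a linear trend d(t) = a*t + b to a temporal profile with
    iteratively reweighted least squares (Huber weights), so that
    outlier samples are downweighted rather than dragging the fit.
    Illustrative sketch only; not the paper's exact regression."""
    t = np.arange(len(values), dtype=float)
    A = np.stack([t, np.ones_like(t)], axis=1)     # design matrix
    w = np.ones(len(values))                       # initial uniform weights
    for _ in range(iters):
        sw = np.sqrt(w)                            # weighted least squares
        coef, *_ = np.linalg.lstsq(A * sw[:, None], values * sw, rcond=None)
        r = values - A @ coef                      # residuals
        s = max(1.4826 * np.median(np.abs(r)), 1e-6)  # robust scale (MAD)
        u = np.abs(r) / (k * s)
        w = np.where(u <= 1.0, 1.0, 1.0 / u)       # Huber weights
    inliers = w > 0.5
    return coef, inliers
```

A profile that is linear except for one corrupted frame recovers a slope close to the true one, with the corrupted sample rejected.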


Notes

  1. 2D-plus-depth (2009). Stereoscopic video coding format. http://en.wikipedia.org/wiki/2D-plus-depth.

  2. http://www.cse.cuhk.edu.hk/%7eleojia/projects/depth/.

References

  • Álvarez, L., Deriche, R., Papadopoulo, T., & Sánchez, J. (2007). Symmetrical dense optical flow estimation with occlusions detection. International Journal of Computer Vision, 75, 371–385.

  • Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., & Szeliski, R. (2011). A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92, 1–31.

  • Basha, T., Moses, Y., & Kiryati, N. (2010). Multi-view scene flow estimation: a view centered variational approach. In CVPR (pp. 1506–1513).

  • Black, M. J. (1994). Recursive non-linear estimation of discontinuous flow fields. In ECCV (Vol. 1, pp. 138–145).

  • Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In ECCV (Vol. 4, pp. 25–36).

  • Brox, T., Bregler, C., & Malik, J. (2009). Large displacement optical flow. In CVPR (pp. 41–48).

  • Bruhn, A., & Weickert, J. (2005). Towards ultimate motion estimation: combining highest accuracy with real-time performance. In ICCV (pp. 749–755).

  • Bruhn, A., Weickert, J., & Schnörr, C. (2005). Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. International Journal of Computer Vision, 61, 211–231.

  • Cech, J., Sanchez-Riera, J., & Horaud, R. (2011). Scene flow estimation by growing correspondence seeds. In CVPR (pp. 3129–3136).

  • Furukawa, Y., & Ponce, J. (2007). Accurate, dense, and robust multi-view stereopsis. In CVPR (pp. 1362–1376).

  • Hadfield, S., & Bowden, R. (2011). Kinecting the dots: particle based scene flow from depth sensors. In ICCV (pp. 2290–2295).

  • Huguet, F., & Devernay, F. (2007). A variational method for scene flow estimation from stereo sequences. In ICCV (pp. 1–7).

  • Irani, M. (2002). Multi-frame correspondence estimation using subspace constraints. International Journal of Computer Vision, 48, 173–194.

  • Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 147–159.

  • Min, D. B., & Sohn, K. (2006). Edge-preserving simultaneous joint motion-disparity estimation. In ICPR (Vol. 2, pp. 74–77).

  • OpenMP ARB (2012). Open multi-processing. http://openmp.org/.

  • Patras, I., Alvertos, N., & Tziritas, G. (1996). Joint disparity and motion field estimation in stereoscopic image sequences. In International conference on pattern recognition (Vol. 1, pp. 359–363).

  • Rabe, C., Müller, T., Wedel, A., & Franke, U. (2010). Dense, robust, and accurate motion field estimation from stereo image sequences in real-time. In ECCV (Vol. 4, pp. 582–595).

  • Richardt, C., Orr, D., Davies, I., Criminisi, A., & Dodgson, N. A. (2010). Real-time spatiotemporal stereo matching using the dual-cross-bilateral grid. In ECCV (Vol. 3, pp. 510–523).

  • Sand, P., & Teller, S. J. (2006). Particle video: long-range motion estimation using point trajectories. In CVPR (Vol. 2, pp. 2195–2202).

  • Sand, P., & Teller, S. J. (2008). Particle video: long-range motion estimation using point trajectories. International Journal of Computer Vision, 80, 72–91.

  • Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47, 7–42.

  • Snavely, N., Seitz, S. M., & Szeliski, R. (2006). Photo tourism: exploring photo collections in 3D. ACM Transactions on Graphics, 25, 835–846.

  • Sun, D., Roth, S., Lewis, J. P., & Black, M. J. (2008). Learning optical flow. In ECCV (Vol. 3, pp. 83–97).

  • Sundaram, N., Brox, T., & Keutzer, K. (2010). Dense point trajectories by GPU-accelerated large displacement optical flow. In ECCV (Vol. 1, pp. 438–451).

  • Tomasi, C., & Manduchi, R. (1998). Bilateral filtering for gray and color images. In ICCV (pp. 839–846).

  • University of Auckland (2008). Enpeda. Image sequence analysis test site (EISATS). http://www.mi.auckland.ac.nz/EISATS/.

  • Valgaerts, L., Bruhn, A., Zimmer, H., Weickert, J., Stoll, C., & Theobalt, C. (2010). Joint estimation of motion, structure and geometry from stereo sequences. In ECCV (Vol. 4, pp. 568–581).

  • Vaudrey, T., Rabe, C., Klette, R., & Milburn, J. (2008). Differences between stereo and motion behavior on synthetic and real-world stereo sequences. In International conference of image and vision computing New Zealand (IVCNZ) (pp. 1–6).

  • Vedula, S., Baker, S., Rander, P., Collins, R. T., & Kanade, T. (2005). Three-dimensional scene flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 475–480.

  • Vogel, C., Schindler, K., & Roth, S. (2011). 3D scene flow estimation with a rigid motion prior. In ICCV (pp. 1291–1298).

  • Wedel, A., Rabe, C., Vaudrey, T., Brox, T., Franke, U., & Cremers, D. (2008). Efficient dense scene flow from sparse or dense stereo data. In ECCV (Vol. 1, pp. 739–751).

  • Wedel, A., Brox, T., Vaudrey, T., Rabe, C., Franke, U., & Cremers, D. (2011). Stereoscopic scene flow computation for 3D motion understanding. International Journal of Computer Vision, 95, 29–51.

  • Xiao, J., Cheng, H., Sawhney, H. S., Rao, C., & Isnardi, M. A. (2006). Bilateral filtering-based optical flow estimation with occlusion detection. In ECCV (Vol. 1, pp. 211–224).

  • Xu, L., Chen, J., & Jia, J. (2008). A segmentation based variational model for accurate optical flow estimation. In ECCV (Vol. 1, pp. 671–684).

  • Xu, L., Jia, J., & Matsushita, Y. (2010). Motion detail preserving optical flow estimation. In CVPR (pp. 1293–1300).

  • Yoon, K. J., & Kweon, I. S. (2006). Adaptive support-weight approach for correspondence search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 650–656.

  • Zhang, Z., & Faugeras, O. D. (1992). Estimation of displacements from two 3-D frames obtained from stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 1141–1156.

  • Zhang, Y., & Kambhamettu, C. (2001). On 3D scene flow and structure estimation. In CVPR (Vol. 2, pp. 778–785).

  • Zhang, L., Curless, B., & Seitz, S. M. (2003). Spacetime stereo: shape recovery for dynamic scenes. In CVPR (Vol. 2, pp. 367–374).

  • Zhang, G., Jia, J., Wong, T. T., & Bao, H. (2009). Consistent depth maps recovery from a video sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 974–988.

  • Zimmer, H., Bruhn, A., Weickert, J., Valgaerts, L., Salgado, B. R. A., & Seidel, H. P. (2009). Complementary optic flow. In EMMCVPR (pp. 207–220).

Acknowledgements

The authors would like to thank the associate editor and all the anonymous reviewers for their time and effort. This work is supported by a grant from the Research Grants Council of the Hong Kong SAR (Project No. 413110).

Author information

Correspondence to Jiaya Jia.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

(MP4 121.5 MB)

Appendices

Appendix A

We give the details of solving the Euler-Lagrange equation (21):

With the applied anisotropic diffusion tensor, the smoothness term involves \(d_{hh}\), \(d_{vv}\), and \(d_{hv}\), which relate several neighboring points. We use the indices in Fig. 20 to represent the 2D coordinates: \(d_1 = d(i+1, j+1)\). \(q\) is used to index the current point \((i, j)\). We apply central differences to compute the second-order derivatives. Specifically, we introduce the function \(\zeta(\cdot)\) expressed as

\((\zeta d_v)_v\) and \((\zeta d_h)_v\) are defined similarly. Then we discretize a grid of size \(h_h \times h_v\) to apply Gauss-Seidel relaxation. By defining

we represent the anisotropic factors in simpler forms. The increment Δd can be computed using the following iterations:

where \(\mathcal{N}\) is the set of neighboring pixels, \(\mathcal{N}_h(q) = \{2, 6\}\), and \(\mathcal{N}_v(q) = \{0, 4\}\). Further, \(g_1\) is defined as

and b can be derived as

where

\(\overline{p} = p \bmod 8\). To facilitate computation, we adopt a standard non-linear multi-grid numerical scheme (Bruhn and Weickert 2005) to accelerate convergence. Gauss-Seidel relaxation serves as the pre- and post-smoother, applied twice at each level.

Fig. 20: Indices for the 2D coordinates
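The Gauss-Seidel relaxation used above can be sketched for a simplified homogeneous case, with a 5-point Laplacian standing in for the anisotropic diffusion tensor. The function name, the system \((I + \lambda L)\,d = b\), and the parameters are our own illustrative assumptions, not the paper's discretization.

```python
import numpy as np

def gauss_seidel(b, lam=1.0, sweeps=200):
    """Gauss-Seidel relaxation for (I + lam*L) d = b, where L is the
    5-point homogeneous Laplacian (a simplified stand-in for the
    anisotropic smoothness term). Updates sweep the interior in place,
    so each pixel immediately uses its already-updated neighbors."""
    d = b.copy()
    H, W = d.shape
    for _ in range(sweeps):
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                nb = d[i-1, j] + d[i+1, j] + d[i, j-1] + d[i, j+1]
                d[i, j] = (b[i, j] + lam * nb) / (1.0 + 4.0 * lam)
    return d
```

A constant input is a fixed point of the iteration, and an isolated spike is diffused toward its neighbors; in the full scheme this smoother runs twice per level inside the multi-grid cycle.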

Appendix B

After discretization, the linear equations approximating Eq. (20) can be derived straightforwardly. Δu, Δv, and Δδd are refined iteratively, fixing the other two variables during each update. This leads to the Gauss-Seidel relaxation, written as

where

\(g_1, g_2\) are the functions defined in Appendix A. The Gauss-Seidel iteration is accelerated by a non-linear multi-grid numerical scheme similar to the one used to compute disparities in Appendix A.
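The alternation pattern, updating one unknown while holding the others fixed, can be illustrated on a small symmetric positive-definite system. This scalar sketch is our own construction (not the paper's per-pixel equations for Δu, Δv, Δδd); it shows why the coordinate-wise Gauss-Seidel updates converge to the joint solution.

```python
import numpy as np

def coordinate_relaxation(A, c, iters=100):
    """Solve A x = c (A symmetric positive definite) by cycling through
    the unknowns: each update solves its own row equation exactly with
    the other unknowns frozen, mirroring the alternating refinement of
    Δu, Δv, and Δδd. Scalar stand-in for the per-pixel fields."""
    x = np.zeros_like(c)
    n = len(c)
    for _ in range(iters):
        for i in range(n):
            # residual of row i, excluding unknown i's own contribution
            r = c[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = r / A[i, i]
    return x
```

For SPD systems each coordinate update never increases the quadratic energy, so the cycle converges; the same argument underlies the per-variable relaxation above.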

About this article

Cite this article

Hung, C.H., Xu, L. & Jia, J. Consistent Binocular Depth and Scene Flow with Chained Temporal Profiles. Int J Comput Vis 102, 271–292 (2013). https://doi.org/10.1007/s11263-012-0559-y
