International Journal of Computer Vision

, Volume 102, Issue 1–3, pp 271–292 | Cite as

Consistent Binocular Depth and Scene Flow with Chained Temporal Profiles

  • Chun Ho Hung
  • Li Xu
  • Jiaya JiaEmail author


We propose a depth and image scene flow estimation method taking the input of a binocular video. The key component is motion-depth temporal consistency preservation, making computation in long sequences reliable. We tackle a number of fundamental technical issues, including connection establishment between motion and depth, structure consistency preservation in multiple frames, and long-range temporal constraint employment for error correction. We address all of them in a unified depth and scene flow estimation framework. Our main contributions include development of motion trajectories, which robustly link frame correspondences in a voting manner, rejection of depth/motion outliers through temporal robust regression, novel edge occurrence map estimation, and introduction of anisotropic smoothing priors for proper regularization.


Video depth estimation Consistent scene flow Chained temporal profiles Stereo matching 



The authors would like to thank the associate editor and all the anonymous reviewers for their time and effort. This work is supported by a grant from the Research Grants Council of the Hong Kong SAR (Project No. 413110).

Supplementary material

(MP4 121.5 MB)


  1. Álvarez, L., Deriche, R., Papadopoulo, T., & Sánchez, J. (2007). Symmetrical dense optical flow estimation with occlusions detection. International Journal of Computer Vision, 75, 371–385. CrossRefGoogle Scholar
  2. Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., & Szeliski, R. (2011). A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92, 1–31. CrossRefGoogle Scholar
  3. Basha, T., Moses, Y., & Kiryati, N. (2010). Multi-view scene flow estimation: a view centered variational approach. In CVPR (pp. 1506–1513). Google Scholar
  4. Black, M. J. (1994). Recursive non-linear estimation of discontinuous flow fields. In ECCV (Vol. 1, pp. 138–145). Google Scholar
  5. Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In ECCV (Vol. 4, pp. 25–36). Google Scholar
  6. Brox, T., Bregler, C., & Malik, J. (2009). Large displacement optical flow. In CVPR (pp. 41–48). Google Scholar
  7. Bruhn, A., & Weickert, J. (2005). Towards ultimate motion estimation: combining highest accuracy with real-time performance. In ICCV (pp. 749–755). Google Scholar
  8. Bruhn, A., Weickert, J., & Schnörr, C. (2005). Lucas/Kanade meets horn/Schunck: combining local and global optic flow methods. International Journal of Computer Vision, 61, 211–231. CrossRefGoogle Scholar
  9. Cech, J., Sanchez-Riera, J., & Horaud, R. (2011). Scene flow estimation by growing correspondence seeds. In CVPR (pp. 3129–3136). CrossRefGoogle Scholar
  10. Furukawa, Y., & Ponce, J. (2007). Accurate, dense, and robust multi-view stereopsis. In CVPR (pp. 1362–1376). Google Scholar
  11. Hadfield, S., & Bowden, R. (2011). Kinecting the dots: particle based scene flow from depth sensors. In ICCV (pp. 2290–2295). Google Scholar
  12. Huguet, F., & Devernay, F. (2007). A variational method for scene flow estimation from stereo sequences. In ICCV (pp. 1–7). Google Scholar
  13. Irani, M. (2002). Multi-frame correspondence estimation using subspace constraints. International Journal of Computer Vision, 48, 173–194. zbMATHCrossRefGoogle Scholar
  14. Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 147–159. CrossRefGoogle Scholar
  15. Min, D. B., & Sohn, K. (2006). Edge-preserving simultaneous joint motion-disparity estimation. In ICPR (Vol. 2, pp. 74–77). Google Scholar
  16. OpenMP ARB (2012). Open multi-processing.
  17. Patras, I., Alvertos, N., & Tziritas, G. (1996). Joint disparity and motion field estimation in stereoscopic image sequences. In International conference on pattern recognition (Vol. 1, pp. 359–363). CrossRefGoogle Scholar
  18. Rabe, C., Müller, T., Wedel, A., & Franke, U. (2010). Dense, robust, and accurate motion field estimation from stereo image sequences in real-time. In ECCV (Vol. 4, pp. 582–595). Google Scholar
  19. Richardt, C., Orr, D., Davies, I., Criminisi, A., & Dodgson, N. A. (2010). Real-time spatiotemporal stereo matching using the dual-cross-bilateral grid. In ECCV (Vol. 3, pp. 510–523). Google Scholar
  20. Sand, P., & Teller, S. J. (2006). Particle video: long-range motion estimation using point trajectories. In CVPR (Vol. 2, pp. 2195–2202). Google Scholar
  21. Sand, P., & Teller, S. J. (2008). Particle video: long-range motion estimation using point trajectories. International Journal of Computer Vision, 80, 72–91. CrossRefGoogle Scholar
  22. Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47, 7–42. zbMATHCrossRefGoogle Scholar
  23. Snavely, N., Seitz, S. M., & Szeliski, R. (2006). Photo tourism: exploring photo collections in 3d. ACM Transactions on Graphics, 25, 835–846. CrossRefGoogle Scholar
  24. Sun, D., Roth, S., Lewis, J. P., & Black, M. J. (2008). Learning optical flow. In ECCV (Vol. 3, pp. 83–97). Google Scholar
  25. Sundaram, N., Brox, T., & Keutzer, K. (2010). Dense point trajectories by gpu-accelerated large displacement optical flow. In ECCV (Vol. 1, pp. 438–451). Google Scholar
  26. Tomasi, C., & Manduchi, R. (1998). Bilateral filtering for gray and color images. In ICCV (pp. 839–846). Google Scholar
  27. University of Auckland (2008). Enpeda. Image sequence analysis test site (eisats).
  28. Valgaerts, L., Bruhn, A., Zimmer, H., Weickert, J., Stoll, C., & Theobalt, C. (2010). Joint estimation of motion, structure and geometry from stereo sequences. In ECCV (Vol. 4, pp. 568–581). Google Scholar
  29. Vaudrey, T., Rabe, C., Klette, R., & Milburn, J. (2008). Differences between stereo and motion behavior on synthetic and real-world stereo sequences. In International conference of image and vision computing New Zealand (IVCNZ) (pp. 1–6). CrossRefGoogle Scholar
  30. Vedula, S., Baker, S., Rander, P., Collins, R. T., & Kanade, T. (2005). Three-dimensional scene flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 475–480. CrossRefGoogle Scholar
  31. Vogel, C., Schindler, K., & Roth, S. (2011). 3d scene flow estimation with a rigid motion prior. In ICCV (pp. 1291–1298). Google Scholar
  32. Wedel, A., Rabe, C., Vaudrey, T., Brox, T., Franke, U., & Cremers, D. (2008). Efficient dense scene flow from sparse or dense stereo data. In ECCV (Vol. 1, pp. 739–751). Google Scholar
  33. Wedel, A., Brox, T., Vaudrey, T., Rabe, C., Franke, U., & Cremers, D. (2011). Stereoscopic scene flow computation for 3d motion understanding. International Journal of Computer Vision, 95, 29–51. zbMATHCrossRefGoogle Scholar
  34. Xiao, J., Cheng, H., Sawhney, H. S., Rao, C., & Isnardi, M. A. (2006). Bilateral filtering-based optical flow estimation with occlusion detection. In ECCV (Vol. 1, pp. 211–224). Google Scholar
  35. Xu, L., Chen, J., & Jia, J. (2008). A segmentation based variational model for accurate optical flow estimation. In ECCV (Vol. 1, pp. 671–684). Google Scholar
  36. Xu, L., Jia, J., & Matsushita, Y. (2010). Motion detail preserving optical flow estimation. In CVPR (pp. 1293–1300). Google Scholar
  37. Yoon, K. J., & Kweon, I. S. (2006). Adaptive support-weight approach for correspondence search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 650–656. CrossRefGoogle Scholar
  38. Zhang, Z., & Faugeras, O. D. (1992). Estimation of displacements from two 3-d frames obtained from stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 1141–1156. CrossRefGoogle Scholar
  39. Zhang, Y., & Kambhamettu, C. (2001). On 3d scene flow and structure estimation. In CVPR (Vol. 2, pp. 778–785). Google Scholar
  40. Zhang, L., Curless, B., & Seitz, S. M. (2003). Spacetime stereo: shape recovery for dynamic scenes. In CVPR (Vol. 2, pp. 367–374). Google Scholar
  41. Zhang, G., Jia, J., Wong, T. T., & Bao, H. (2009). Consistent depth maps recovery from a video sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 974–988. CrossRefGoogle Scholar
  42. Zimmer, H., Bruhn, A., Weickert, J., Valgaerts, L., Salgado, B. R. A., & Seidel, H. P. (2009). Complementary optic flow. In EMMCVPR (pp. 207–220). Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. 1.Department of Computer Science and EngineeringThe Chinese University of Hong KongHong KongChina

Personalised recommendations