Skip to main content
Log in

Disparity estimation in stereo video sequence with adaptive spatiotemporally consistent constraints

  • Original Article
  • Published:
The Visual Computer Aims and scope Submit manuscript

Abstract

Numerous stereo matching algorithms have been proposed to obtain disparity estimation for a single pair of stereo images. However, simply even applying the best of them to temporal frames independently, i.e., without considering the temporal consistency between consecutive frames, may suffer from the undesirable artifacts. Here, we proposed an adaptive, spatiotemporally consistent, constraints-based systematic method that generates spatiotemporally consistent disparity maps for stereo video image sequences. Firstly, a reliable temporal neighborhood is used to enforce the “self-similarity” assumption and prevent errors caused by false optical flow matching from propagating between consecutive frames. Furthermore, we formulate the adaptive temporal predicted disparity map as prior knowledge of the current frame. It is used as a soft constraint to enhance the temporal consistency of disparities, increase the robustness to luminance variance, and restrict the range of the potential disparities for each pixel. Additionally, to further strengthen smooth variation of disparities, the adaptive temporal segment confidence is incorporated as a soft constraint to reduce ambiguities caused by under- and over-segmentation, and retain the disparity discontinuities that align with 3D object boundaries from geometrically smooth, but strong color gradient regions. Experimental evaluations demonstrate that our method significantly improves the spatiotemporal consistency both quantitatively and qualitatively compared with other state-of-the-art methods on the synthetic DCB and realistic KITTI datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

References

  1. Bartczak, B., Jung, D., Koch, R.: Real-Time Neighborhood Based Disparity Estimation Incorporating Temporal Evidence, pp. 153–162. Springer, Berlin (2008)

    Google Scholar 

  2. Čech, J., Sanchez-Riera, J., Horaud, R.: Scene flow estimation by growing correspondence seeds. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3129–3136. IEEE (2011)

  3. Chen, Z., Sun, X., Wang, L., Yu, Y., Huang, C.: A deep visual correspondence embedding model for stereo matching costs. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 972–980 (2015)

  4. Dahan, M.J., Chen, N., Shamir, A., Cohen-Or, D.: Combining color and depth for enhanced image segmentation and retargeting. Vis. Comput. 28(12), 1181–1193 (2012)

    Article  Google Scholar 

  5. Davis, J., Ramamoorthi, R., Rusinkiewicz, S.: Spacetime stereo: a unifying framework for depth from triangulation. In: Proceedings. 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003, vol. 2, pp. II–359. IEEE (2003)

  6. Dobias, M., Sara, R.: Real-time global prediction for temporally stable stereo. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 704–707 (2011)

  7. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the Kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3354–3361. IEEE (2012)

  8. Gidaris, S., Komodakis, N.: Detect, replace, refine: deep structured prediction for pixel wise labeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5248–5257 (2017)

  9. Gong, M.: Real-time joint disparity and disparity flow estimation on programmable graphics hardware. Comput. Vis. Image Underst. 113(1), 90–100 (2009)

    Article  Google Scholar 

  10. Guerrero, P., Winnemöller, H., Li, W., Mitra, N.J.: Depthcut: improved depth edge estimation using multiple unreliable channels. Vis. Comput. 34(9), 1165–1176 (2017)

    Article  Google Scholar 

  11. Guney, F., Geiger, A.: Displets: resolving stereo ambiguities using object knowledge. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4165–4175 (2015)

  12. Hamming distance. https://en.wikipedia.org/wiki/Hamming_distance

  13. Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)

    Article  Google Scholar 

  14. Hosni, A., Rhemann, C., Bleyer, M., Gelautz, M.: Temporally Consistent Disparity and Optical Flow via Efficient Spatio-Temporal Filtering, pp. 165–177. Springer, Berlin (2012)

    Google Scholar 

  15. Hung, C.H., Xu, L., Jia, J.: Consistent binocular depth and scene flow with chained temporal profiles. Int. J. Comput. Vis. 102(1–3), 271–292 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  16. Jiang, J., Cheng, J., Chen, B., Wu, X.: Disparity prediction between adjacent frames for dynamic scenes. Neurocomputing 142, 335–342 (2014)

    Article  Google Scholar 

  17. Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., Bry, A.: End-to-end learning of geometry and context for deep stereo regression (2017). arXiv preprint arxiv:1703.04309

  18. Khoshabeh, R., Chan, S.H., Nguyen, T.Q.: Spatio-temporal consistency in video disparity estimation. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 885–888. IEEE (2011)

  19. Kitti 2012 stereo benchmark. http://www.cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo

  20. Kitti 2015 stereo benchmark. http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo

  21. Kordelas, G.A., Alexiadis, D.S., Daras, P., Izquierdo, E.: Revisiting guided image filter based stereo matching and scanline optimization for improved disparity estimation. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 3803–3807. IEEE (2014)

  22. Larsen, E.S., Mordohai, P., Pollefeys, M., Fuchs, H.: Temporally consistent reconstruction from multiple video streams using enhanced belief propagation. In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8 (2007)

  23. Li, L., Yu, X., Zhang, S., Zhao, X., Zhang, L.: 3d cost aggregation with multiple minimum spanning trees for stereo matching. Appl. Opt. 56(12), 3411–3420 (2017)

    Article  Google Scholar 

  24. Li, X., Liu, J.: Efficient stereo matching using segment optimization. In: ICIP (2016)

  25. Li, Y., Zhang, J., Zhong, Y., Wang, M.: An efficient stereo matching based on fragment matching. Vis. Comput. 1–13 (2018). https://doi.org/10.1007/s00371-018-1491-0

  26. Lin, S.H., Chung, P.C.: Temporal consistency enhancement of depth video sequence. In: 2014 International Conference on Information Science, Electronics and Electrical Engineering (ISEEE), vol. 3, pp. 1897–1900. IEEE (2014)

  27. Liu, F., Philomin, V.: Disparity estimation in stereo sequences using scene flow. In: Proceedings of the British Machine Vision Conference, pp. 55.1–55.11. BMVA Press (2009)

  28. Liu, J., Li, C., Fan, X., Wang, Z., Shi, M., Yang, J.: View synthesis with 3d object segmentation-based asynchronous blending and boundary misalignment rectification. Vis. Comput. 32(6), 989–999 (2016)

    Article  Google Scholar 

  29. Liu, J., Li, C., Mei, F., Wang, Z.: 3d entity-based stereo matching with ground control points and joint second-order smoothness prior. Vis. Comput. 31(9), 1253–1269 (2015)

    Article  Google Scholar 

  30. Luo, W., Schwing, A.G., Urtasun, R.: Efficient deep learning for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5695–5703 (2016)

  31. Matsuo, T., Fukushima, N., Ishibashi, Y.: Weighted joint bilateral filter with slope depth compensation filter for depth map refinement. VISAPP 2, 300–309 (2013)

    Google Scholar 

  32. Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., Brox, T.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4040–4048 (2016)

  33. Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3061–3070 (2015)

  34. Min, D., Lu, J., Do, M.N.: Depth video enhancement based on weighted mode filtering. IEEE Trans. Image Process. 21(3), 1176–1190 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  35. Min, D., Yea, S., Vetro, A.: Temporally consistent stereo matching using coherence function. In: 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), 2010, pp. 1–4. IEEE (2010)

  36. Ntouskos, V., Pirri, F.: Confidence driven tgv fusion (2016). arXiv preprint arXiv:1603.09302

  37. Pham, C.C., Nguyen, V.D., Jeon, J.W.: Efficient spatio-temporal local stereo matching using information permeability filtering. In: 2012 19th IEEE International Conference on Image Processing, pp. 2965–2968 (2012)

  38. Qi, F., Zhao, D., Liu, S., Fan, X.: 3d visual saliency detection model with generated disparity map. Multimed. Tools Appl. 76(2), 3087–3103 (2017)

    Article  Google Scholar 

  39. Richardt, C., Orr, D., Davies, I., Criminisi, A., Dodgson, N.A.: Real-time spatiotemporal stereo matching using the dual-cross-bilateral grid. In: European Conference on Computer Vision, pp. 510–523. Springer (2010)

  40. Seki, A., Pollefeys, M.: Patch based confidence prediction for dense disparity map. In: BMVC, vol. 2, p. 4 (2016)

  41. Shaked, A., Wolf, L.: Improved stereo matching with constant highway networks and reflective loss (2016). arXiv preprint arxiv:1701.00165

  42. Sizintsev, M., Wildes, R.P.: Spatiotemporal stereo via spatiotemporal quadric element (stequel) matching. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 493–500. IEEE (2009)

  43. Sun, D., Roth, S., Black, M.J.: Secrets of optical flow estimation and their principles. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2432–2439 (2010)

  44. Taniai, T., Sinha, S.N., Sato, Y.: Fast multi-frame stereo scene flow with motion segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6891–6900. IEEE (2017)

  45. Vogel, C., Roth, S., Schindler, K.: View-consistent 3d scene flow estimation over multiple frames. In: European Conference on Computer Vision, pp. 263–278. Springer (2014)

  46. Vogel, C., Schindler, K., Roth, S.: 3d scene flow estimation with a piecewise rigid scene model. Int. J. Comput. Vis. 115(1), 1–28 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  47. Vretos, N., Daras, P.: Temporal and color consistent disparity estimation in stereo videos. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 3798–3802. IEEE (2014)

  48. Wedel, A., Brox, T., Vaudrey, T., Rabe, C., Franke, U., Cremers, D.: Stereoscopic scene flow computation for 3d motion understanding. Int. J. Comput. Vis. 95(1), 29–51 (2011)

    Article  MATH  Google Scholar 

  49. Xing, G., Liu, Y., Zhang, W., Ling, H.: Light mixture intrinsic image decomposition based on a single rgb-d image. Vis. Comput. 32(6–8), 1013–1023 (2016)

    Article  Google Scholar 

  50. Xu, S., Zhang, F., He, X., Shen, X., Zhang, X.: Pm-pm: patchmatch with potts model for object segmentation and stereo matching. IEEE Trans. Image Process. 24(7), 2182–2196 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  51. Yamaguchi, K., McAllester, D., Urtasun, R.: Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In: European Conference on Computer Vision, pp. 756–771. Springer (2014)

  52. Yang, W., Zhang, G., Bao, H., Kim, J., Lee, H.Y.: Consistent depth maps recovery from a trinocular video sequence. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1466–1473. IEEE (2012)

  53. Zbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17(1–32), 2 (2016)

    MATH  Google Scholar 

  54. Zeng, H., Ma, K.K.: Content-adaptive temporal consistency enhancement for depth video. In: 2012 19th IEEE International Conference on Image Processing (ICIP), pp. 3017–3020. IEEE (2012)

  55. Zhang, G., Jia, J., Wong, T.T., Bao, H.: Consistent depth maps recovery from a video sequence. IEEE Trans. Pattern Anal. Mach. Intell. 31(6), 974–988 (2009)

    Article  Google Scholar 

  56. Zhu, S., Yan, L.: Local stereo matching algorithm with efficient matching cost and adaptive guided image filter. Vis. Comput. 33(9), 1087–1102 (2017)

    Article  Google Scholar 

Download references

Funding

This study was funded by the National Natural Science Foundation of China (Grant No.: 61802109), the Natural Science Foundation of Hebei province (Grant No.: F2017205066), the Science Foundation of Hebei Normal University (Grant No.: L2017B06, L2018K02).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jing Liu.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tian, L., Liu, J., Ling, H. et al. Disparity estimation in stereo video sequence with adaptive spatiotemporally consistent constraints. Vis Comput 35, 1427–1446 (2019). https://doi.org/10.1007/s00371-018-01622-1

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00371-018-01622-1

Keywords

Navigation