Abstract
In this chapter, we discuss a technique that automatically generates plausible depth maps from videos using nonparametric depth sampling. We demonstrate the method in cases where existing approaches fail (non-translating cameras and dynamic scenes). The technique applies to single images as well as videos. For videos, local motion cues improve the inferred depth maps, while optical flow is used to ensure temporal depth consistency. For training and evaluation, we developed a Microsoft Kinect-based system to collect a large dataset of stereoscopic videos with known depths; the resulting depth estimation technique outperforms the state of the art on benchmark databases. The method can also be used to automatically convert a monoscopic video into stereo for 3D visualization, as demonstrated through a variety of visually pleasing results for indoor and outdoor scenes, including footage from the feature film Charade.
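The core of nonparametric depth sampling can be sketched in a few lines: retrieve the most similar RGB-D exemplars for a query image using a global descriptor, then fuse their depth maps into a depth prior. The sketch below is only an illustration of that idea, not the chapter's implementation; it substitutes a crude block-averaged intensity feature for the GIST descriptor and a per-pixel median for the SIFT-flow warping and spatiotemporal optimization that the full method performs.

```python
import numpy as np

def global_descriptor(img, size=(8, 8)):
    """Crude stand-in for a GIST descriptor: block-averaged intensities."""
    gray = img.mean(axis=2) if img.ndim == 3 else img
    h, w = gray.shape
    ys = np.linspace(0, h, size[0] + 1, dtype=int)
    xs = np.linspace(0, w, size[1] + 1, dtype=int)
    feat = np.array([[gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                      for j in range(size[1])] for i in range(size[0])])
    return feat.ravel()

def depth_prior(query_img, train_imgs, train_depths, k=3):
    """Nonparametric depth sampling: per-pixel median depth of the
    k training exemplars nearest to the query in descriptor space."""
    q = global_descriptor(query_img)
    dists = [np.linalg.norm(global_descriptor(t) - q) for t in train_imgs]
    idx = np.argsort(dists)[:k]
    return np.median(np.stack([train_depths[i] for i in idx]), axis=0)
```

In the full method, each retrieved candidate would additionally be warped to the query via SIFT flow before fusion, and the fused prior would be refined by a global optimization with spatial-smoothness, motion, and temporal-consistency terms.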
Notes
1. Our dataset and code are publicly available at http://kevinkarsch.com/depthtransfer.
2. Examples: Make3D range image dataset (http://make3d.cs.cornell.edu/data.html), B3DO dataset (http://kinectdata.com/), NYU depth datasets (http://cs.nyu.edu/~silberman/datasets/), RGB-D dataset (http://www.cs.washington.edu/rgbd-dataset/), and our own (http://kevinkarsch.com/depthtransfer).
3. For further details and discussion of IRLS, see the appendix of Liu's thesis [18].
4. In all other types of videos (e.g., those with parallax or fast-moving objects/poses), we do not employ this algorithm; equivalently, we set the motion segmentation weight to zero (η = 0).
5. The presentation of stereoscopic (left + right) video to convey the sense of depth.
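Note 3 above refers to iteratively reweighted least squares (IRLS), the standard solver for robust (e.g., L1-like) objectives: each iteration solves a weighted least-squares problem whose weights re-express the robust cost as a quadratic around the current estimate. The 1-D robust line fit below is a generic sketch of the technique, not the chapter's actual objective; see Liu's thesis appendix [18] for the form used there.

```python
import numpy as np

def irls_line_fit(x, y, iters=20, eps=1e-6):
    """Fit y ≈ a*x + b under an L1 loss via IRLS.

    Each iteration solves the weighted normal equations
    (A^T W A) theta = A^T W y with weights 1/|residual|,
    which locally re-expresses the L1 cost as a quadratic.
    """
    A = np.stack([x, np.ones_like(x)], axis=1)
    w = np.ones_like(y)
    theta = np.zeros(2)
    for _ in range(iters):
        Aw = A * w[:, None]
        theta = np.linalg.solve(Aw.T @ A, Aw.T @ y)
        r = y - A @ theta
        w = 1.0 / np.maximum(np.abs(r), eps)  # L1 reweighting, clamped near zero
    return theta
```

Because the L1 loss discounts large residuals, a single gross outlier barely perturbs the fit, which is why IRLS suits the robust data and smoothness terms common in depth-map optimization.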
References
Batra, D., Saxena, A.: Learning the right model: efficient max-margin learning in Laplacian CRFs. In: CVPR (2012)
Colombari, A., Fusiello, A., Murino, V.: Continuous parallax adjustment for 3D-TV. In: IEEE Eur. Conf. Vis. Media Prod, pp. 194–200 (2005)
Delage, E., Lee, H., Ng, A.: A dynamic Bayesian network model for autonomous 3D reconstruction from a single indoor image. In: CVPR (2006)
Guttmann, M., Wolf, L., Cohen-Or, D.: Semi-automatic stereo extraction from video footage. In: ICCV (2009)
Han, F., Zhu, S.C.: Bayesian reconstruction of 3D shapes and scenes from a single image. In: IEEE HLK (2003)
Hassner, T., Basri, R.: Example based 3D reconstruction from single 2D images. In: CVPR Workshop on Beyond Patches, pp. 15–22 (2006)
He, K., Sun, J., Tang, X.: Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011). doi:10.1109/TPAMI.2010.168
Heikkila, M., Pietikainen, M.: A texture-based method for modeling the background and detecting moving objects. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 657–662 (2006)
Hoiem, D., Efros, A., Hebert, M.: Automatic photo pop-up. In: ACM SIGGRAPH (2005)
Hoiem, D., Stein, A., Efros, A., Hebert, M.: Recovering occlusion boundaries from a single image. In: ICCV (2007)
Horry, Y., Anjyo, K., Arai, K.: Tour into the picture: using a spidery mesh interface to make animation from a single image. In: SIGGRAPH (1997)
Klein Gunnewiek, R., Berretty, R.P., Barenbrug, B., Magalhães, J.: Coherent spatial and temporal occlusion generation. In: Proc. SPIE 7237, Stereoscopic Displays and Applications XX, vol. 723713 (2009)
Konrad, J., Brown, G., Wang, M., Ishwar, P., Wu, C., Mukherjee, D.: Automatic 2D-to-3D image conversion using 3D examples from the Internet. In: SPIE 8288, Stereoscopic Displays and Applications, vol. 82880F (2012)
Konrad, J., Wang, M., Ishwar, P.: 2D-to-3D image conversion by learning depth from examples. In: 3DCINE (2012)
Koppal, S., Zitnick, C., Cohen, M., Kang, S., Ressler, B., Colburn, A.: A viewer-centric editor for 3D movies. IEEE Comput. Graph. Appl. 31, 20–35 (2011)
Li, C., Kowdle, A., Saxena, A., Chen, T.: Towards holistic scene understanding: feedback enabled cascaded classification models. In: NIPS (2010)
Liao, M., Gao, J., Yang, R., Gong, M.: Video stereolization: combining motion analysis with user interaction. IEEE Trans. Vis. Comput. Graph. 18(7), 1079–1088 (2012)
Liu, C.: Beyond pixels: exploring new representations and applications for motion analysis. Ph.D. thesis, MIT (2009)
Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing: label transfer via dense scene alignment. In: CVPR (2009)
Liu, B., Gould, S., Koller, D.: Single image depth estimation from predicted semantic labels. In: CVPR (2010)
Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing via label transfer. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2368–2382 (2011)
Liu, C., Yuen, J., Torralba, A.: SIFT flow: dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978–994 (2011)
Luo, K., Li, D., Feng, Y., Zhang, M.: Depth-aided inpainting for disocclusion restoration of multi-view images using depth-image-based rendering. J. Zhejiang Univ. Sci. A 10(12), 1738–1749 (2009)
Maire, M., Arbelaez, P., Fowlkes, C., Malik, J.: Using contours to detect and localize junctions in natural images. In: CVPR (2008)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: ECCV (2012)
Oh, B., Chen, M., Dorsey, J., Durand, F.: Image-based modeling and photo editing. In: SIGGRAPH (2001)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145–175 (2001)
Rubinstein, M., Liu, C., Freeman, W.: Annotation propagation: automatic annotation of large image databases via dense image correspondence. In: ECCV (2012)
Rubinstein, M., Joulin, A., Kopf, J., Liu, C.: Unsupervised joint object discovery and segmentation in internet images. In: CVPR (2013)
Saxena, A., Chung, S.H., Ng, A.Y.: Learning depth from single monocular images. In: Advances in Neural Information Processing Systems 18 (2005). http://books.nips.cc/papers/files/nips18/NIPS2005_0684.pdf
Saxena, A., Sun, M., Ng, A.: Make3D: learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 824–840 (2009)
Sheikh, Y., Javed, O., Kanade, T.: Background subtraction for freely moving cameras. In: ICCV (2009)
Tappen, M., Liu, C.: A Bayesian approach to alignment-based image hallucination. In: ECCV (2012)
Van Pernis, A., DeJohn, M.: Dimensionalization: converting 2D films to 3D. In: SPIE 6803, Stereoscopic Displays and Applications XIX, vol. 68030T (2008). doi:10.1117/12.766566
Wang, O., Lang, M., Frei, M., Hornung, A., Smolic, A., Gross, M.: StereoBrush: interactive 2D to 3D conversion using discontinuous warps. In: SBIM (2011)
Ward, B., Kang, S.B., Bennett, E.P.: Depth director: a system for adding depth to movies. IEEE Comput. Graph. Appl. 31(1), 36–48 (2011)
Wu, C., Frahm, J.M., Pollefeys, M.: Repetition-based dense single-view reconstruction. In: CVPR (2011)
Zhang, L., Dugas-Phocion, G., Samson, J.S., Seitz, S.: Single view modeling of free-form scenes. J. Vis. Comput. Animat. 13(4), 225–235 (2002)
Zhang, G., Dong, Z., Jia, J., Wan, L., Wong, T.T., Bao, H.: Refilming with depth-inferred videos. IEEE Trans. Vis. Comput. Graph. 15(5), 828–840 (2009)
Zhang, G., Jia, J., Wong, T.T., Bao, H.: Consistent depth maps recovery from a video sequence. IEEE Trans. Pattern Anal. Mach. Intell. 31, 974–988 (2009)
Zhang, G., Jia, J., Hua, W., Bao, H.: Robust bilayer segmentation and motion/depth estimation with a handheld camera. IEEE Trans. Pattern Anal. Mach. Intell. 33(3), 603–617 (2011)
Zhang, L., Vazquez, C., Knorr, S.: 3D-TV content creation: automatic 2D-to-3D video conversion. IEEE Trans. Broadcast. 57(2), 372–383 (2011)
Acknowledgements
We would like to thank Tom Blank for his critical help in creating our dual-Kinect data collection system.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Karsch, K., Liu, C., Kang, S.B. (2016). Depth Transfer: Depth Extraction from Videos Using Nonparametric Sampling. In: Hassner, T., Liu, C. (eds) Dense Image Correspondences for Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-319-23048-1_9
DOI: https://doi.org/10.1007/978-3-319-23048-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23047-4
Online ISBN: 978-3-319-23048-1
eBook Packages: Engineering (R0)