Abstract
In this chapter, we discuss a technique that automatically generates plausible depth maps from videos using nonparametric depth sampling. We demonstrate the method in cases where existing approaches fail (non-translating cameras and dynamic scenes). The technique applies to single images as well as videos. For videos, local motion cues improve the inferred depth maps, while optical flow is used to ensure temporal depth consistency. For training and evaluation, we developed a Microsoft Kinect-based system to collect a large dataset of stereoscopic videos with known depths; the resulting depth estimation technique outperforms the state of the art on benchmark databases. The method can also be used to automatically convert a monoscopic video into stereo for 3D visualization, as demonstrated through a variety of visually pleasing results for indoor and outdoor scenes, including footage from the feature film Charade.
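The core of nonparametric depth sampling can be sketched in a few lines: retrieve the most similar RGB-D exemplars for a query image using a global descriptor, then fuse their depth maps into a depth prior. The sketch below is only an illustration of that idea, not the chapter's implementation; it substitutes a crude block-averaged intensity feature for the GIST descriptor and a per-pixel median for the SIFT-flow warping and spatiotemporal optimization that the full method performs.

```python
import numpy as np

def global_descriptor(img, size=(8, 8)):
    """Crude stand-in for a GIST descriptor: block-averaged intensities."""
    gray = img.mean(axis=2) if img.ndim == 3 else img
    h, w = gray.shape
    ys = np.linspace(0, h, size[0] + 1, dtype=int)
    xs = np.linspace(0, w, size[1] + 1, dtype=int)
    feat = np.array([[gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                      for j in range(size[1])] for i in range(size[0])])
    return feat.ravel()

def depth_prior(query_img, train_imgs, train_depths, k=3):
    """Nonparametric depth sampling: per-pixel median depth of the
    k training exemplars nearest to the query in descriptor space."""
    q = global_descriptor(query_img)
    dists = [np.linalg.norm(global_descriptor(t) - q) for t in train_imgs]
    idx = np.argsort(dists)[:k]
    return np.median(np.stack([train_depths[i] for i in idx]), axis=0)
```

In the full method, each retrieved candidate would additionally be warped to the query via SIFT flow before fusion, and the fused prior would be refined by a global optimization with spatial-smoothness, motion, and temporal-consistency terms.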
Notes
1. Our dataset and code are publicly available at http://kevinkarsch.com/depthtransfer.
2. Examples: Make3D range image dataset (http://make3d.cs.cornell.edu/data.html), B3DO dataset (http://kinectdata.com/), NYU depth datasets (http://cs.nyu.edu/~silberman/datasets/), RGB-D dataset (http://www.cs.washington.edu/rgbd-dataset/), and our own (http://kevinkarsch.com/depthtransfer).
3. For further details and discussion of IRLS, see the appendix of Liu's thesis [18].
4. In all other types of videos (e.g., those with parallax or fast-moving objects/poses), we do not employ this algorithm; equivalently, we set the motion segmentation weight to zero (η = 0).
5. The presentation of stereoscopic (left + right) video to convey the sense of depth.
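Note 3 above refers to iteratively reweighted least squares (IRLS), the standard solver for robust (e.g., L1-like) objectives: each iteration solves a weighted least-squares problem whose weights re-express the robust cost as a quadratic around the current estimate. The 1-D robust line fit below is a generic sketch of the technique, not the chapter's actual objective; see Liu's thesis appendix [18] for the form used there.

```python
import numpy as np

def irls_line_fit(x, y, iters=20, eps=1e-6):
    """Fit y ≈ a*x + b under an L1 loss via IRLS.

    Each iteration solves the weighted normal equations
    (A^T W A) theta = A^T W y with weights 1/|residual|,
    which locally re-expresses the L1 cost as a quadratic.
    """
    A = np.stack([x, np.ones_like(x)], axis=1)
    w = np.ones_like(y)
    theta = np.zeros(2)
    for _ in range(iters):
        Aw = A * w[:, None]
        theta = np.linalg.solve(Aw.T @ A, Aw.T @ y)
        r = y - A @ theta
        w = 1.0 / np.maximum(np.abs(r), eps)  # L1 reweighting, clamped near zero
    return theta
```

Because the L1 loss discounts large residuals, a single gross outlier barely perturbs the fit, which is why IRLS suits the robust data and smoothness terms common in depth-map optimization.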
References
Batra, D., Saxena, A.: Learning the right model: efficient max-margin learning in Laplacian CRFs. In: CVPR (2012)
Colombari, A., Fusiello, A., Murino, V.: Continuous parallax adjustment for 3D-TV. In: IEEE Eur. Conf. Vis. Media Prod, pp. 194–200 (2005)
Delage, E., Lee, H., Ng, A.: A dynamic Bayesian network model for autonomous 3D reconstruction from a single indoor image. In: CVPR (2006)
Guttmann, M., Wolf, L., Cohen-Or, D.: Semi-automatic stereo extraction from video footage. In: ICCV (2009)
Han, F., Zhu, S.C.: Bayesian reconstruction of 3D shapes and scenes from a single image. In: IEEE HLK (2003)
Hassner, T., Basri, R.: Example based 3D reconstruction from single 2D images. In: CVPR Workshop on Beyond Patches, pp. 15–22 (2006)
He, K., Sun, J., Tang, X.: Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011). doi:10.1109/TPAMI.2010.168
Heikkila, M., Pietikainen, M.: A texture-based method for modeling the background and detecting moving objects. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 657–662 (2006)
Hoiem, D., Efros, A., Hebert, M.: Automatic photo pop-up. In: ACM SIGGRAPH (2005)
Hoiem, D., Stein, A., Efros, A., Hebert, M.: Recovering occlusion boundaries from a single image. In: ICCV (2007)
Horry, Y., Anjyo, K., Arai, K.: Tour into the picture: using a spidery mesh interface to make animation from a single image. In: SIGGRAPH (1997)
Klein Gunnewiek, R., Berretty, R.P., Barenbrug, B., Magalhães, J.: Coherent spatial and temporal occlusion generation. In: Proc. SPIE 7237, Stereoscopic Displays and Applications XX, vol. 723713 (2009)
Konrad, J., Brown, G., Wang, M., Ishwar, P., Wu, C., Mukherjee, D.: Automatic 2D-to-3D image conversion using 3D examples from the Internet. In: SPIE 8288, Stereoscopic Displays and Applications, vol. 82880F (2012)
Konrad, J., Wang, M., Ishwar, P.: 2D-to-3D image conversion by learning depth from examples. In: 3DCINE (2012)
Koppal, S., Zitnick, C., Cohen, M., Kang, S., Ressler, B., Colburn, A.: A viewer-centric editor for 3D movies. IEEE Comput. Graph. Appl. 31, 20–35 (2011)
Li, C., Kowdle, A., Saxena, A., Chen, T.: Towards holistic scene understanding: feedback enabled cascaded classification models. In: NIPS (2010)
Liao, M., Gao, J., Yang, R., Gong, M.: Video stereolization: combining motion analysis with user interaction. IEEE Trans. Vis. Comput. Graph. 18(7), 1079–1088 (2012)
Liu, C.: Beyond pixels: exploring new representations and applications for motion analysis. Ph.D. thesis, MIT (2009)
Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing: label transfer via dense scene alignment. In: CVPR (2009)
Liu, B., Gould, S., Koller, D.: Single image depth estimation from predicted semantic labels. In: CVPR (2010)
Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing via label transfer. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2368–2382 (2011)
Liu, C., Yuen, J., Torralba, A.: SIFT flow: dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978–994 (2011)
Luo, K., Li, D., Feng, Y., Zhang, M.: Depth-aided inpainting for disocclusion restoration of multi-view images using depth-image-based rendering. J. Zhejiang Univ. Sci. A 10(12), 1738–1749 (2009)
Maire, M., Arbelaez, P., Fowlkes, C., Malik, J.: Using contours to detect and localize junctions in natural images. In: CVPR (2008)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: ECCV (2012)
Oh, B., Chen, M., Dorsey, J., Durand, F.: Image-based modeling and photo editing. In: SIGGRAPH (2001)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145–175 (2001)
Rubinstein, M., Liu, C., Freeman, W.: Annotation propagation: automatic annotation of large image databases via dense image correspondence. In: ECCV (2012)
Rubinstein, M., Joulin, A., Kopf, J., Liu, C.: Unsupervised joint object discovery and segmentation in internet images. In: CVPR (2013)
Saxena, A., Chung, S.H., Ng, A.Y.: Learning depth from single monocular images. In: Advances in Neural Information Processing Systems 18 (2005). http://books.nips.cc/papers/files/nips18/NIPS2005_0684.pdf
Saxena, A., Sun, M., Ng, A.: Make3D: learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 824–840 (2009)
Sheikh, Y., Javed, O., Kanade, T.: Background subtraction for freely moving cameras. In: ICCV (2009)
Tappen, M., Liu, C.: A Bayesian approach to alignment-based image hallucination. In: ECCV (2012)
Van Pernis, A., DeJohn, M.: Dimensionalization: converting 2D films to 3D. In: SPIE 6803, Stereoscopic Displays and Applications XIX, vol. 68030T (2008). doi:10.1117/12.766566
Wang, O., Lang, M., Frei, M., Hornung, A., Smolic, A., Gross, M.: StereoBrush: interactive 2D to 3D conversion using discontinuous warps. In: SBIM (2011)
Ward, B., Kang, S.B., Bennett, E.P.: Depth director: a system for adding depth to movies. IEEE Comput. Graph. Appl. 31(1), 36–48 (2011)
Wu, C., Frahm, J.M., Pollefeys, M.: Repetition-based dense single-view reconstruction. In: CVPR (2011)
Zhang, L., Dugas-Phocion, G., Samson, J.S., Seitz, S.: Single view modeling of free-form scenes. J. Vis. Comput. Animat. 13(4), 225–235 (2002)
Zhang, G., Dong, Z., Jia, J., Wan, L., Wong, T.T., Bao, H.: Refilming with depth-inferred videos. IEEE Trans. Vis. Comput. Graph. 15(5), 828–840 (2009)
Zhang, G., Jia, J., Wong, T.T., Bao, H.: Consistent depth maps recovery from a video sequence. IEEE Trans. Pattern Anal. Mach. Intell. 31, 974–988 (2009)
Zhang, G., Jia, J., Hua, W., Bao, H.: Robust bilayer segmentation and motion/depth estimation with a handheld camera. IEEE Trans. Pattern Anal. Mach. Intell. 33(3), 603–617 (2011)
Zhang, L., Vazquez, C., Knorr, S.: 3D-TV content creation: automatic 2D-to-3D video conversion. IEEE Trans. Broadcast. 57(2), 372–383 (2011)
Acknowledgements
We would like to thank Tom Blank for his critical help in creating our dual-Kinect data collection system.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Karsch, K., Liu, C., Kang, S.B. (2016). Depth Transfer: Depth Extraction from Videos Using Nonparametric Sampling. In: Hassner, T., Liu, C. (eds) Dense Image Correspondences for Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-319-23048-1_9
DOI: https://doi.org/10.1007/978-3-319-23048-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23047-4
Online ISBN: 978-3-319-23048-1
eBook Packages: Engineering (R0)