Abstract
This paper studies the use of temporal consistency to match appearance descriptors and handle complex ambiguities when computing dynamic depth maps from stereo. Previous approaches have designed 3D descriptors over the spacetime volume; these have mostly been used for monocular action recognition, as they cannot deal with perspective changes. Our approach is based on a state-of-the-art 2D dense appearance descriptor, which we extend in time by means of optical flow priors, and it can be applied to wide-baseline stereo setups. The basic idea is to capture the changes around a feature point over time instead of trying to describe the spatiotemporal volume. We demonstrate its effectiveness on very ambiguous synthetic video sequences with ground truth data, as well as on real sequences.
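The idea of following a feature point through time, rather than describing a fixed spacetime volume, can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `spatiotemporal_descriptor`, the array layouts, the nearest-pixel lookup, and the plain concatenation with L2 normalization are all assumptions made for illustration. It takes precomputed per-frame dense 2D descriptors and forward optical flow fields, follows one pixel along its flow trajectory, and stacks the descriptors sampled along the way.

```python
import numpy as np

def spatiotemporal_descriptor(descs, flows, x, y):
    """Stack per-frame descriptors along a flow-tracked trajectory.

    Hypothetical inputs (assumptions, not taken from the paper):
      descs: list of T arrays of shape (H, W, D), dense 2D descriptors.
      flows: list of T-1 arrays of shape (H, W, 2); flows[t][r, c] is the
             (dx, dy) displacement of pixel (c, r) from frame t to t+1.
      x, y:  column and row of the feature point in frame 0.
    """
    h, w, _ = descs[0].shape
    px, py = float(x), float(y)
    parts = []
    for t, desc in enumerate(descs):
        # Nearest-pixel lookup, clamped to the image bounds.
        c = int(np.clip(round(px), 0, w - 1))
        r = int(np.clip(round(py), 0, h - 1))
        parts.append(desc[r, c])
        if t < len(flows):
            # Move the tracked point to the next frame using the flow.
            dx, dy = flows[t][r, c]
            px, py = px + float(dx), py + float(dy)
    out = np.concatenate(parts).astype(float)
    norm = np.linalg.norm(out)
    # L2-normalize so descriptors can be compared with a dot product.
    return out / norm if norm > 0 else out
```

Because the sampling location moves with the flow, the stacked vector records how the local appearance changes over time at the same scene point. Matching two such vectors from different views then compares appearance changes rather than raw spatiotemporal volumes, which is the intuition stated in the abstract.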
Work supported by the Spanish Ministry of Science and Innovation under projects RobTaskCoop (DPI2010-17112), PAU+ (DPI2011-27510) and MIPRCV (Consolider-Ingenio 2010) (CSD2007-00018), and the EU ARCAS Project FP7-ICT-2011-287617. E. Trulls is supported by a scholarship from Universitat Politècnica de Catalunya.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Trulls, E., Sanfeliu, A., Moreno-Noguer, F. (2012). Spatiotemporal Descriptor for Wide-Baseline Stereo Reconstruction of Non-rigid and Ambiguous Scenes. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33712-3_32
DOI: https://doi.org/10.1007/978-3-642-33712-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33711-6
Online ISBN: 978-3-642-33712-3
eBook Packages: Computer Science (R0)