Spatiotemporal Descriptor for Wide-Baseline Stereo Reconstruction of Non-rigid and Ambiguous Scenes

Trulls, Eduard; Sanfeliu, Alberto; Moreno-Noguer, Francesc

doi:10.1007/978-3-642-33712-3_32

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7574)

Abstract

This paper studies the use of temporal consistency to match appearance descriptors and handle complex ambiguities when computing dynamic depth maps from stereo. Previous approaches have designed 3D descriptors over the spacetime volume; these have mostly been used for monocular action recognition, as they cannot handle perspective changes. Our approach builds on a state-of-the-art 2D dense appearance descriptor, which we extend in time by means of optical flow priors, and it can be applied to wide-baseline stereo setups. The basic idea behind our approach is to capture the changes around a feature point in time instead of trying to describe the spatiotemporal volume. We demonstrate its effectiveness on very ambiguous synthetic video sequences with ground truth data, as well as on real sequences.
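The core idea can be illustrated with a toy example. The approach describes how the neighbourhood of a point changes over time by following the point with optical flow, rather than describing the whole spacetime volume. The sketch below is a minimal illustration under stated assumptions and is not the authors' implementation:

- `patch_descriptor` is a hypothetical normalised-intensity stand-in for the 2D dense descriptor.
- The flow priors are assumed to be given as per-frame callables (`Flow`, also hypothetical).
- The per-frame descriptors are simply concatenated and compared with a Euclidean distance.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # grayscale frame indexed as img[y][x]
# Hypothetical flow prior: maps a position in frame t to its displacement
# (dx, dy) into frame t + 1.
Flow = Callable[[float, float], Tuple[float, float]]


def patch_descriptor(img: Image, x: float, y: float, radius: int = 1) -> List[float]:
    """Toy 2D descriptor (an assumption, not the paper's): the L2-normalised
    intensity patch around the nearest pixel, with clamped borders."""
    h, w = len(img), len(img[0])
    cx, cy = int(round(x)), int(round(y))
    vals = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            px = min(max(cx + dx, 0), w - 1)
            py = min(max(cy + dy, 0), h - 1)
            vals.append(img[py][px])
    norm = math.sqrt(sum(v * v for v in vals)) or 1.0
    return [v / norm for v in vals]


def spatiotemporal_descriptor(frames: Sequence[Image], flows: Sequence[Flow],
                              x: float, y: float, radius: int = 1) -> List[float]:
    """Follow (x, y) from frames[0] onwards with the flow priors and
    concatenate one 2D descriptor per frame, i.e. describe how the
    neighbourhood of the point changes over time."""
    if len(flows) != len(frames) - 1:
        raise ValueError("need exactly one flow field per consecutive frame pair")
    desc: List[float] = []
    for t, img in enumerate(frames):
        desc.extend(patch_descriptor(img, x, y, radius))
        if t < len(flows):
            dx, dy = flows[t](x, y)
            x, y = x + dx, y + dy
    return desc


def matching_cost(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors (lower = better match)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


if __name__ == "__main__":
    # Frame 0 is textureless (ambiguous); frame 1 is textured. The right view
    # is the left view shifted by a disparity of 1 pixel.
    flat = [[1.0] * 5 for _ in range(5)]
    left1 = [[float(x + 1) for x in range(5)] for _ in range(5)]
    right1 = [[float(min(x + 2, 5)) for x in range(5)] for _ in range(5)]

    def still(x: float, y: float) -> Tuple[float, float]:
        return (0.0, 0.0)  # zero-motion flow prior, for the demo only

    q = spatiotemporal_descriptor([flat, left1], [still], 2, 2)
    for xr in (1, 3):  # true match is xr = 1; xr = 3 is a distractor
        c2d = matching_cost(patch_descriptor(flat, 2, 2), patch_descriptor(flat, xr, 2))
        c3d = matching_cost(q, spatiotemporal_descriptor([flat, right1], [still], xr, 2))
        print(f"x_r={xr}: 2D cost {c2d:.3f}, spatiotemporal cost {c3d:.3f}")
```

With these toy inputs, both candidates have a 2D cost of 0.000 at the textureless first frame, so a single-frame descriptor cannot tell them apart. The stacked descriptor gives the true match (x_r = 1) the lower cost. Weighting every frame equally by plain concatenation is only the simplest choice and is an assumption of this sketch.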

Work supported by the Spanish Ministry of Science and Innovation under projects RobTaskCoop (DPI2010-17112), PAU+ (DPI2011-27510) and MIPRCV (Consolider-Ingenio 2010, CSD2007-00018), and the EU ARCAS Project FP7-ICT-2011-287617. E. Trulls is supported by a scholarship from Universitat Politècnica de Catalunya.

Author information

Authors and Affiliations

Institut de Robòtica i Informàtica Industrial (CSIC/UPC), C/ Llorens i Artigas 4-6, 08028, Barcelona, Spain
Eduard Trulls, Alberto Sanfeliu & Francesc Moreno-Noguer

Editor information

Editors and Affiliations

Microsoft Research Ltd., CB3 0FB, Cambridge, UK
Andrew Fitzgibbon
Dept. of Computer Science, University of North Carolina, 27599, Chapel Hill, NC, USA
Svetlana Lazebnik
California Institute of Technology, 91125, Pasadena, CA, USA
Pietro Perona
Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan
Yoichi Sato
INRIA, 38330, Montbonnot, France
Cordelia Schmid

About this paper

Cite this paper

Trulls, E., Sanfeliu, A., Moreno-Noguer, F. (2012). Spatiotemporal Descriptor for Wide-Baseline Stereo Reconstruction of Non-rigid and Ambiguous Scenes. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33712-3_32

DOI: https://doi.org/10.1007/978-3-642-33712-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33711-6
Online ISBN: 978-3-642-33712-3
eBook Packages: Computer Science, Computer Science (R0)