Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors

Engelmann, Francis; Stückler, Jörg; Leibe, Bastian

doi:10.1007/978-3-319-45886-1_18

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9796)

Included in the following conference series:

German Conference on Pattern Recognition

Abstract

Estimating the pose and 3D shape of a large variety of instances within an object class from stereo images is a challenging problem, especially in realistic conditions such as urban street scenes. We propose a novel approach that uses compact manifolds of the shapes within an object class for object segmentation, pose estimation and shape estimation. Our method first detects objects and coarsely estimates their pose in the stereo images using a state-of-the-art 3D object detection method. An energy minimization method then jointly optimizes shape and pose so that the shape model aligns with the stereo reconstruction of the object. In experiments, we evaluate our approach for detection, pose estimation and shape estimation of cars in real stereo images of urban street scenes. We demonstrate that our shape manifold alignment method improves on the initial stereo reconstruction and object detection method in both depth and pose accuracy.
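The joint alignment step described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a 2D setting, a hypothetical one-dimensional shape family (an ellipse whose semi-axes depend linearly on a latent value `z`), made-up constants, and a plain gradient descent with finite-difference gradients and backtracking. It only shows the general idea of minimizing one energy over pose (`tx`, `ty`, `yaw`) and shape (`z`) together, with a data term that pulls the shape surface onto observed points and a prior term that keeps `z` near the mean shape.

```python
import math

# Hypothetical 1-D linear shape family: ellipse semi-axes vary with latent z.
MEAN_AXES = (2.0, 0.9)    # mean half-length and half-width (made-up values)
BASIS_AXES = (0.4, 0.1)   # change of the semi-axes per unit of z (made-up values)


def semi_axes(z):
    """Semi-axes of the shape for latent value z, kept strictly positive."""
    a = max(1e-3, MEAN_AXES[0] + z * BASIS_AXES[0])
    b = max(1e-3, MEAN_AXES[1] + z * BASIS_AXES[1])
    return a, b


def implicit_value(px, py, params):
    """Implicit surface value of a world point: zero on the posed shape,
    negative inside, positive outside."""
    tx, ty, yaw, z = params
    # World -> object frame: undo the translation, then rotate by -yaw.
    dx, dy = px - tx, py - ty
    c, s = math.cos(yaw), math.sin(yaw)
    ox, oy = c * dx + s * dy, -s * dx + c * dy
    a, b = semi_axes(z)
    return math.hypot(ox / a, oy / b) - 1.0


def energy(points, params, prior_weight=0.01):
    """Data term (squared implicit values) plus a quadratic prior on z."""
    data = sum(implicit_value(px, py, params) ** 2 for px, py in points)
    return data + prior_weight * params[3] ** 2


def numeric_gradient(points, params, eps=1e-6):
    """Central finite differences of the energy w.r.t. (tx, ty, yaw, z)."""
    grad = []
    for i in range(len(params)):
        hi = list(params)
        lo = list(params)
        hi[i] += eps
        lo[i] -= eps
        grad.append((energy(points, hi) - energy(points, lo)) / (2 * eps))
    return grad


def align(points, init, iters=200):
    """Gradient descent with backtracking; the energy never increases."""
    params = list(init)
    current = energy(points, params)
    for _ in range(iters):
        grad = numeric_gradient(points, params)
        step = 1.0
        while step > 1e-10:
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_energy = energy(points, trial)
            if trial_energy < current:
                params, current = trial, trial_energy
                break
            step *= 0.5
        else:
            break  # no improving step found, stop early
    return params, current


def sample_surface(params, n=40):
    """Synthetic observations: n points on the posed shape surface."""
    tx, ty, yaw, z = params
    a, b = semi_axes(z)
    c, s = math.cos(yaw), math.sin(yaw)
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        ox, oy = a * math.cos(t), b * math.sin(t)
        # Object -> world frame: rotate by yaw, then translate.
        points.append((tx + c * ox - s * oy, ty + s * ox + c * oy))
    return points


truth = (10.0, 3.0, 0.3, 0.5)           # tx, ty, yaw, z used to create the data
observed = sample_surface(truth)
coarse = (9.5, 3.2, 0.2, 0.0)           # stands in for a coarse detector output
initial_energy = energy(observed, coarse)
fitted, final_energy = align(observed, coarse)
```

Here `coarse` plays the role of the coarse pose from the detection stage, and `align` refines pose and shape together from that starting point. A real system would replace the ellipse with a learned 3D shape manifold and the hand-written descent with a proper non-linear least-squares solver.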

Acknowledgements

This work has been supported by ERC Starting Grant CV-SUPER (ERC-2012-StG-307432).

Author information

Authors and Affiliations

Computer Vision Group, Visual Computing Institute, RWTH Aachen University, Aachen, Germany
Francis Engelmann, Jörg Stückler & Bastian Leibe

Authors

Francis Engelmann
Jörg Stückler
Bastian Leibe

Corresponding author

Correspondence to Francis Engelmann.

Editor information

Editors and Affiliations

University of Hannover, Hannover, Germany
Bodo Rosenhahn
Max Planck Institute for Informatics, Saarbrücken, Germany
Bjoern Andres

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5280 KB)

Cite this paper

Engelmann, F., Stückler, J., Leibe, B. (2016). Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors. In: Rosenhahn, B., Andres, B. (eds) Pattern Recognition. GCPR 2016. Lecture Notes in Computer Science, vol 9796. Springer, Cham. https://doi.org/10.1007/978-3-319-45886-1_18

DOI: https://doi.org/10.1007/978-3-319-45886-1_18
Published: 27 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45885-4
Online ISBN: 978-3-319-45886-1
eBook Packages: Computer Science, Computer Science (R0)