Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors

  • Francis Engelmann
  • Jörg Stückler
  • Bastian Leibe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9796)

Abstract

Estimating the pose and 3D shape of a large variety of instances within an object class from stereo images is a challenging problem, especially in realistic conditions such as urban street scenes. We propose a novel approach that uses compact manifolds of the shapes within an object class for object segmentation and for pose and shape estimation. Our method first detects objects in the stereo images and coarsely estimates their pose using a state-of-the-art 3D object detection method. An energy minimization method then jointly aligns shape and pose with the stereo reconstruction of the object. In experiments, we evaluate our approach for detection, pose and shape estimation of cars in real stereo images of urban street scenes. We demonstrate that our shape manifold alignment method improves depth and pose accuracy over the initial stereo reconstruction and object detection.
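
The alignment step described in the abstract can be illustrated with a small toy problem. The sketch below is not the authors' energy or solver, neither of which is given on this page. It is a minimal 2D stand-in under stated assumptions: the "shape manifold" is a hypothetical one-parameter family of ellipses (MEAN_AXES, BASIS), the "stereo reconstruction" is a set of noise-free points sampled on the visible half of the outline, and the energy is the sum of squared implicit-surface residuals plus a quadratic prior on the latent shape coefficient z. A plain gradient descent with backtracking refines a coarse initial pose (tx, ty, yaw) and shape (z) at the same time.

```python
import math

# Hypothetical 1-D "shape manifold": ellipse semi-axes vary linearly with latent z.
MEAN_AXES = (2.0, 0.9)  # mean half-length / half-width in metres (made-up values)
BASIS = (0.4, 0.1)      # change of the semi-axes per unit of z (made-up values)


def axes(z):
    return MEAN_AXES[0] + BASIS[0] * z, MEAN_AXES[1] + BASIS[1] * z


def residuals(params, points):
    """Implicit-surface residual of every observed point under pose + shape."""
    tx, ty, yaw, z = params
    a, b = axes(z)
    c, s = math.cos(yaw), math.sin(yaw)
    out = []
    for px, py in points:
        # Transform the world point into the object frame (inverse pose).
        dx, dy = px - tx, py - ty
        qx, qy = c * dx + s * dy, -s * dx + c * dy
        out.append((qx / a) ** 2 + (qy / b) ** 2 - 1.0)
    return out


def energy(params, points, reg=0.1):
    """Data term (squared residuals) plus a quadratic prior on the shape code z."""
    z = params[3]
    return sum(r * r for r in residuals(params, points)) + reg * z * z


def numeric_gradient(f, x, h=1e-6):
    """Central-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += h
        lo[i] -= h
        grad.append((f(hi) - f(lo)) / (2.0 * h))
    return grad


def minimize(f, x0, iters=500):
    """Gradient descent with backtracking; only energy-decreasing steps are accepted."""
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        g = numeric_gradient(f, x)
        step = 1.0
        while step > 1e-12:
            cand = [xi - step * gi for xi, gi in zip(x, g)]
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                break
            step *= 0.5
        else:
            break  # no improving step found: treat as converged
    return x, fx


def sample_surface(params, n=40, arc=math.pi):
    """Noise-free points on the half of the outline assumed visible to the camera."""
    tx, ty, yaw, z = params
    a, b = axes(z)
    c, s = math.cos(yaw), math.sin(yaw)
    pts = []
    for k in range(n):
        t = arc * k / (n - 1)
        qx, qy = a * math.cos(t), b * math.sin(t)
        pts.append((tx + c * qx - s * qy, ty + s * qx + c * qy))
    return pts


true_params = [10.0, 3.0, 0.3, 1.0]  # tx, ty, yaw, z used to generate the data
coarse_init = [10.5, 2.6, 0.1, 0.0]  # stand-in for a coarse detector output
observed = sample_surface(true_params)


def objective(p):
    return energy(p, observed)


initial_energy = objective(coarse_init)
refined, final_energy = minimize(objective, coarse_init)
print("energy:", initial_energy, "->", final_energy)
print("refined pose/shape:", refined)
```

Optimizing pose and shape in one parameter vector is what makes the alignment joint: a change in z can compensate for a pose error and vice versa, which is why the prior term on z is included to keep the shape plausible when only part of the object is observed. A practical system would more likely use a dedicated non-linear least-squares solver than this hand-written descent.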

Keywords

Object Class, Stereo Image, Shape Prior, Monocular Image, Semantic Segmentation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

This work has been supported by ERC Starting Grant CV-SUPER (ERC-2012-StG-307432).

Supplementary material

Supplementary material 1: 419026_1_En_18_MOESM1_ESM.pdf (PDF, 5280 KB)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Francis Engelmann (1), email author
  • Jörg Stückler (1)
  • Bastian Leibe (1)

  1. Computer Vision Group, Visual Computing Institute, RWTH Aachen University, Aachen, Germany