Convolutional Networks for Object Category and 3D Pose Estimation from 2D Images

  • Siddharth Mahendran
  • Haider Ali
  • René Vidal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11129)

Abstract

Current CNN-based algorithms for recovering the 3D pose of an object in an image assume knowledge about both the object category and its 2D localization in the image. In this paper, we relax one of these constraints and propose to solve the task of joint object category and 3D pose estimation from an image assuming known 2D localization. We design a new architecture for this task composed of a feature network that is shared between subtasks, an object categorization network built on top of the feature network, and a collection of category dependent pose regression networks. We also introduce suitable loss functions and a training method for the new architecture. Experiments on the challenging PASCAL3D+ dataset show state-of-the-art performance in the joint categorization and pose estimation task. Moreover, our performance on the joint task is comparable to the performance of state-of-the-art methods on the simpler 3D pose estimation with known object category task.
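The three-part architecture described in the abstract (a shared feature network, a categorization network on top of it, and one pose regression network per category) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: single random linear layers stand in for the real networks, and the names `INPUT_DIM`, `FEATURE_DIM`, `POSE_DIM`, and `forward` are hypothetical. The 3-dimensional pose output and the use of the predicted category to select the pose head are also assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CATEGORIES = 12  # PASCAL3D+ covers 12 rigid object categories
INPUT_DIM = 128      # hypothetical: size of a flattened image-crop descriptor
FEATURE_DIM = 64     # hypothetical: size of the shared feature vector
POSE_DIM = 3         # assumption: a 3-parameter rotation representation


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Stand-in weights: one linear layer per component (a real model would be a deep CNN).
W_feat = rng.normal(scale=0.1, size=(INPUT_DIM, FEATURE_DIM))
W_cat = rng.normal(scale=0.1, size=(FEATURE_DIM, NUM_CATEGORIES))
W_pose = rng.normal(scale=0.1, size=(NUM_CATEGORIES, FEATURE_DIM, POSE_DIM))


def forward(x):
    """Return category probabilities, predicted category, and selected pose."""
    # Shared feature network: computed once, reused by both subtasks.
    features = relu(x @ W_feat)                      # (batch, FEATURE_DIM)
    # Categorization network on top of the shared features.
    probs = softmax(features @ W_cat)                # (batch, NUM_CATEGORIES)
    category = probs.argmax(axis=-1)                 # (batch,)
    # Category-dependent pose networks: evaluate every head, then keep
    # the output of the head matching the predicted category.
    all_poses = np.einsum("bf,cfp->bcp", features, W_pose)
    pose = all_poses[np.arange(x.shape[0]), category]  # (batch, POSE_DIM)
    return probs, category, pose


x = rng.normal(size=(4, INPUT_DIM))
probs, category, pose = forward(x)
print(probs.shape, category.shape, pose.shape)
```

Evaluating every pose head and indexing afterwards keeps the sketch short; with many categories one would more likely route each sample only through the head it needs.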

Keywords

3D pose estimation, Category-dependent pose networks, Multi-task networks, ResNet architecture

Notes

Acknowledgement

This research was supported by NSF grant 1527340.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Center for Imaging Science, Johns Hopkins University, Baltimore, USA
