
Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data

  • Conference paper
Pattern Recognition (DAGM GCPR 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11824)


Abstract

The estimation of viewpoints and keypoints effectively enhances object detection methods by extracting valuable traits of the object instances. While the outputs of the two processes differ, i.e., angles vs. a list of characteristic points, both focus on how the object is placed in the scene, which suggests a certain level of correlation between them. Therefore, we propose a convolutional neural network that jointly computes the viewpoint and keypoints for different object categories. By training both tasks together, each task improves the accuracy of the other. Since labelling object keypoints is very time-consuming for human annotators, we also introduce a new synthetic dataset with automatically generated viewpoint and keypoint annotations. Our proposed network can also be trained on datasets that contain both viewpoint and keypoint annotations or only one of them. The experiments show that the proposed approach successfully exploits this implicit correlation between the tasks and outperforms previous techniques that are trained for each task independently.
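Because the network can be trained on images that carry both annotation types or only one of them, each task's loss has to be computed only where a label exists. The following is a minimal sketch of one way to do that, not the authors' method: it assumes the viewpoint is treated as classification over discretized azimuth bins (cross-entropy) and keypoints as heatmap regression (mean squared error). The names `Sample`, `joint_loss`, `w_vp` and `w_kp`, the loss choices and the equal default weights are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Sample:
    """One training example (hypothetical layout); either label may be missing."""
    viewpoint_logits: List[float]                  # network scores over azimuth bins
    keypoint_heatmaps: List[float]                 # flattened predicted heatmaps
    viewpoint_bin: Optional[int] = None            # ground-truth bin, if annotated
    keypoint_target: Optional[List[float]] = None  # flattened target heatmaps, if annotated


def cross_entropy(logits: Sequence[float], target: int) -> float:
    # Stable log-sum-exp: subtract the largest logit before exponentiating.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    if len(pred) != len(target) or not pred:
        raise ValueError("prediction and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(batch: Sequence[Sample], w_vp: float = 1.0, w_kp: float = 1.0) -> float:
    # Only samples that carry a given annotation contribute to that task's term.
    vp_terms = [cross_entropy(s.viewpoint_logits, s.viewpoint_bin)
                for s in batch if s.viewpoint_bin is not None]
    kp_terms = [mse(s.keypoint_heatmaps, s.keypoint_target)
                for s in batch if s.keypoint_target is not None]
    # Average each task over its own labelled subset; a task with no labels
    # in the batch contributes nothing instead of dividing by zero.
    vp = sum(vp_terms) / len(vp_terms) if vp_terms else 0.0
    kp = sum(kp_terms) / len(kp_terms) if kp_terms else 0.0
    return w_vp * vp + w_kp * kp


if __name__ == "__main__":
    batch = [
        # Fully annotated, e.g. a synthetic rendering.
        Sample([2.0, 0.0, 0.0, 0.0], [0.0, 1.0], viewpoint_bin=0, keypoint_target=[0.0, 1.0]),
        # Viewpoint label only.
        Sample([0.0, 0.0, 0.0, 0.0], [0.5, 0.5], viewpoint_bin=1),
        # Keypoint label only.
        Sample([0.0, 0.0, 0.0, 0.0], [1.0, 0.0], keypoint_target=[0.0, 0.0]),
    ]
    print(joint_loss(batch))
```

Averaging each term over its own labelled subset keeps a task's contribution comparable across batches regardless of how many of its samples happen to lack that annotation; whether the paper normalizes this way is not stated here.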



Acknowledgement

The work has been supported by the ERC Starting Grant ARCA (677650).

Author information

Correspondence to Pau Panareda Busto.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Panareda Busto, P., Gall, J. (2019). Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data. In: Fink, G., Frintrop, S., Jiang, X. (eds) Pattern Recognition. DAGM GCPR 2019. Lecture Notes in Computer Science, vol 11824. Springer, Cham. https://doi.org/10.1007/978-3-030-33676-9_8


  • DOI: https://doi.org/10.1007/978-3-030-33676-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33675-2

  • Online ISBN: 978-3-030-33676-9

  • eBook Packages: Computer Science, Computer Science (R0)
