Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data

Panareda Busto, Pau; Gall, Juergen

doi:10.1007/978-3-030-33676-9_8

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11824))

Included in the following conference series:

German Conference on Pattern Recognition

1846 Accesses
3 Altmetric

Abstract

The estimation of viewpoints and keypoints effectively enhance object detection methods by extracting valuable traits of the object instances. While the output of both processes differ, i.e., angles vs. list of characteristic points, they indeed share the same focus on how the object is placed in the scene, inducing that there is a certain level of correlation between them. Therefore, we propose a convolutional neural network that jointly computes the viewpoint and keypoints for different object categories. By training both tasks together, each task improves the accuracy of the other. Since the labelling of object keypoints is very time consuming for human annotators, we also introduce a new synthetic dataset with automatically generated viewpoint and keypoints annotations. Our proposed network can also be trained on datasets that contain viewpoint and keypoints annotations or only one of them. The experiments show that the proposed approach successfully exploits this implicit correlation between the tasks and outperforms previous techniques that are trained independently .

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Belagiannis, V., Zisserman, A.: Recurrent human pose estimation. In: IEEE International Conference on Automatic Face & Gesture Recognition, pp. 468–475 (2017)
Google Scholar
Chang, A.X., et al.: Shapenet: An information-rich 3D model repository. CoRR abs/1512.3012 (2015)
Google Scholar
Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X.: Multi-context attention for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5669–5678 (2017)
Google Scholar
Divon, G., Tal, A.: Viewpoint estimation–insights & model. In: IEEE European Conference on Computer Vision, pp. 252–268 (2018)
Google Scholar
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
Article Google Scholar
Fenzi, M., Leal-Taixe, L., Rosenhahn, B., Ostermann, J.: Class generative models based on feature regression for pose estimation of object categories. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 755–762 (2013)
Google Scholar
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012)
Google Scholar
Ghodrati, A., Pedersoli, M., Tuytelaars, T.: Is 2D information enough for viewpoint estimation? In: British Machine Vision Conference, pp. 1–12 (2014)
Google Scholar
Grabner, A., Roth, P.M., Lepetit, V.: 3D pose estimation and 3D model retrieval for objects in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3022–3031 (2018)
Google Scholar
He, K., Sigal, L., Sclaroff, S.: Parameterizing object detectors in the continuous pose space. In: IEEE European Conference on Computer Vision, pp. 450–465 (2014)
Google Scholar
Kehl, W., Manhardt, F., Tombari, F., Ilic, S., Navab, N.: SSD-6D: making RGB-based 3D detection and 6D pose estimation great again. In: IEEE International Conference on Computer Vision, pp. 1521–1529 (2017)
Google Scholar
Keys, R.G.: Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust. Speech Signal Process. 29(6), 1153–1160 (1981)
Article MathSciNet Google Scholar
Liebelt, J., Schmid, C.: Multi-view object class detection with a 3D geometric model. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1688–1695 (2010)
Google Scholar
Long, J.L., Zhang, N., Darrell, T.: Do convnets learn correspondence? In: Advances in Neural Information Processing Systems, pp. 1601–1609 (2014)
Google Scholar
Massa, F., Marlet, R., Aubry, M.: Crafting a multi-task CNN for viewpoint estimation. In: British Machine Vision Conference (2016)
Google Scholar
Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: IEEE European Conference on Computer Vision, pp. 483–499 (2016)
Google Scholar
Panareda Busto, P., Gall, J.: Viewpoint refinement and estimation with adapted synthetic data. Comput. Vis. Image Underst. 169, 75–89 (2018)
Article Google Scholar
Panareda Busto, P., Liebelt, J., Gall, J.: Adaptation of synthetic data for coarse-to-fine viewpoint refinement. In: British Machine Vision Conference (2015)
Google Scholar
Pavlakos, G., Zhou, X., Chan, A., Derpanis, K.G., Daniilidis, K.: 6-DoF object pose from semantic keypoints. In: IEEE International Conference on Robotics and Automation, pp. 2011–2018 (2017)
Google Scholar
Peng, X., Sun, B., Ali, K., Saenko, K.: Learning deep object detectors from 3D models. In: IEEE International Conference on Computer Vision, pp. 1278–1286 (2015)
Google Scholar
Pepik, B., Stark, M., Gehler, P., Schiele, B.: Teaching 3D geometry to deformable part models. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3362–3369 (2012)
Google Scholar
Pepik, B., Stark, M., Gehler, P., Ritschel, T., Schiele, B.: 3D object class detection in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition: Workshops, pp. 1–10 (2015)
Google Scholar
Pishchulin, L., Jain, A., Wojek, C., Andriluka, M., Thormählen, T., Schiele, B.: Learning people detection models from few training samples. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1473–1480 (2011)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)
Google Scholar
Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for CNN: viewpoint estimation in images using CNNs trained with rendered 3D model views. In: IEEE International Conference on Computer Vision, pp. 2686–2694 (2015)
Google Scholar
Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 648–656 (2015)
Google Scholar
Torki, M., Elgammal, A.: Regression from local features for viewpoint and pose estimation. In: IEEE International Conference on Computer Vision, pp. 2603–2610 (2011)
Google Scholar
Toshev, A., Szegedy, C.: Deeppose: Human pose estimation via deep neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653–1660 (2014)
Google Scholar
Tulsiani, S., Malik, J.: Viewpoints and keypoints. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1510–1519 (2015)
Google Scholar
Wang, Y., et al.: 3D pose estimation for fine-grained object categories. In: IEEE European Conference on Computer Vision: Workshops (2018)
Google Scholar
Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4732 (2016)
Google Scholar
Wu, J., et al.: Single image 3d interpreter network. In: IEEE European Conference on Computer Vision, pp. 365–382 (2016)
Google Scholar
Xiang, Y., Mottaghi, R., Savarese, S.: Beyond pascal: a benchmark for 3D object detection in the wild. In: IEEE Winter Conference on Applications of Computer Vision, pp. 75–82 (2014)
Google Scholar
Xiang, Y., et al.: Objectnet3D: a large scale database for 3D object recognition. In: IEEE European Conference on Computer Vision, pp. 160–176 (2016)
Google Scholar
Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1385–1392 (2011)
Google Scholar
Zhou, X., Karpur, A., Luo, L., Huang, Q.: Starmap for category-agnostic keypoint and viewpoint estimation. In: IEEE European Conference on Computer Vision, pp. 318–334 (2018)
Google Scholar
Zimmermann, C., Brox, T.: Learning to estimate 3D hand pose from single RGB images. In: IEEE International Conference on Computer Vision, pp. 4903–4911 (2017)
Google Scholar

Download references

Acknowledgement

The work has been supported by the ERC Starting Grant ARCA (677650).

Author information

Authors and Affiliations

Computer Vision Group, University of Bonn, Bonn, Germany
Pau Panareda Busto & Juergen Gall

Authors

Pau Panareda Busto
View author publications
You can also search for this author in PubMed Google Scholar
Juergen Gall
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Pau Panareda Busto .

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
University of Hamburg, Hamburg, Germany
Simone Frintrop
University of Münster, Münster, Germany
Xiaoyi Jiang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Panareda Busto, P., Gall, J. (2019). Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data. In: Fink, G., Frintrop, S., Jiang, X. (eds) Pattern Recognition. DAGM GCPR 2019. Lecture Notes in Computer Science(), vol 11824. Springer, Cham. https://doi.org/10.1007/978-3-030-33676-9_8

Download citation

DOI: https://doi.org/10.1007/978-3-030-33676-9_8
Published: 25 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33675-2
Online ISBN: 978-3-030-33676-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics