
Towards Meaningful Uncertainty Information for CNN Based 6D Pose Estimates

  • Conference paper
  • In: Computer Vision Systems (ICVS 2019)

Abstract

Image-based object recognition and pose estimation is nowadays a heavily researched field, important for robotic object manipulation. Despite the impressive recent success of CNNs, to our knowledge none of them includes a self-estimate of the uncertainty of its predicted pose.

In this paper we introduce a novel fusion-based CNN output architecture for 6D object pose estimation that obtains competitive performance on the YCB-Video dataset while also providing meaningful uncertainty information per 6D pose estimate. It is motivated by the recent success of semantic segmentation, which shows that CNNs can learn what they see in a pixel. Our CNN therefore produces a per-pixel output of a point in object coordinates together with an image-space uncertainty, which is then fused by (generalized) PnP into a 6D pose with a \(6\times 6\) covariance matrix. We show that a CNN can compute image-space uncertainty, while the step from there to pose uncertainty is well solved analytically. In addition, the architecture allows fusing additional sensor and context information (e.g. binocular or depth data) and makes the CNN independent of the camera parameters with which a training sample was taken. (Code available at https://github.com/JesseRK/6D-UI-CNN-Output.)
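The step from image-space uncertainty to pose uncertainty mentioned above is analytic, and a first-order version of it is easy to state. The following Python snippet is a minimal sketch, not the authors' implementation (that is in the linked repository): given 2D-3D correspondences, per-pixel \(2\times 2\) image-space covariances, camera intrinsics, and a pose already obtained from any PnP solver, the standard Gauss-Newton covariance estimate \(\Sigma_{pose} \approx (J^\top W J)^{-1}\) yields the \(6\times 6\) pose covariance, where \(J\) is the Jacobian of the reprojection function with respect to the six pose parameters and \(W\) is block-diagonal with the inverse per-pixel covariances. All function and variable names below are illustrative.

    import numpy as np

    def rodrigues(rvec):
        """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
        theta = np.linalg.norm(rvec)
        if theta < 1e-12:
            return np.eye(3)
        k = rvec / theta
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

    def project(pose, object_pts, K_cam):
        """Project (N, 3) object points; pose is a (6,) array [rvec, tvec]."""
        R = rodrigues(pose[:3])
        cam = object_pts @ R.T + pose[3:]
        uv = cam @ K_cam.T
        return uv[:, :2] / uv[:, 2:3]

    def pose_covariance(pose, object_pts, Sigma_px, K_cam, eps=1e-6):
        """First-order 6x6 pose covariance from per-pixel 2x2 covariances."""
        n = len(object_pts)
        # Numeric Jacobian of all 2n reprojected coordinates w.r.t. the pose.
        J = np.zeros((2 * n, 6))
        base = project(pose, object_pts, K_cam).ravel()
        for i in range(6):
            d = np.zeros(6)
            d[i] = eps
            J[:, i] = (project(pose + d, object_pts, K_cam).ravel() - base) / eps
        # Accumulate J^T W J, with W block-diagonal, blocks Sigma_px[i]^{-1}.
        JTWJ = np.zeros((6, 6))
        for i in range(n):
            Ji = J[2 * i:2 * i + 2]
            JTWJ += Ji.T @ np.linalg.inv(Sigma_px[i]) @ Ji
        return np.linalg.inv(JTWJ)  # Gauss-Newton estimate (J^T W J)^{-1}

Because the camera intrinsics enter only in this fusion step, the CNN itself never has to see them, which is what makes it independent of the camera parameters of a training sample; likewise, further sensors (a second camera, depth) simply contribute additional weighted residual rows to the same least-squares problem.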


Notes

  1. This is because the network is trained to distinguish each and every object point based on its appearance (which is the same for multiple points within a symmetric object).


Acknowledgments

The research reported in this paper has been supported by the German Research Foundation DFG, as part of Collaborative Research Center (Sonderforschungsbereich) 1320 EASE - Everyday Activity Science and Engineering, University of Bremen (http://www.ease-crc.org/). The research was conducted in subproject R02 ‘Multi-cue perception supported by background knowledge’.

Author information

Corresponding author: Jesse Richter-Klug.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Richter-Klug, J., Frese, U. (2019). Towards Meaningful Uncertainty Information for CNN Based 6D Pose Estimates. In: Tzovaras, D., Giakoumis, D., Vincze, M., Argyros, A. (eds) Computer Vision Systems. ICVS 2019. Lecture Notes in Computer Science, vol. 11754. Springer, Cham. https://doi.org/10.1007/978-3-030-34995-0_37

  • DOI: https://doi.org/10.1007/978-3-030-34995-0_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34994-3

  • Online ISBN: 978-3-030-34995-0

  • eBook Packages: Computer Science (R0)
