Abstract
Image-based object recognition and pose estimation is an intensely studied research field, important for robotic object manipulation. Despite the impressive recent success of CNNs, to our knowledge none provides a self-estimate of the uncertainty of its predicted pose.
In this paper we introduce a novel fusion-based CNN output architecture for 6D object pose estimation that obtains competitive performance on the YCB-Video dataset while also providing meaningful uncertainty information per 6D pose estimate. It is motivated by the recent success of semantic segmentation, which shows that CNNs can learn what they see at each pixel. Accordingly, our CNN produces a per-pixel output of a point in object coordinates together with an image-space uncertainty, which is then fused by (generalized) PnP into a 6D pose with a \(6\times 6\) covariance matrix. We show that a CNN can compute image-space uncertainty, while the step from there to pose uncertainty can be solved analytically. In addition, the architecture allows fusing additional sensor and context information (e.g. binocular or depth data) and makes the CNN independent of the camera parameters with which a training sample was taken. (Code available at https://github.com/JesseRK/6D-UI-CNN-Output.)
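As a minimal sketch of the analytic step from image-space uncertainty to a \(6\times 6\) pose covariance, the following first-order propagation illustrates the idea. All names, the small-angle pose parametrization, and the numerical Jacobian are illustrative assumptions for this sketch, not the paper's exact generalized-PnP formulation.

```python
# Hedged sketch: propagate per-point image-space uncertainty to a
# 6x6 pose covariance via Sigma = (J^T W J)^{-1}, where J is the
# Jacobian of all reprojections w.r.t. the 6 pose parameters and W
# stacks the inverse per-point image-space covariances.
import numpy as np

def project(pose, X, K):
    """Pinhole projection of 3D object points X (Nx3) under pose
    (rx, ry, rz, tx, ty, tz), using a first-order (small-angle)
    rotation for simplicity of this sketch."""
    rx, ry, rz, tx, ty, tz = pose
    R = np.array([[1.0, -rz,  ry],
                  [ rz, 1.0, -rx],
                  [-ry,  rx, 1.0]])
    Xc = X @ R.T + np.array([tx, ty, tz])
    uv = Xc @ K.T
    return uv[:, :2] / uv[:, 2:3]

def pose_covariance(pose, X, K, sigmas):
    """Return the 6x6 pose covariance for isotropic per-point
    image-space standard deviations `sigmas` (one per point)."""
    eps = 1e-6
    f0 = project(pose, X, K).ravel()
    J = np.zeros((f0.size, 6))
    for k in range(6):               # numerical Jacobian, column by column
        p = pose.copy()
        p[k] += eps
        J[:, k] = (project(p, X, K).ravel() - f0) / eps
    W = np.diag(np.repeat(1.0 / sigmas**2, 2))  # inverse image-space covariances
    return np.linalg.inv(J.T @ W @ J)
```

A larger predicted image-space uncertainty for a point down-weights its residual in \(W\), so it contributes less to the pose and inflates the resulting covariance, which is the behavior one would want from a self-estimated uncertainty.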
Notes
- 1.
This is because the network is trained to distinguish each and every object point based on its appearance, which is identical for multiple points of a symmetric object.
Acknowledgments
The research reported in this paper has been supported by the German Research Foundation DFG, as part of Collaborative Research Center (Sonderforschungsbereich) 1320 EASE - Everyday Activity Science and Engineering, University of Bremen (http://www.ease-crc.org/). The research was conducted in subproject R02 ‘Multi-cue perception supported by background knowledge’.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Richter-Klug, J., Frese, U. (2019). Towards Meaningful Uncertainty Information for CNN Based 6D Pose Estimates. In: Tzovaras, D., Giakoumis, D., Vincze, M., Argyros, A. (eds) Computer Vision Systems. ICVS 2019. Lecture Notes in Computer Science(), vol 11754. Springer, Cham. https://doi.org/10.1007/978-3-030-34995-0_37
Print ISBN: 978-3-030-34994-3
Online ISBN: 978-3-030-34995-0