Initial Pose Estimation of 3D Object with Severe Occlusion Using Deep Learning

  • Conference paper
  • First Online:
Advanced Concepts for Intelligent Vision Systems (ACIVS 2020)

Abstract

During the last decade, augmented reality (AR) has gained explosive attention and demonstrated high potential in educational and training applications. As a core technique, AR requires a tracking method to obtain the 3D pose of a camera or an object. Hence, providing fast, accurate, robust, and consistent tracking methods has been a main research topic in the AR field. Fortunately, tracking the camera pose using a relatively small and less-textured known object placed in the scene has been successfully mastered through various types of model-based tracking (MBT) methods. However, MBT methods require a good initial camera pose estimator, and estimating an initial camera pose from a partially visible object remains an open problem. Moreover, severe occlusion is also a challenging problem for initial camera pose estimation. Thus, in this paper, we propose a deep learning method to estimate an initial camera pose from a partially visible object that may also be severely occluded. The proposed method handles such challenging scenarios by relying on the information of detected subparts of the target object to be tracked. Specifically, we first detect subparts of the target object using a state-of-the-art convolutional neural network (CNN). The object detector returns two-dimensional bounding boxes, associated classes, and confidence scores. We then use the bounding box and class information to train a deep neural network (DNN) that regresses to the camera's 6-DoF pose. After initial pose estimation, we apply a tweaked version of an existing MBT method to keep tracking the target object in real time on a mobile platform. Experimental results demonstrate that the proposed method can accurately estimate initial camera poses from objects that are partially visible and/or severely occluded. Finally, we analyze the performance of the proposed method in more detail by comparing the estimation errors when different numbers of subparts are detected.

This work was supported by a Research Grant of Pukyong National University (2019).
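The second stage of the pipeline described in the abstract, regressing a 6-DoF camera pose from the bounding boxes and classes of detected subparts, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of subpart classes, the per-detection feature layout, and the two-layer regressor below are all assumptions made for the example.

```python
import numpy as np

NUM_SUBPARTS = 8    # hypothetical number of subpart classes
FEAT_PER_PART = 5   # (cx, cy, w, h, confidence), normalized to [0, 1]

def encode_detections(detections, num_parts=NUM_SUBPARTS):
    """Pack detections into a fixed-length vector; slots for undetected
    subparts stay zero, which is what lets the regressor cope with
    partially visible or occluded objects."""
    x = np.zeros(num_parts * FEAT_PER_PART)
    for cls, (cx, cy, w, h), conf in detections:
        x[cls * FEAT_PER_PART:(cls + 1) * FEAT_PER_PART] = (cx, cy, w, h, conf)
    return x

class PoseMLP:
    """Minimal two-layer perceptron regressing to a 6-DoF pose
    (tx, ty, tz, rx, ry, rz); weights here are random, untrained."""
    def __init__(self, in_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, 6))
        self.b2 = np.zeros(6)

    def __call__(self, x):
        h = np.maximum(0.0, x @ self.W1 + self.b1)  # ReLU hidden layer
        return h @ self.W2 + self.b2                # 6-DoF pose vector

# Only two subparts (classes 0 and 3) are visible; the rest are occluded.
x = encode_detections([(0, (0.42, 0.55, 0.10, 0.12), 0.97),
                       (3, (0.61, 0.48, 0.08, 0.09), 0.88)])
pose = PoseMLP(in_dim=x.size)(x)
print(pose.shape)  # (6,)
```

In the paper the detector in front of this step is an SSD-style CNN, and the regressor is trained on poses paired with detection layouts; the fixed-length, zero-padded encoding is one simple way to feed a variable number of detected subparts to a fixed-size network.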



Author information

Corresponding author

Correspondence to Hanhoon Park.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Lomaliza, JP., Park, H. (2020). Initial Pose Estimation of 3D Object with Severe Occlusion Using Deep Learning. In: Blanc-Talon, J., Delmas, P., Philips, W., Popescu, D., Scheunders, P. (eds.) Advanced Concepts for Intelligent Vision Systems. ACIVS 2020. Lecture Notes in Computer Science, vol. 12002. Springer, Cham. https://doi.org/10.1007/978-3-030-40605-9_28

  • DOI: https://doi.org/10.1007/978-3-030-40605-9_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-40604-2

  • Online ISBN: 978-3-030-40605-9

  • eBook Packages: Computer Science (R0)
