Abstract
During the last decade, augmented reality (AR) has gained explosive attention and demonstrated high potential in educational and training applications. As a core technique, AR requires a tracking method to obtain the 3D pose of a camera or an object. Hence, providing fast, accurate, robust, and consistent tracking methods has been a main research topic in the AR field. Fortunately, tracking the camera pose using a relatively small, less-textured known object placed in the scene has been successfully addressed by various model-based tracking (MBT) methods. However, MBT methods require a good initial camera pose estimate, and estimating an initial camera pose from a partially visible object remains an open problem. Moreover, severe occlusions pose a further challenge for initial camera pose estimation. Thus, in this paper, we propose a deep learning method to estimate an initial camera pose from a partially visible object that may also be severely occluded. The proposed method handles such challenging scenarios by relying on detected subparts of the target object to be tracked. Specifically, we first detect subparts of the target object using a state-of-the-art convolutional neural network (CNN). The object detector returns two-dimensional bounding boxes, associated classes, and confidence scores. We then use the bounding-box and class information to train a deep neural network (DNN) that regresses the camera's 6-DoF pose. After initial pose estimation, we use a tweaked version of an existing MBT method to keep tracking the target object in real time on a mobile platform. Experimental results demonstrate that the proposed method can accurately estimate initial camera poses from objects that are partially visible and/or severely occluded. Finally, we analyze the performance of the proposed method in more detail by comparing the estimation errors when different numbers of subparts are detected.
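The detection-then-regression pipeline described above can be sketched as follows. This is a minimal, illustrative sketch only: the paper does not specify the regressor architecture in the abstract, so the fixed-length encoding of subpart detections (`encode_detections`), the padding scheme, and the tiny untrained MLP (`PoseRegressor`) are all assumptions, standing in for the trained DNN that maps detector output (boxes, classes, scores) to a 6-DoF camera pose.

```python
import numpy as np

def encode_detections(detections, num_classes, max_parts):
    """Flatten subpart detections into a fixed-length regressor input.

    Each detection is (x_min, y_min, x_max, y_max, class_id, score)
    with normalized image coordinates. Undetected subparts are
    zero-padded so the input size stays fixed even under occlusion.
    """
    vec = np.zeros(max_parts * (4 + num_classes), dtype=np.float32)
    for i, (x0, y0, x1, y1, cls, score) in enumerate(detections[:max_parts]):
        base = i * (4 + num_classes)
        vec[base:base + 4] = [x0, y0, x1, y1]   # bounding-box coordinates
        vec[base + 4 + cls] = score             # one-hot class, weighted by confidence
    return vec

class PoseRegressor:
    """Tiny MLP: detection vector -> 6-DoF pose (tx, ty, tz, rx, ry, rz)."""

    def __init__(self, in_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 6))
        self.b2 = np.zeros(6)

    def predict(self, x):
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        return h @ self.w2 + self.b2                # linear 6-DoF output

# Two of four subparts detected (the object is partially visible):
dets = [(0.10, 0.20, 0.30, 0.40, 1, 0.9),
        (0.50, 0.50, 0.70, 0.80, 0, 0.8)]
vec = encode_detections(dets, num_classes=3, max_parts=4)
pose = PoseRegressor(in_dim=vec.size).predict(vec)  # shape (6,)
```

The zero-padded encoding is one simple way to let the regressor accept a variable number of detected subparts, which mirrors the paper's analysis of estimation error as a function of how many subparts are detected.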
This work was supported by a Research Grant of Pukyong National University (2019).
© 2020 Springer Nature Switzerland AG
Cite this paper
Lomaliza, JP., Park, H. (2020). Initial Pose Estimation of 3D Object with Severe Occlusion Using Deep Learning. In: Blanc-Talon, J., Delmas, P., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2020. Lecture Notes in Computer Science, vol 12002. Springer, Cham. https://doi.org/10.1007/978-3-030-40605-9_28
DOI: https://doi.org/10.1007/978-3-030-40605-9_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-40604-2
Online ISBN: 978-3-030-40605-9