Initial Pose Estimation of 3D Object with Severe Occlusion Using Deep Learning

  • Conference paper
  • First Online:
Advanced Concepts for Intelligent Vision Systems (ACIVS 2020)

Abstract

During the last decade, augmented reality (AR) has gained explosive attention and demonstrated high potential in educational and training applications. As a core technique, AR requires a tracking method to obtain the 3D pose of a camera or an object. Hence, providing fast, accurate, robust, and consistent tracking methods has been a main research topic in the AR field. Fortunately, tracking the camera pose using a relatively small and less-textured known object placed in the scene has been successfully mastered through various types of model-based tracking (MBT) methods. However, MBT methods require a good initial camera pose estimator, and estimating an initial camera pose from a partially visible object remains an open problem. Moreover, severe occlusion is also a challenging problem for initial camera pose estimation. Thus, in this paper, we propose a deep learning method to estimate an initial camera pose from a partially visible object that may also be severely occluded. The proposed method handles such challenging scenarios by relying on the information of detected subparts of the target object to be tracked. Specifically, we first detect subparts of the target object using a state-of-the-art convolutional neural network (CNN). The object detector returns two-dimensional bounding boxes, associated classes, and confidence scores. We then use the bounding box and class information to train a deep neural network (DNN) that regresses to the camera's 6-DoF pose. After initial pose estimation, we apply a tweaked version of an existing MBT method to keep tracking the target object in real time on a mobile platform. Experimental results demonstrate that the proposed method can accurately estimate initial camera poses from objects that are partially visible and/or severely occluded. Finally, we analyze the performance of the proposed method in more detail by comparing the estimation errors when different numbers of subparts are detected.

This work was supported by a Research Grant of Pukyong National University (2019).
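The second stage of the pipeline described in the abstract, regressing a 6-DoF camera pose from the bounding boxes and classes of detected subparts, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of subpart classes, the per-detection feature layout, and the two-layer regressor below are all assumptions made for the example.

```python
import numpy as np

NUM_SUBPARTS = 8    # hypothetical number of subpart classes
FEAT_PER_PART = 5   # (cx, cy, w, h, confidence), normalized to [0, 1]

def encode_detections(detections, num_parts=NUM_SUBPARTS):
    """Pack detections into a fixed-length vector; slots for undetected
    subparts stay zero, which is what lets the regressor cope with
    partially visible or occluded objects."""
    x = np.zeros(num_parts * FEAT_PER_PART)
    for cls, (cx, cy, w, h), conf in detections:
        x[cls * FEAT_PER_PART:(cls + 1) * FEAT_PER_PART] = (cx, cy, w, h, conf)
    return x

class PoseMLP:
    """Minimal two-layer perceptron regressing to a 6-DoF pose
    (tx, ty, tz, rx, ry, rz); weights here are random, untrained."""
    def __init__(self, in_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, 6))
        self.b2 = np.zeros(6)

    def __call__(self, x):
        h = np.maximum(0.0, x @ self.W1 + self.b1)  # ReLU hidden layer
        return h @ self.W2 + self.b2                # 6-DoF pose vector

# Only two subparts (classes 0 and 3) are visible; the rest are occluded.
x = encode_detections([(0, (0.42, 0.55, 0.10, 0.12), 0.97),
                       (3, (0.61, 0.48, 0.08, 0.09), 0.88)])
pose = PoseMLP(in_dim=x.size)(x)
print(pose.shape)  # (6,)
```

In the paper the detector in front of this step is an SSD-style CNN, and the regressor is trained on poses paired with detection layouts; the fixed-length, zero-padded encoding is one simple way to feed a variable number of detected subparts to a fixed-size network.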



Author information

Corresponding author

Correspondence to Hanhoon Park.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Lomaliza, JP., Park, H. (2020). Initial Pose Estimation of 3D Object with Severe Occlusion Using Deep Learning. In: Blanc-Talon, J., Delmas, P., Philips, W., Popescu, D., Scheunders, P. (eds.) Advanced Concepts for Intelligent Vision Systems. ACIVS 2020. Lecture Notes in Computer Science, vol. 12002. Springer, Cham. https://doi.org/10.1007/978-3-030-40605-9_28

  • DOI: https://doi.org/10.1007/978-3-030-40605-9_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-40604-2

  • Online ISBN: 978-3-030-40605-9

  • eBook Packages: Computer Science (R0)
