Robust camera pose estimation by viewpoint classification using deep learning
Estimating the camera pose with respect to a target scene is a key technology for superimposing virtual information in augmented reality (AR). However, it is difficult to estimate the camera pose for all possible view angles, because feature descriptors such as SIFT are not fully invariant to changes in viewpoint. We propose a novel method of robust camera pose estimation that uses multiple feature descriptor databases, one generated for each partitioned viewpoint, within which the feature descriptor of each keypoint is almost invariant. Our method estimates the viewpoint class of each input image using deep learning, based on a set of training images prepared for each viewpoint class. We present two ways to prepare these images, which are used both for deep learning and for generating the databases. In the first method, images are generated using a projection matrix, so that learning is robust across a range of environments with changing backgrounds. The second method uses real images to learn a given environment around a planar pattern. Our evaluation results confirm that our approach increases the number of correct matches and the accuracy of camera pose estimation compared to the conventional method.
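As a rough illustration of the first preparation method, the sketch below builds the plane-to-image homography for a virtual camera placed at a chosen viewpoint over a planar pattern. It is not the authors' implementation: it assumes a pinhole camera that looks at the centre of a pattern lying in the plane z = 0, and the names `viewpoint_homography` and `project`, the longitude/latitude parameterization, and the intrinsic values are all hypothetical. Warping the actual pattern image with the resulting matrix would require an image library and is left out.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def normalize(v):
    n = norm(v)
    return tuple(c / n for c in v)

def viewpoint_homography(lon_deg, lat_deg, dist, focal, cx, cy):
    """Homography H = K [r1 r2 t] mapping pattern-plane points (z = 0)
    to pixels for a camera at the given viewpoint (assumed model)."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    # Camera centre on a sphere of radius `dist` around the pattern centre.
    cam = (dist * math.cos(lat) * math.cos(lon),
           dist * math.cos(lat) * math.sin(lon),
           dist * math.sin(lat))
    fwd = normalize(tuple(-c for c in cam))      # optical axis points at the origin
    right = cross(fwd, (0.0, 0.0, 1.0))
    if norm(right) < 1e-9:                       # camera directly above the pattern
        right = cross(fwd, (0.0, 1.0, 0.0))
    right = normalize(right)
    down = cross(fwd, right)                     # completes x-right, y-down, z-forward
    R = (right, down, fwd)                       # world-to-camera rotation (rows)
    t = tuple(-sum(R[i][k] * cam[k] for k in range(3)) for i in range(3))
    M = [[R[i][0], R[i][1], t[i]] for i in range(3)]   # [r1 r2 t]
    K = [[focal, 0.0, cx], [0.0, focal, cy], [0.0, 0.0, 1.0]]
    return [[sum(K[i][k] * M[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def project(H, x, y):
    """Apply homography H to the pattern-plane point (x, y)."""
    u, v, w = (H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3))
    return (u / w, v / w)
```

Sampling many longitude/latitude pairs inside one viewpoint partition and warping the pattern with each resulting homography would give one class worth of synthetic training views under these assumptions.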
Keywords: pose estimation; augmented reality (AR); deep learning; convolutional neural network
Open Access: The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.