Abstract
Given a set of semantically annotated RGB-D images with known camera poses, many existing 3D reconstruction algorithms can integrate these images into a single 3D model of the scene. Such a semantically annotated scene model makes it possible to build a video surveillance system with a moving camera, provided that the depth maps of the captured images and the poses of the camera can be computed efficiently. The proposed model-based video surveillance system consists of two phases: a modeling phase and an inspection phase. In the modeling phase, we carefully calibrate the parameters of the camera that captures the multi-view video used to model the target 3D scene. In the inspection phase, by contrast, the camera pose parameters and the depth maps of the captured RGB images are often unknown or noisy, because a moving camera is used to inspect the completeness of the object. In this paper, the 3D model is first transformed into a colored point cloud, which is then indexed by clustering so that each cluster represents a surface fragment of the scene. The clustering results are used to train a model-specific convolutional neural network (CNN) that annotates each pixel of an input RGB image with the correct fragment class. The prestored camera parameters and the depth information of the fragment classes are then fused to estimate the depth map and the camera pose of the current input RGB image. Experimental results show that the proposed approach outperforms the compared methods in the accuracy of camera pose estimation.
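The fragment-indexing step described above can be sketched as k-means clustering over the concatenated position and color features of the point cloud. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the function name `cluster_point_cloud`, the farthest-point initialization, and the toy two-patch point cloud are all choices made for this example.

```python
import numpy as np

def cluster_point_cloud(points, colors, k=8, iters=20):
    """Cluster a colored point cloud into k groups, each standing in for
    one 'surface fragment' of the scene model.

    points: (N, 3) XYZ coordinates; colors: (N, 3) RGB values in [0, 1].
    Returns per-point fragment labels and the (k, 6) cluster centers.
    """
    feats = np.hstack([points, colors])  # (N, 6) joint geometry+color feature

    # Deterministic farthest-point initialization: start from the first
    # point, then repeatedly add the point farthest from all chosen centers.
    centers = [feats[0]]
    for _ in range(k - 1):
        d = np.min(
            np.linalg.norm(feats[:, None, :] - np.array(centers)[None, :, :], axis=2),
            axis=1,
        )
        centers.append(feats[np.argmax(d)])
    centers = np.array(centers)

    # Standard Lloyd iterations: assign each point to the nearest center,
    # then recompute each center as the mean of its assigned points.
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = feats[mask].mean(axis=0)
    return labels, centers

# Toy colored point cloud: two well-separated patches, one red and one blue.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 0.1, (100, 3)),
                    rng.normal(5.0, 0.1, (100, 3))])
colors = np.vstack([np.tile([1.0, 0.0, 0.0], (100, 1)),
                    np.tile([0.0, 0.0, 1.0], (100, 1))])
labels, centers = cluster_point_cloud(points, colors, k=2)
```

In the full pipeline, the per-pixel fragment labels predicted by the CNN would be matched against the stored fragment centers to look up depth and camera parameters; for such a 2D-3D matching step, a standard choice would be PnP with RANSAC over the correspondences.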
Acknowledgement
This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant Numbers MOST 107-2221-E-019-033-MY2 and MOST 107-2634-F-019-001.
© 2019 Springer Nature Switzerland AG
Cite this paper
Su, J.-Y., Cheng, S.-C., Chang, C.-C., Hsieh, J.-W. (2019). Alignment of Deep Features in 3D Models for Camera Pose Estimation. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., Vrochidis, S. (eds.) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol. 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_36
DOI: https://doi.org/10.1007/978-3-030-05716-9_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science (R0)