Abstract
Given a set of semantically annotated RGB-D images with known camera poses, many existing 3D reconstruction algorithms can integrate these images into a single 3D model of the scene. Such a semantically annotated scene model makes it possible to build a video surveillance system with a moving camera, provided that the depth maps of the captured images and the poses of the camera can be computed efficiently. The proposed model-based video surveillance system consists of two phases: a modeling phase and an inspection phase. In the modeling phase, we carefully calibrate the parameters of the camera that captures the multi-view video used to model the target 3D scene. In the inspection phase, by contrast, the camera pose parameters and the depth maps of the captured RGB images are often unknown or noisy, because a moving camera is used to inspect the completeness of the object. In this paper, the 3D model is first transformed into a colored point cloud, which is then indexed by clustering so that each cluster represents a surface fragment of the scene. The clustering results are used to train a model-specific convolutional neural network (CNN) that annotates each pixel of an input RGB image with the correct fragment class. The prestored camera parameters and the depth information of the fragment classes are then fused to estimate the depth map and the camera pose of the current input RGB image. Experimental results show that the proposed approach outperforms the compared methods in the accuracy of camera pose estimation.
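The fragment-indexing step described above can be sketched as k-means clustering over the concatenated position and color features of the point cloud. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the function name `cluster_point_cloud`, the farthest-point initialization, and the toy two-patch point cloud are all choices made for this example.

```python
import numpy as np

def cluster_point_cloud(points, colors, k=8, iters=20):
    """Cluster a colored point cloud into k groups, each standing in for
    one 'surface fragment' of the scene model.

    points: (N, 3) XYZ coordinates; colors: (N, 3) RGB values in [0, 1].
    Returns per-point fragment labels and the (k, 6) cluster centers.
    """
    feats = np.hstack([points, colors])  # (N, 6) joint geometry+color feature

    # Deterministic farthest-point initialization: start from the first
    # point, then repeatedly add the point farthest from all chosen centers.
    centers = [feats[0]]
    for _ in range(k - 1):
        d = np.min(
            np.linalg.norm(feats[:, None, :] - np.array(centers)[None, :, :], axis=2),
            axis=1,
        )
        centers.append(feats[np.argmax(d)])
    centers = np.array(centers)

    # Standard Lloyd iterations: assign each point to the nearest center,
    # then recompute each center as the mean of its assigned points.
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = feats[mask].mean(axis=0)
    return labels, centers

# Toy colored point cloud: two well-separated patches, one red and one blue.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 0.1, (100, 3)),
                    rng.normal(5.0, 0.1, (100, 3))])
colors = np.vstack([np.tile([1.0, 0.0, 0.0], (100, 1)),
                    np.tile([0.0, 0.0, 1.0], (100, 1))])
labels, centers = cluster_point_cloud(points, colors, k=2)
```

In the full pipeline, the per-pixel fragment labels predicted by the CNN would be matched against the stored fragment centers to look up depth and camera parameters; for such a 2D-3D matching step, a standard choice would be PnP with RANSAC over the correspondences.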
Acknowledgement
This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant Numbers MOST 107-2221-E-019-033-MY2 and MOST 107-2634-F-019-001.
© 2019 Springer Nature Switzerland AG
Cite this paper
Su, J.-Y., Cheng, S.-C., Chang, C.-C., Hsieh, J.-W. (2019). Alignment of Deep Features in 3D Models for Camera Pose Estimation. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., Vrochidis, S. (eds.) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol. 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_36
DOI: https://doi.org/10.1007/978-3-030-05716-9_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer Science (R0)