
Alignment of Deep Features in 3D Models for Camera Pose Estimation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (volume 11296)

Abstract

Given a set of semantically annotated RGB-D images with known camera poses, many existing 3D reconstruction algorithms can integrate them into a single 3D model of the scene. Such a semantically annotated scene model makes it practical to build a video surveillance system around a moving camera, provided the depth maps of the captured images and the camera poses can be computed efficiently. The proposed model-based video surveillance consists of two phases: a modeling phase and an inspection phase. In the modeling phase, we carefully calibrate the parameters of the camera that captures the multi-view video used to model the target 3D scene. In the inspection phase, however, the camera pose parameters and the depth maps of the captured RGB images are often unknown or noisy, because a moving camera is used to inspect the completeness of the object. In this paper, the 3D model is first transformed into a colored point cloud, which is then indexed by clustering, with each cluster representing a surface fragment of the scene. The clustering results are used to train a model-specific convolutional neural network (CNN) that annotates each pixel of an input RGB image with the correct fragment class. The prestored camera parameters and the depth information of the fragment classes are then fused to estimate the depth map and the camera pose of the current input RGB image. Experimental results show that the proposed approach outperforms competing methods in the accuracy of camera pose estimation.
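The indexing step described above, transforming the 3D model into a colored point cloud and clustering it so that each cluster represents a surface fragment, can be illustrated with a minimal sketch. This is not the paper's implementation: the paper trains a model-specific CNN on the clustering results, while here plain k-means over concatenated position and color features stands in for the clustering, and the function name, the toy two-patch cloud, and the choice of k are illustrative assumptions.

```python
import numpy as np

def cluster_point_cloud(points, colors, k=2, iters=30, seed=0):
    """Index a colored point cloud by k-means clustering.

    Each cluster stands in for one "surface fragment" class.
    points: (N, 3) coordinates; colors: (N, 3) RGB values in [0, 1].
    Returns per-point fragment labels and the fragment centers.
    """
    rng = np.random.default_rng(seed)
    # A colored cloud couples geometry and appearance, so cluster on both.
    feats = np.hstack([points, colors])
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest fragment center.
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster went empty.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels, centers

# Toy cloud: two well-separated planar patches with distinct colors.
rng = np.random.default_rng(1)
patch_a = np.hstack([rng.uniform(0.0, 1.0, (100, 2)), np.zeros((100, 1))])
patch_b = np.hstack([rng.uniform(50.0, 51.0, (100, 2)), np.zeros((100, 1))])
points = np.vstack([patch_a, patch_b])
colors = np.vstack([np.tile([1.0, 0.0, 0.0], (100, 1)),
                    np.tile([0.0, 0.0, 1.0], (100, 1))])
labels, centers = cluster_point_cloud(points, colors, k=2)
```

In the full pipeline such fragment labels would supervise the CNN that annotates the pixels of an inspection-phase RGB image, and the prestored depth of each fragment class would then feed the pose estimate.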



Acknowledgement

This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant Numbers MOST 107-2221-E-019-033-MY2 and 107-2634-F-019-001.

Author information

Corresponding author

Correspondence to Jui-Yuan Su.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Su, JY., Cheng, SC., Chang, CC., Hsieh, JW. (2019). Alignment of Deep Features in 3D Models for Camera Pose Estimation. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_36

  • DOI: https://doi.org/10.1007/978-3-030-05716-9_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05715-2

  • Online ISBN: 978-3-030-05716-9

  • eBook Packages: Computer Science, Computer Science (R0)
