Evaluation of CNN-Based Single-Image Depth Estimation Methods

  • Tobias Koch
  • Lukas Liebel
  • Friedrich Fraundorfer
  • Marco Körner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11131)

Abstract

While an increasing interest in deep models for single-image depth estimation (SIDE) can be observed, established schemes for their evaluation are still limited. We propose a set of novel quality criteria, allowing for a more detailed analysis by focusing on specific characteristics of depth maps. In particular, we address the preservation of edges and planar regions, depth consistency, and absolute distance accuracy. In order to employ these metrics to evaluate and compare state-of-the-art SIDE approaches, we provide a new high-quality RGB-D dataset. We used a digital single-lens reflex (DSLR) camera together with a laser scanner to acquire high-resolution images and highly accurate depth maps. Experimental results show the validity of our proposed evaluation protocol.
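The abstract describes the proposed quality criteria only at a high level; their exact definitions appear in the full paper. As a rough illustration of the quantities involved, the following sketch computes standard SIDE error metrics commonly reported in this literature (RMSE, absolute relative error, and δ < 1.25 threshold accuracy) alongside a hypothetical edge-preservation F1 score that matches depth-gradient edges within a pixel tolerance. Function names, the gradient threshold, and the tolerance are illustrative assumptions, not the authors' evaluation protocol.

```python
# Minimal sketch of depth-map evaluation metrics (illustrative, not the
# authors' exact protocol): standard global metrics plus a hypothetical
# edge-preservation score. Thresholds and names are assumptions.
import numpy as np
from scipy import ndimage


def standard_metrics(pred, gt, valid_mask=None):
    """RMSE, absolute relative error, and delta < 1.25 accuracy over valid pixels."""
    if valid_mask is None:
        valid_mask = gt > 0  # laser-scan ground truth typically has holes at depth 0
    p = np.maximum(pred[valid_mask], 1e-6)  # guard against division by zero
    g = gt[valid_mask]
    ratio = np.maximum(p / g, g / p)
    return {
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "delta_1.25": float(np.mean(ratio < 1.25)),
    }


def edge_f1(pred, gt, grad_threshold=0.5, tolerance_px=2):
    """Hypothetical edge-preservation score: F1 of depth-gradient edges in the
    prediction matched to ground-truth edges within `tolerance_px` pixels."""
    def edges(depth):
        gy, gx = np.gradient(depth)
        return np.hypot(gx, gy) > grad_threshold

    e_pred, e_gt = edges(pred), edges(gt)
    # Distance from every pixel to the nearest edge pixel, for both edge maps.
    dist_to_gt = ndimage.distance_transform_edt(~e_gt)
    dist_to_pred = ndimage.distance_transform_edt(~e_pred)
    precision = np.mean(dist_to_gt[e_pred] <= tolerance_px) if e_pred.any() else 0.0
    recall = np.mean(dist_to_pred[e_gt] <= tolerance_px) if e_gt.any() else 0.0
    return 2 * precision * recall / (precision + recall + 1e-12)
```

Such distance-tolerance edge matching is one common way to quantify boundary sharpness; the criteria actually proposed in the paper (edge and plane preservation, depth consistency, absolute distance accuracy) are defined in the full text.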

Keywords

Single-image depth estimation · Deep learning · CNN · RGB-D · Benchmark · Evaluation · Dataset · Error metrics

Acknowledgements

This research was funded by the German Research Foundation (DFG), supporting Tobias Koch, and by the Federal Ministry of Transport and Digital Infrastructure (BMVI), supporting Lukas Liebel. We thank our colleagues from the Chair of Geodesy for providing all the necessary equipment and our student assistant Leonidas Stöckle for his help during the data acquisition campaign.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Tobias Koch (1)
  • Lukas Liebel (1)
  • Friedrich Fraundorfer (2, 3)
  • Marco Körner (1)

  1. Chair of Remote Sensing Technology, Technical University of Munich, Munich, Germany
  2. Institute of Computer Graphics and Vision, Graz University of Technology, Graz, Austria
  3. Remote Sensing Technology Institute, German Aerospace Center (DLR), Oberpfaffenhofen, Germany
