Road Perspective Depth Reconstruction from Single Images Using Reduce-Refine-Upsample CNNs

  • José E. Valdez-Rodríguez
  • Hiram Calvo
  • Edgardo M. Felipe-Riverón
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10632)


Abstract

Depth reconstruction from single images is a challenging task owing to the complexity and quantity of depth cues that images contain. Convolutional Neural Networks (CNNs) have been used successfully to reconstruct depth in general object scenes; however, these works were not tailored to the particular problem of road perspective depth reconstruction. As we aim to build a computationally efficient model, we focus on single-stage CNNs. In this paper we propose two different models for solving this task. A particularity of our models is that they perform refinement within the same single-stage training; we therefore call them Reduce-Refine-Upsample (RRU) models, after the order of their CNN operations. We compare our models with the current state of the art in depth reconstruction, obtaining improvements in both global and local views for images of road perspectives.
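As a rough illustration of the Reduce-Refine-Upsample ordering named in the abstract (not the authors' trained network), the sketch below chains an average-pooling reduction, a single low-resolution convolution standing in for the refinement stage, and nearest-neighbour upsampling. The stage sizes, the fixed smoothing kernel, and the function names are placeholder assumptions; in the paper these operations are learned convolutional layers trained end to end in one stage.

```python
import numpy as np

def reduce_stage(x, factor=2):
    # Reduce: 2x2 average pooling (stand-in for strided/pooled conv layers).
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def refine_stage(x, kernel):
    # Refine: one 3x3 convolution at the reduced resolution, zero padding,
    # followed by a ReLU. In the paper this is a learned refinement layer.
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return np.maximum(out, 0.0)

def upsample_stage(x, factor=2):
    # Upsample: nearest-neighbour back to the input resolution.
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def rru_forward(img):
    # Chain the three stages in the Reduce-Refine-Upsample order.
    reduced = reduce_stage(img)
    kernel = np.full((3, 3), 1.0 / 9.0)  # placeholder kernel; learned in practice
    refined = refine_stage(reduced, kernel)
    return upsample_stage(refined)

depth = rru_forward(np.ones((8, 8)))  # output has the input's 8x8 resolution
```

The point of the sketch is only the operation order: the depth map is predicted at reduced resolution, refined there, and upsampled back, all within one forward pass rather than a separate refinement network.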


Keywords: Depth reconstruction · Convolutional Neural Networks · One-stage training · Embedded refining layer · Stereo matching


References

  1. Howard, I.P.: Perceiving in Depth, Volume 1: Basic Mechanisms. Oxford University Press, Oxford (2012)
  2. Bills, C., Chen, J., Saxena, A.: Autonomous MAV flight in indoor environments using single image perspective cues. In: 2011 IEEE International Conference on Robotics and Automation (ICRA), pp. 5776–5783. IEEE (2011)
  3. Kundu, A., Li, Y., Dellaert, F., Li, F., Rehg, J.M.: Joint semantic segmentation and 3D reconstruction from monocular video. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 703–718. Springer, Cham (2014)
  4. Häne, C., Sattler, T., Pollefeys, M.: Obstacle detection for self-driving cars using only monocular cameras and wheel odometry. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5101–5108. IEEE (2015)
  5. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems, pp. 2366–2374 (2014)
  6. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658 (2015)
  7. Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5162–5170 (2015)
  8. Liu, F., Shen, C., Lin, G., Reid, I.: Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 38, 2024–2039 (2016)
  9. Afifi, A.J., Hellwich, O.: Object depth estimation from a single image using fully convolutional neural network. In: 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1–7. IEEE (2016)
  10. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
  11. Hirschmüller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30, 328–341 (2008)
  12. Konolige, K.: Small vision systems: hardware and implementation. In: Shirai, Y., Hirose, S. (eds.) Robotics Research, pp. 203–212. Springer, London (1998)
  13. Geiger, A., Roser, M., Urtasun, R.: Efficient large-scale stereo matching. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6492, pp. 25–38. Springer, Heidelberg (2011)
  14. Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
  15. LeCun, Y., et al.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, pp. 396–404 (1990)
  16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
  17. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014)
  18. Arora, R., Basu, A., Mianjy, P., Mukherjee, A.: Understanding deep neural networks with rectified linear units. arXiv preprint arXiv:1611.01491 (2016)
  19. Dosovitskiy, A., et al.: FlowNet: learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2758–2766 (2015)
  20. Xu, N., Price, B., Cohen, S., Huang, T.: Deep image matting. arXiv preprint arXiv:1703.03872 (2017)
  21. LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient BackProp. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 9–48. Springer, Heidelberg (2012)
  22. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
  23. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City, Mexico
