Recovering 3D Planes from a Single Image via Convolutional Neural Networks

  • Fengting Yang
  • Zihan Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)

Abstract

In this paper, we study the problem of recovering 3D planar surfaces from a single image of a man-made environment. We show that it is possible to directly train a deep neural network to achieve this goal. A novel plane structure-induced loss is proposed to train the network to simultaneously predict a plane segmentation map and the parameters of the 3D planes. Further, to avoid the tedious manual labeling process, we show how to leverage an existing large-scale RGB-D dataset to train our network without explicit 3D plane annotations, and how to take advantage of the semantic labels that come with the dataset for accurate planar and non-planar classification. Experimental results demonstrate that our method significantly outperforms existing methods, both qualitatively and quantitatively. The recovered planes could potentially benefit many important visual tasks such as vision-based navigation and human-robot interaction.
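
The abstract does not give the form of the plane structure-induced loss, so the following is only an illustrative sketch of how a loss could couple a soft plane segmentation with plane parameters using depth from an RGB-D dataset. It assumes each plane is parameterized by a vector n such that n·Q = 1 for every 3D point Q on the plane, and that the residual is weighted by the predicted per-pixel plane probabilities. The function names `backproject` and `plane_loss`, the parameterization, and the L1 residual are assumptions made for this example and may differ from the formulation in the paper.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to 3D points (H*W, 3) using camera intrinsics K (3x3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, one row per pixel.
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pixels @ np.linalg.inv(K).T          # K^{-1} [u, v, 1]^T for each pixel
    return rays * depth.reshape(-1, 1)          # scale each ray by its depth

def plane_loss(points, seg_prob, planes):
    """Probability-weighted plane-fitting residual (hypothetical form).

    points:   (N, 3) 3D points from the depth map
    seg_prob: (N, m) soft assignment of each pixel to each of m planes
    planes:   (m, 3) plane parameters n_j, assuming n_j . Q = 1 on plane j
    """
    residual = np.abs(points @ planes.T - 1.0)  # (N, m): |n_j . Q_i - 1|
    return float(np.mean(np.sum(seg_prob * residual, axis=1)))

# Toy check: a fronto-parallel wall at depth z = 2 satisfies n = (0, 0, 0.5).
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
pts = backproject(np.full((4, 4), 2.0), K)
seg = np.ones((pts.shape[0], 1))                # every pixel assigned to plane 0
loss_correct = plane_loss(pts, seg, np.array([[0.0, 0.0, 0.5]]))
loss_wrong = plane_loss(pts, seg, np.array([[0.0, 0.0, 1.0]]))
```

In this toy setup the residual vanishes for the correct plane and grows when the plane parameters are wrong. Because the residual needs only depth, a loss of this kind would not require explicit 3D plane annotations.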

Keywords

3D reconstruction, Plane segmentation, Deep learning

Notes

Acknowledgement

This work is supported in part by a startup fund from Penn State and a hardware donation from Nvidia.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. The Pennsylvania State University, University Park, USA