Learning to Reconstruct High-Quality 3D Shapes with Cascaded Fully Convolutional Networks

  • Yan-Pei Cao
  • Zheng-Ning Liu
  • Zheng-Fei Kuang
  • Leif Kobbelt
  • Shi-Min Hu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)

Abstract

We present a data-driven approach to reconstructing high-resolution and detailed volumetric representations of 3D shapes. Although well studied, algorithms for volumetric fusion from multi-view depth scans are still prone to scanning noise and occlusions, making it hard to obtain high-fidelity 3D reconstructions. In this paper, inspired by recent advances in efficient 3D deep learning techniques, we introduce a novel cascaded 3D convolutional network architecture, which learns to reconstruct implicit surface representations from noisy and incomplete depth maps in a progressive, coarse-to-fine manner. To this end, we also develop an algorithm for end-to-end training of the proposed cascaded structure. Qualitative and quantitative experimental results on both simulated and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work in terms of quality and fidelity of reconstructed models.
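For context, the volumetric fusion that the abstract contrasts against is commonly implemented as a per-voxel weighted running average of truncated signed distances. The sketch below illustrates only that classical baseline on a single row of voxels, not the cascaded network proposed in the paper. The function names, the truncation value, and the uniform per-scan weight are assumptions chosen for illustration.

```python
def truncate(sdf, trunc):
    """Clamp a signed distance to the band [-trunc, trunc]."""
    return max(-trunc, min(trunc, sdf))


def fuse_scan(tsdf, weight, scan_sdf, trunc=0.1, scan_weight=1.0):
    """Fold one scan into the running per-voxel average (in place).

    tsdf, weight: lists holding the current fused value and accumulated
                  weight of each voxel.
    scan_sdf:     signed distance observed for each voxel in this scan,
                  or None where the voxel was not observed (occlusion).
    """
    for i, d in enumerate(scan_sdf):
        if d is None:
            continue  # unobserved voxels keep their previous state
        d = truncate(d, trunc)
        total = weight[i] + scan_weight
        tsdf[i] = (weight[i] * tsdf[i] + scan_weight * d) / total
        weight[i] = total
    return tsdf, weight


# Hypothetical example: five voxels along one camera ray and two noisy
# depth readings of a surface that lies near z = 0.10.
voxel_z = [0.00, 0.05, 0.10, 0.15, 0.20]
tsdf = [0.0] * len(voxel_z)
weight = [0.0] * len(voxel_z)
for depth in (0.09, 0.11):
    # Positive in front of the measured surface, negative behind it.
    scan = [depth - z for z in voxel_z]
    fuse_scan(tsdf, weight, scan)
```

Averaging reduces the effect of independent depth noise, but a voxel that no scan observes keeps zero weight and stays empty. That is one way to see why fusion alone leaves holes under occlusion, and why a learned completion step can help.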

Keywords

  • High-fidelity 3D reconstruction
  • Cascaded architecture

Notes

Acknowledgement

This work was supported by the Joint NSFC-DFG Research Program (project number 61761136018) and the Natural Science Foundation of China (project number 61521002).

Supplementary material

474192_1_En_38_MOESM1_ESM.pdf (158 kb)
Supplementary material 1 (pdf 157 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yan-Pei Cao (1, 2)
  • Zheng-Ning Liu (1)
  • Zheng-Fei Kuang (1)
  • Leif Kobbelt (3)
  • Shi-Min Hu (1)
  1. Tsinghua University, Beijing, China
  2. Owlii Inc., Beijing, China
  3. RWTH Aachen University, Aachen, Germany
