Inferring 3D Shapes from Image Collections Using Adversarial Networks

Abstract

We investigate the problem of learning a probabilistic distribution over three-dimensional shapes given two-dimensional views of multiple objects taken from unknown viewpoints. Our approach called projective generative adversarial network (PrGAN) trains a deep generative model of 3D shapes whose projections (or renderings) matches the distribution of the provided 2D views. The addition of a differentiable projection module allows us to infer the underlying 3D shape distribution without access to any explicit 3D or viewpoint annotation during the learning phase. We show that our approach produces 3D shapes of comparable quality to GANs trained directly on 3D data. Experiments also show that the disentangled representation of 2D shapes into geometry and viewpoint leads to a good generative model of 2D shapes. The key advantage of our model is that it estimates 3D shape, viewpoint, and generates novel views from an input image in a completely unsupervised manner. We further investigate how the generative models can be improved if additional information such as depth, viewpoint or part segmentations is available at training time. To this end, we present new differentiable projection operators that can be used to learn better 3D generative models. Our experiments show that PrGAN can successfully leverage extra visual cues to create more diverse and accurate shapes.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Notes

  1. 1.

    We later relax this assumption to incorporate extra supervision.

References

  1. Achlioptas, P., Diamanti, O., Mitliagkas, I., & Guibas, L. J. (2017). Learning representations and generative models for 3d point clouds. arXiv preprint arXiv:1707.02392.

  2. Andriluka, M., Roth, S., & Schiele, B. (2010). Monocular 3D pose estimation and tracking by detection. In Computer vision and pattern recognition (CVPR). IEEE.

  3. Barron, J. T., & Malik, J. (2015). Shape, illumination, and reflectance from shading. Transactions of Pattern Analysis and Machine Intelligence (PAMI), 37, 1670–1687.

    Article  Google Scholar 

  4. Barrow, H., & Tenenbaum, J. (1978). Recovering intrinsic scene characteristics. In A. Hanson & E. Riseman (Eds.), Comput. vis. syst. (pp. 3–26).

  5. Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3d faces. In Proceedings of the 26th annual conference on computer graphics and interactive techniques (pp. 187–194). ACM Press/Addison-Wesley Publishing Co.

  6. Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., & Su, H., et al. (2015). Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012.

  7. Cheng, Z., Gadelha, M., Maji, S., & Sheldon, D. (2019). A Bayesian perspective on the deep image prior. In The IEEE conference on computer vision and pattern recognition (CVPR).

  8. Dosovitskiy, A., Tobias Springenberg, J., & Brox, T. (2015). Learning to generate chairs with convolutional neural networks. In Conference on computer vision and pattern recognition (CVPR).

  9. Eigen, D., & Fergus, R. (2015). Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In International conference on computer vision (ICCV).

  10. Fan, H., Su, H., & Guibas, L. J. (2017). A point set generation network for 3D object reconstruction from a single image. In Computer vision and pattern recognition (CVPR).

  11. Gadelha, M., Maji, S., & Wang, R. (2017). 3D shape generation using spatially ordered point clouds. In British machine vision conference (BMVC).

  12. Gadelha, M., Maji, S., & Wang, R. (2017). 3D shape induction from 2D views of multiple objects. In International conference on 3D vision (3DV).

  13. Gadelha, M., Wang, R., & Maji, S. (2018). Multiresolution tree networks for 3D point cloud processing. In European conference on computer vision (ECCV).

  14. Gadelha, M., Wang, R., & Maji, S. (2019). Shape reconstruction using differentiable projections and deep priors. In International conference on computer vision (ICCV).

  15. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (NIPS).

  16. Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., & Smola, A. J. (2006). A kernel method for the two-sample-problem. In Advances in neural information processing systems (NIPS).

  17. Groueix, T., Fisher, M., Kim, V. G., Russell, B., & Aubry, M. (2018). AtlasNet: A Papier-Mâché approach to learning 3D surface generation. In Computer vision and pattern recognition (CVPR).

  18. Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge: Cambridge University Press.

    Google Scholar 

  19. Henderson, P., & Ferrari, V. (2018). Learning to generate and reconstruct 3D meshes with only 2D supervision. In British machine vision conference (BMVC).

  20. Hoiem, D., Efros, A. A., & Hebert, M. (2005). Geometric context from a single image. In International conference on computer vision (ICCV).

  21. Kanazawa, A., Tulsiani, S., Efros, A. A., & Malik, J. (2018). Learning category-specific mesh reconstruction from image collections. In European conference on computer vision (ECCV).

  22. Kar, A., Tulsiani, S., Carreira, J., & Malik, J. (2015). Category-specific object reconstruction from a single image. In Computer vision and pattern recognition (CVPR).

  23. Kato, H., Ushiku, Y., & Harada, T. (2018). Neural 3d mesh renderer. In Computer vision and pattern recognition (CVPR).

  24. Kulkarni, T. D., Whitney, W. F., Kohli, P., & Tenenbaum, J. (2015). Deep convolutional inverse graphics network. In Advances in neural information processing systems (NIPS).

  25. Land, E. H., & McCann, J. J. (1971). Lightness and retinex theory. JOSA, 61(1), 1–11.

    Article  Google Scholar 

  26. Laurentini, A. (1994). The visual hull concept for silhouette-based image understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(2), 150–162.

    Article  Google Scholar 

  27. Li, T.-M., Aittala, M., Durand, F., & Lehtinen, J. (2018). Differentiable monte carlo ray tracing through edge sampling. ACM Transactions on Graph (SIGGRAPH Asia), 37, 1–11.

    Google Scholar 

  28. Lin, C.-H., Kong, C., & Lucey, S. (2018). Learning efficient point cloud generation for dense 3D object reconstruction. In AAAI conference on artificial intelligence (AAAI).

  29. Liu, H. T. D., Tao, M., & Jacobson, A. (2018). Paparazzi: Surface editing by way of multi-view image processing. ACM Transactions on Graphcs, 37, 221.

    Google Scholar 

  30. Lun, Z., Gadelha, M., Kalogerakis, E., Maji, S., & Wang, R. (2017). 3D shape reconstruction from sketches via multi-view convolutional networks. In International conference on 3D vision (3DV) (pp. 67–77).

  31. Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In International conference on machine learning (ICML).

  32. Nalbach, O., Arabadzhiyska, E., Mehta, D., Seidel, H.-P., & Ritschel, T. (2016). Deep shading: Convolutional neural networks for screen-space shading. arXiv preprint arXiv:1603.06078.

  33. Nguyen-Phuoc, T., Li, C., Balaban, S., & Yang, Y.-L. (2018). Rendernet: A deep convolutional network for differentiable rendering from 3d shapes. In Advances in neural information processing systems 31.

  34. Odena, A., Dumoulin, V., & Olah, C. (2016). Deconvolution and checkerboard artifacts. Distill.

  35. Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.

  36. Rezende, D. J., Eslami, S. M., Mohamed, S., Battaglia, P., Jaderberg, M., & Heess, N. (2016). Unsupervised learning of 3D structure from images. In Advances in neural information processing systems (NIPS).

  37. Savarese, S., & Fei-Fei, L. (2007). 3D generic object categorization, localization and pose estimation. In International conference on computer vision (ICCV).

  38. Saxena, A., Chung, S. H., & Ng, A. (2005). Learning depth from single monocular images. In Advances in neural information processing systems (NIPS).

  39. Schwing, A. G., & Urtasun, R. (2012). Efficient exact inference for 3d indoor scene understanding. In European conference on computer vision (ECCV).

  40. Su, H., Qi, C. R., Li, Y., & Guibas, L. J. (2015). Render for CNN: Viewpoint estimation in images using cnns trained with rendered 3D model views. In International conference on computer vision (ICCV).

  41. Tatarchenko, M., Dosovitskiy, A., & Brox, T. (2016). Multi-view 3D models from single images with a convolutional network. In European conference on computer vision (ECCV).

  42. Tulsiani, S., Carreira, J., & Malik, J.. (2015). Pose induction for novel object categories. In International conference on computer vision (ICCV).

  43. Tulsiani, S., Efros, A. A., & Malik, J. (2018). Multi-view consistency as supervisory signal for learning shape and pose prediction. In Computer vision and pattern regognition (CVPR).

  44. Tulsiani, S., Zhou, T., Efros, A. A., & Malik, J. (2017). Multi-view supervision for single-view reconstruction via differentiable ray consistency. In Computer vision and pattern regognition (CVPR).

  45. Woodham, R. J. (1980). Photometric method for determining surface orientation from multiple images. Optical Engineering, 19(1), 191139–191139.

    Article  Google Scholar 

  46. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3d shapenets: A deep representation for volumetric shapes. In Conference on computer vision and pattern recognition (CVPR).

  47. Wu, J., Zhang, C., Xue, T., Freeman, W. T., & Tenenbaum, J. B. (2016). Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in neural information processing systems (NIPS).

  48. Yan, X., Yang, J., Yumer, E., Guo, Y., & Lee, H. (2016). Perspective transformer nets: Learning single-view 3D object reconstruction without 3D supervision. In Advances in neural information processing systems.

  49. Zhou, T., Tulsiani, S., Sun, W., Malik, J., & Efros, A. A. (2016). View synthesis by appearance flow. In European conference on computer vision (ECCV).

Download references

Acknowledgements

This research was supported in part by the NSF Grants 1617917, 1749833, 1661259 and 1908669. The experiments were performed using equipment obtained under a grant from the Collaborative R&D Fund managed by the Massachusetts Tech Collaborative.

Author information

Affiliations

Authors

Corresponding author

Correspondence to Matheus Gadelha.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Gadelha, M., Rai, A., Maji, S. et al. Inferring 3D Shapes from Image Collections Using Adversarial Networks. Int J Comput Vis (2020). https://doi.org/10.1007/s11263-020-01335-w

Download citation

Keywords

  • 3D generative models
  • Unsupervised learning
  • Differentiable rendering
  • Adversarial networks