3D Surface Reconstruction by Pointillism

  • Olivia Wiles
  • Andrew Zisserman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11131)


The objective of this work is to infer the 3D shape of an object from a single image. We use sculptures as our training and test bed, as these have great variety in shape and appearance.

To achieve this we build on the success of multiple view geometry (MVG), which can accurately provide correspondences between images of 3D objects under varying viewpoint and illumination conditions, and make the following contributions. First, we introduce a new loss function that can harness image-to-image correspondences to provide a supervisory signal to train a deep network to infer a depth map. The network is trained end-to-end by differentiating through the camera. Second, we develop a processing pipeline to automatically generate a large-scale multi-view set of correspondences for training the network. Finally, we demonstrate that we can indeed obtain a depth map of a novel object from a single image for a variety of sculptures with varying shape and texture, and that the network generalises at test time to new domains (e.g. synthetic images).
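The abstract describes supervising a single-image depth network with image-to-image correspondences by differentiating a reprojection error through the camera. The paper's exact loss is not reproduced here; the following is a minimal NumPy sketch of a generic correspondence reprojection loss under assumed conventions (pinhole intrinsics K, relative pose R, t mapping camera A's frame to camera B's frame). All function and variable names are hypothetical.

```python
import numpy as np

def reprojection_loss(depth_a, K, R, t, matches_a, matches_b):
    """Mean squared reprojection error over a set of correspondences.

    depth_a   : (H, W) predicted depth map for image A
    K         : (3, 3) pinhole camera intrinsics (assumed shared by A and B)
    R, t      : rotation (3, 3) and translation (3,) from camera A to camera B
    matches_a : iterable of (u, v) pixel coordinates in image A
    matches_b : iterable of corresponding (u, v) pixel coordinates in image B
    """
    K_inv = np.linalg.inv(K)
    losses = []
    for (ua, va), (ub, vb) in zip(matches_a, matches_b):
        z = depth_a[int(va), int(ua)]                   # predicted depth at the match
        p_cam = z * (K_inv @ np.array([ua, va, 1.0]))   # back-project pixel to 3D in A's frame
        p_b = K @ (R @ p_cam + t)                       # transform into B's frame, project
        uv_b = p_b[:2] / p_b[2]                         # perspective divide
        losses.append(np.sum((uv_b - np.array([ub, vb])) ** 2))
    return float(np.mean(losses))
```

If the predicted depth is correct, the back-projected point lands on its match in image B and the loss vanishes; every operation is differentiable in the depth values, so the same computation written in an autodiff framework provides a training gradient for the depth network.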



The authors would like to thank Fatma Guney for helpful feedback and suggestions. This work was funded by an EPSRC studentship and EPSRC Programme Grant Seebibyte EP/M013774/1.

Supplementary material

Supplementary material 1 (PDF, 3268 KB): 478822_1_En_21_MOESM1_ESM.pdf



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, UK
