Single Image Intrinsic Decomposition Without a Single Intrinsic Image

  • Wei-Chiu Ma
  • Hang Chu
  • Bolei Zhou
  • Raquel Urtasun
  • Antonio Torralba
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11218)


Intrinsic image decomposition—decomposing a natural image into a set of images corresponding to different physical causes—is one of the key and fundamental problems of computer vision. Previous intrinsic decomposition approaches either address the problem in a fully supervised manner or require multiple images of the same scene as input. These approaches are less desirable in practice, as ground truth intrinsic images are extremely difficult to acquire, and the requirement of multiple images poses severe limitations on applicable scenarios. In this paper, we propose to bring the best of both worlds. We present a two-stream convolutional neural network framework that is capable of learning the decomposition effectively in the absence of any ground truth intrinsic images, and can be easily extended to a (semi-)supervised setup. At inference time, our model reduces to a single-stream module that performs intrinsic decomposition on a single input image. We demonstrate the effectiveness of our framework through an extensive experimental study on both synthetic and real-world datasets, showing superior performance over previous approaches in both single-image and multi-image settings. Notably, our approach outperforms previous state-of-the-art single-image methods while using only 50% of the ground truth supervision.


Keywords: Intrinsic decomposition · Unsupervised learning · Self-supervised learning
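The training signal implied by the abstract can be illustrated with a minimal sketch. Assuming the standard Lambertian model I = A · S (image = albedo × shading), a two-stream setup can supervise itself from image pairs of the same scene: each stream must reconstruct its own input, while the two streams must agree on the albedo, which is invariant to illumination. The function below is a hypothetical formulation for illustration only, not the paper's exact loss.

```python
import numpy as np

def two_stream_losses(img1, img2, albedo1, albedo2, shading1, shading2):
    """Illustrative self-supervised objectives for intrinsic decomposition
    from an image pair of the same scene (hypothetical sketch):
      - reconstruction: each image should equal its albedo * shading,
      - consistency: both views should predict the same albedo.
    No ground truth intrinsic images are needed for either term."""
    recon = (np.mean((albedo1 * shading1 - img1) ** 2)
             + np.mean((albedo2 * shading2 - img2) ** 2))
    consistency = np.mean((albedo1 - albedo2) ** 2)
    return recon, consistency
```

For a decomposition that satisfies both the Lambertian model and albedo sharing, both terms vanish; at inference, a single stream suffices because the consistency term is only needed during training.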



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Wei-Chiu Ma (1, 2)
  • Hang Chu (3)
  • Bolei Zhou (1)
  • Raquel Urtasun (2, 3)
  • Antonio Torralba (1)

  1. Massachusetts Institute of Technology, Cambridge, USA
  2. Uber Advanced Technologies Group, Pittsburgh, USA
  3. University of Toronto, Toronto, Canada
