Rendering Portraitures from Monocular Camera and Beyond

  • Xiangyu Xu
  • Deqing Sun
  • Sifei Liu
  • Wenqi Ren
  • Yu-Jin Zhang
  • Ming-Hsuan Yang
  • Jian Sun
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


Shallow depth of field (DoF) is a desirable effect in photography that gives photos an artistic look. Producing it usually requires a single-lens reflex camera and some photographic skill. Recently, dual-lens cameras on cellphones have been used to estimate scene depth and simulate DoF effects for portrait shots. However, this technique cannot be applied to photos that have already been taken, and it does not work well for whole-body scenes where the subject is far from the camera. In this work, we introduce an automatic system that achieves portrait DoF rendering for monocular cameras. Specifically, we first use convolutional neural networks to estimate relative depth and portrait segmentation maps from a single input image. Since these initial estimates are usually coarse and lack fine details, we further learn pixel affinities to refine them. With the refined estimates, we apply depth- and segmentation-aware blur rendering to the input image using a conditional random field and image matting. In addition, we train a spatially-variant recursive neural network to learn and accelerate this rendering process. We show that the proposed algorithm can effectively generate portraitures with realistic DoF effects from a single input image. Experimental results also demonstrate that our depth and segmentation estimation modules perform favorably against state-of-the-art methods, both quantitatively and qualitatively.
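As a rough illustration of the final rendering idea only, the sketch below applies a naive spatially-variant box blur whose radius grows with a pixel's distance from the focal plane and is zero on the segmented subject. It is not the conditional random field, matting, or recursive network described above, and it assumes the depth and segmentation maps are already available. The names `blur_radius`, `render_dof`, `focus_depth`, and `max_radius` are hypothetical and chosen for this example.

```python
def blur_radius(depth, mask, focus_depth, max_radius):
    """Per-pixel blur radius: 0 on the subject, larger away from the focal plane.

    depth: 2D list of relative depths in [0, 1] (assumed already estimated).
    mask:  2D list of 0/1 values, 1 marking subject (portrait) pixels.
    """
    h, w = len(depth), len(depth[0])
    radius = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                continue  # subject pixels stay sharp
            radius[y][x] = round(max_radius * abs(depth[y][x] - focus_depth))
    return radius


def render_dof(image, depth, mask, focus_depth=0.0, max_radius=2):
    """Blur a grayscale image (2D list of floats) with a per-pixel box filter."""
    h, w = len(image), len(image[0])
    radius = blur_radius(depth, mask, focus_depth, max_radius)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            r = radius[y][x]
            total, count = 0.0, 0
            # Average over a (2r + 1) x (2r + 1) window clipped to the image.
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out
```

A gather-style box filter like this lets background colors leak across the subject boundary wherever a blur window overlaps subject pixels, which is one reason a practical system needs more careful, boundary-aware rendering.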



This work is supported in part by the National Natural Science Foundation of P.R. China (Nos. 611711184, 61673234, U1636124), the NSF CAREER Grant (No. 1149783), and gifts from Adobe and Nvidia.

Supplementary material

Supplementary material 1: 474192_1_En_3_MOESM1_ESM.pdf (PDF, 96882 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiangyu Xu (1, 2)
  • Deqing Sun (3)
  • Sifei Liu (3)
  • Wenqi Ren (4)
  • Yu-Jin Zhang (1)
  • Ming-Hsuan Yang (5, 6)
  • Jian Sun (7)

  1. Tsinghua University, Beijing, China
  2. SenseTime, Beijing, China
  3. Nvidia, Santa Clara, USA
  4. Tencent AI Lab, Bellevue, USA
  5. UC Merced, Merced, USA
  6. Google, Menlo Park, USA
  7. Face++, Beijing, China
