A Holistic Approach for Data-Driven Object Cutout

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10111)

Abstract

Object cutout is a fundamental operation for image editing and manipulation, yet it is extremely challenging to automate in real-world images, which typically contain considerable background clutter. In contrast to existing cutout methods, which are based mainly on low-level image analysis, we propose a more holistic approach that considers the entire shape of the object of interest by leveraging higher-level image analysis and learnt global shape priors. Specifically, we realize this mechanism with a deep neural network (DNN) trained for objects of a particular class (chairs). Given a rectangular image region, the DNN outputs a probability map (P-map) that indicates, for each pixel inside the rectangle, how likely it is to be contained inside an object from the class of interest. We show that the resulting P-maps may be used to evaluate how likely a rectangle proposal is to contain an instance of the class, and that good proposals can be further processed to produce an accurate object cutout mask. This amounts to an automatic end-to-end pipeline for category-specific object cutout. We evaluate our approach on segmentation benchmark datasets, and show that it significantly outperforms the state of the art on them.
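To make the role of the P-map concrete, the following is a minimal sketch in Python of how a per-rectangle probability map could be turned into a proposal score and a full-image mask. The abstract does not specify how proposals are scored or how good proposals are processed, so everything here is an assumption for illustration: the mean-probability score, the fixed 0.5 threshold, the `(top, left, height, width)` rectangle layout, and the names `proposal_score` and `cutout_mask` are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

PMap = List[List[float]]          # per-pixel probabilities inside one rectangle
Rect = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical layout


def proposal_score(p_map: PMap) -> float:
    """Mean probability over the rectangle: a stand-in score, not the paper's."""
    values = [p for row in p_map for p in row]
    return sum(values) / len(values) if values else 0.0


def cutout_mask(p_map: PMap, rect: Rect, image_height: int, image_width: int,
                threshold: float = 0.5) -> List[List[int]]:
    """Paste the thresholded P-map into an all-zero mask of the full image size."""
    top, left, height, width = rect
    mask = [[0] * image_width for _ in range(image_height)]
    for r in range(height):
        for c in range(width):
            y, x = top + r, left + c
            inside = 0 <= y < image_height and 0 <= x < image_width
            # The 0.5 threshold is an assumed default, not a value from the source.
            if inside and p_map[r][c] >= threshold:
                mask[y][x] = 1
    return mask


if __name__ == "__main__":
    toy_p_map = [[0.9, 0.2],
                 [0.7, 0.6]]
    print(proposal_score(toy_p_map))
    for row in cutout_mask(toy_p_map, (1, 1, 2, 2), 3, 4):
        print(row)
```

A plain threshold is the simplest possible post-processing step; it is used here only to show the data flow from rectangle-level probabilities to an image-level mask.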

Notes

  1. AP is short for average precision, which is the area under the precision-recall (PR) curve. IoU is short for Intersection over Union, i.e., \(A(P \cap G)/A(P \cup G)\), where P and G are the segmentation prediction and the ground truth, respectively, and \(A(\cdot)\) denotes the area of a region. To measure the precision of segmentation, \(AP^r\), the region-based AP, is used. Here, a segmentation is considered positive when its IoU with the ground truth reaches 0.5.
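The definitions in this note can be sketched in a few lines of Python. The snippet below computes IoU for two binary masks, applies the 0.5 threshold to decide whether a segmentation counts as positive, and computes a non-interpolated average precision from a ranked list of hits. The function names and the mask representation (rows of 0/1 values) are assumptions made for illustration, and benchmark toolkits often use an interpolated variant of AP, so the numbers may differ slightly from official evaluation code.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels, same size for P and G


def mask_iou(pred: Mask, gt: Mask) -> float:
    """A(P ∩ G) / A(P ∪ G) for two same-sized binary masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # IoU is undefined for two empty masks; returning 0.0 is a convention chosen here.
    return inter / union if union else 0.0


def is_positive(pred: Mask, gt: Mask, threshold: float = 0.5) -> bool:
    """A segmentation is positive when its IoU reaches the threshold."""
    return mask_iou(pred, gt) >= threshold


def average_precision(hits: List[bool], num_gt: int) -> float:
    """Non-interpolated AP; `hits` is ordered by descending confidence."""
    if num_gt == 0:
        return 0.0
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / num_gt


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [1, 1, 0]]
    gt = [[1, 1, 0],
          [1, 0, 0]]
    print(mask_iou(pred, gt), is_positive(pred, gt))
    print(average_precision([True, False, True], num_gt=2))
```

In the toy example the masks share 3 pixels out of a union of 4, so the prediction passes the 0.5 threshold.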

Acknowledgement

We would first like to thank all the reviewers for their valuable comments and suggestions. This work is supported in part by grants from the National 973 Program (2015CB352501), NSFC-ISF (61561146397), and the Shenzhen Knowledge Innovation Program for Basic Research (JCYJ20150402105524053).

Author information

Corresponding author

Correspondence to Yangyan Li.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 17228 KB)

Supplementary material 2 (pdf 26877 KB)

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Xu, H., Li, Y., Chen, W., Lischinski, D., Cohen-Or, D., Chen, B. (2017). A Holistic Approach for Data-Driven Object Cutout. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10111. Springer, Cham. https://doi.org/10.1007/978-3-319-54181-5_16

  • DOI: https://doi.org/10.1007/978-3-319-54181-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54180-8

  • Online ISBN: 978-3-319-54181-5

  • eBook Packages: Computer Science, Computer Science (R0)
