Semantic Object Segmentation

Chapter in Video Segmentation and Its Applications

Abstract

Semantic object segmentation assigns each pixel in an image or video sequence to one of a set of semantically meaningful object classes. It has attracted considerable research interest because of its wide applications in image and video search, editing, and compression. The problem is challenging because many object classes must be distinguished and each class exhibits large visual variability. To segment objects successfully, the local appearance of objects, the local consistency between the labels of neighboring pixels, and long-range contextual information in the image must be integrated within a unified framework. Such integration can be achieved with conditional random fields. Conditional random fields are discriminative models; although they learn models of object classes accurately and efficiently, they require training examples labeled at the pixel level, and such labeling is expensive. Models of object classes can be learned with different levels of supervision. In some applications, such as web-based image and video search, a very large number of object classes must be modeled, so unsupervised or semi-supervised learning is preferred. Generative models such as topic models are therefore also used in object segmentation, because they can learn object classes without supervision, or with weak supervision that requires far less labeling effort. We overview the technologies used in each step of the semantic object segmentation pipeline and discuss the major challenges of each step. We focus on conditional random fields and topic models, the two types of frameworks most widely used in semantic object segmentation. For video segmentation, we summarize and compare the frameworks of Markov random fields and conditional random fields, which are representative of the generative and discriminative approaches, respectively.
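To make the integration concrete, the following is a minimal sketch of the kind of energy such a conditional random field minimizes; the notation is illustrative rather than reproduced from the chapter, with \(y_i\) the class label of pixel \(i\), \(\mathbf{x}\) the observed image, and \(\mathcal{N}(i)\) the neighborhood of pixel \(i\):

\[
E(\mathbf{y}\mid\mathbf{x}) \;=\; \underbrace{\sum_{i}\psi_i(y_i;\mathbf{x})}_{\text{local appearance}}
\;+\; \underbrace{\sum_{i}\sum_{j\in\mathcal{N}(i)}\psi_{ij}(y_i,y_j;\mathbf{x})}_{\text{neighbor label consistency}}
\;+\; \underbrace{\psi_{g}(\mathbf{y};\mathbf{x})}_{\text{long-range context}},
\qquad
P(\mathbf{y}\mid\mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})}\,e^{-E(\mathbf{y}\mid\mathbf{x})}.
\]

In this sketch the unary term scores each pixel's label with a local appearance classifier, the pairwise term (often a contrast-sensitive Potts potential) discourages neighboring pixels with similar appearance from taking different labels, and the global term captures long-range statistics such as class co-occurrence; segmentation then corresponds to the MAP labeling \(\mathbf{y}^{*}=\arg\max_{\mathbf{y}} P(\mathbf{y}\mid\mathbf{x})\), typically computed with graph cuts or message passing.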



Author information

Correspondence to Xiaogang Wang.


Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Wang, X. (2011). Semantic Object Segmentation. In: Ngan, K., Li, H. (eds) Video Segmentation and Its Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9482-0_3

  • DOI: https://doi.org/10.1007/978-1-4419-9482-0_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-9481-3

  • Online ISBN: 978-1-4419-9482-0
