Semantic Object Segmentation

Wang, Xiaogang

doi:10.1007/978-1-4419-9482-0_3

Abstract

Semantic object segmentation assigns each pixel in an image or video sequence to one of a set of semantically meaningful object classes. It has attracted considerable research interest because of its wide applications in image and video search, editing, and compression. The problem is very challenging because a large number of object classes must be distinguished and each class exhibits large visual variability. To segment objects successfully, the local appearance of objects, the local consistency between labels of neighboring pixels, and long-range contextual information in an image need to be integrated under a unified framework. Conditional random fields, which are discriminative models, can achieve this integration. Although they can learn models of object classes more accurately and efficiently, they require training examples labeled at the pixel level, and such labeling is expensive. Models of object classes can be learned with different levels of supervision. In some applications, such as web-based image and video search, a large number of object classes need to be modeled, so unsupervised or semi-supervised learning is preferred. Generative models such as topic models are therefore used in object segmentation, because they can learn object classes without supervision or with weak supervision that requires less labeling work. We give an overview of the techniques used in each step of the semantic object segmentation pipeline and discuss the major challenges of each step. We focus on conditional random fields and topic models, two types of frameworks widely used in semantic object segmentation. For video segmentation, we summarize and compare the frameworks of Markov random fields and conditional random fields, which are representative models of the generative and discriminative approaches, respectively.
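
As a rough illustration of how local appearance and neighbor label consistency can be combined in one model, the sketch below scores a labeling with a grid-structured energy of the kind used in conditional random fields: a per-pixel unary cost plus a Potts penalty for each pair of 4-connected neighbors with different labels. It is not the chapter's formulation. The function names, the toy costs, the smoothness weight, and the use of iterated conditional modes (ICM) as the minimizer are all assumptions made for this example, and long-range context is deliberately left out.

```python
# Minimal sketch (illustrative names and values, not taken from the chapter):
# a grid energy with unary costs and a Potts smoothness term, minimized by ICM.

def neighbors(r, c, rows, cols):
    """Yield the 4-connected neighbors of pixel (r, c) that lie inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc

def energy(labels, unary, smooth):
    """Total cost: unary cost per pixel plus `smooth` per disagreeing edge."""
    rows, cols = len(labels), len(labels[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += unary[r][c][labels[r][c]]
            # Visit only right and down neighbors so each edge is counted once.
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                total += smooth
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                total += smooth
    return total

def init_labels(unary):
    """Pick the cheapest class per pixel using local appearance only."""
    return [[min(range(len(costs)), key=lambda k: costs[k]) for costs in row]
            for row in unary]

def icm(unary, smooth, max_sweeps=10):
    """Greedy coordinate descent: relabel one pixel at a time until stable."""
    rows, cols = len(unary), len(unary[0])
    n_classes = len(unary[0][0])
    labels = init_labels(unary)
    for _ in range(max_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                def local_cost(k):
                    disagree = sum(1 for rr, cc in neighbors(r, c, rows, cols)
                                   if labels[rr][cc] != k)
                    return unary[r][c][k] + smooth * disagree
                best = min(range(n_classes), key=local_cost)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels

# Toy 3x3 image with two classes: every pixel prefers class 0 except the
# center, whose appearance weakly prefers class 1.
unary = [[[0.2, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]

appearance_only = init_labels(unary)
smoothed = icm(unary, smooth=0.5)
```

In this toy case the appearance-only labeling isolates the center pixel, while the smoothness term makes it cheaper to relabel that pixel to agree with its neighbors. ICM only finds a local minimum; it is used here because it is short, not because the chapter recommends it.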

Author information

Authors and Affiliations

Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China
Xiaogang Wang

Corresponding author

Correspondence to Xiaogang Wang.

Editor information

Editors and Affiliations

Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, People's Republic of China
King Ngi Ngan
School of Electronic Engineering, University of Electronic Science & Technology of China, Chengdu, 610054, People's Republic of China
Hongliang Li

About this chapter

Cite this chapter

Wang, X. (2011). Semantic Object Segmentation. In: Ngan, K., Li, H. (eds) Video Segmentation and Its Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9482-0_3

DOI: https://doi.org/10.1007/978-1-4419-9482-0_3
Published: 21 March 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-9481-3
Online ISBN: 978-1-4419-9482-0
eBook Packages: Engineering, Engineering (R0)
