Abstract
Semantic object segmentation assigns each pixel in an image or video sequence to one of a set of semantically meaningful object classes. It has attracted considerable research interest because of its wide applications in image and video search, editing, and compression. The problem is very challenging because a large number of object classes must be distinguished and there is large visual variability within each class. To segment objects successfully, the local appearance of objects, the local consistency between labels of neighboring pixels, and long-range contextual information in an image need to be integrated in a unified framework. Such integration can be achieved with conditional random fields, which are discriminative models. Although they can learn models of object classes more accurately and efficiently, they require training examples labeled at the pixel level, which are expensive to obtain. Models of object classes can be learned with different levels of supervision. In some applications, such as web-based image and video search, a large number of object classes need to be modeled, so unsupervised or semi-supervised learning is preferred. Generative models such as topic models are therefore used in object segmentation, because they can learn object classes without supervision or with weak supervision that requires less labeling work. We overview the technologies used in each step of the semantic object segmentation pipeline and discuss the major challenges of each step. We focus on conditional random fields and topic models, two frameworks widely used in semantic object segmentation. For video segmentation, we summarize and compare Markov random fields and conditional random fields, which are representative models of the generative and discriminative approaches, respectively.
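To make the idea of combining local appearance with label consistency concrete, the following is a minimal sketch and not the chapter's own method. It assumes a pairwise random-field energy with per-pixel unary costs (standing in for local appearance) and a Potts penalty between 4-connected neighbors (standing in for local label consistency), and it minimizes that energy approximately with iterated conditional modes. The function name `icm_potts` and the parameters `smoothness` and `sweeps` are hypothetical, chosen only for illustration; practical conditional random fields typically use learned potentials, long-range context terms, and stronger inference.

```python
def icm_potts(unary, smoothness=1.0, sweeps=5):
    """Approximately minimize a Potts-style labeling energy on a pixel grid.

    unary[r][c][k] is the (assumed, hand-set) cost of giving pixel (r, c) label k.
    Each pair of 4-connected neighbors with different labels adds `smoothness`.
    Uses iterated conditional modes (ICM), a simple greedy local optimizer.
    """
    height, width = len(unary), len(unary[0])
    num_labels = len(unary[0][0])

    # Start from the labeling that is best for each pixel in isolation.
    labels = [
        [min(range(num_labels), key=lambda k: unary[r][c][k]) for c in range(width)]
        for r in range(height)
    ]

    for _ in range(sweeps):
        changed = False
        for r in range(height):
            for c in range(width):
                # Current labels of the in-bounds 4-connected neighbors.
                neighbors = [
                    labels[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < height and 0 <= cc < width
                ]

                def local_cost(k):
                    # Unary cost plus one penalty per disagreeing neighbor.
                    return unary[r][c][k] + smoothness * sum(1 for n in neighbors if n != k)

                best = min(range(num_labels), key=local_cost)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # No pixel changed label, so a local minimum was reached.
    return labels


# Toy 3x3 example: every pixel prefers label 0 except the center,
# which weakly prefers label 1.
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]

smoothed = icm_potts(unary, smoothness=0.5)
independent = icm_potts(unary, smoothness=0.0)
```

In this toy case the pairwise term overrides the weak appearance preference of the center pixel when `smoothness` is 0.5, while setting `smoothness` to 0 reduces the model to independent per-pixel classification.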
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Wang, X. (2011). Semantic Object Segmentation. In: Ngan, K., Li, H. (eds) Video Segmentation and Its Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9482-0_3
DOI: https://doi.org/10.1007/978-1-4419-9482-0_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-9481-3
Online ISBN: 978-1-4419-9482-0
eBook Packages: Engineering, Engineering (R0)