Semantic Texton Forests for Image Categorization and Segmentation

Johnson, M.; Shotton, J.; Cipolla, R.

doi:10.1007/978-1-4471-4929-3_15

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Semantic texton forests (STFs) are a form of random decision forest that can be employed to produce powerful low-level codewords for computer vision. Each decision tree acts directly on image pixels, resulting in a codebook that bypasses the expensive computation of filter-bank responses or local descriptors. Further, STFs are extremely fast to both train and test, especially when compared with k-means clustering and nearest-neighbor assignment of feature descriptors. The nodes in the STFs provide both an implicit hierarchical clustering into semantic textons and an explicit pixel-wise local classification estimate. In this chapter we (i) investigate STFs as learned visual dictionaries; (ii) show how STFs can be used for both image categorization and semantic segmentation by aggregating hierarchical bags of semantic textons; (iii) demonstrate that STFs allow us to exploit semantic context in segmentation; and (iv) show how a global image-level categorization can be used as a prior to improve the accuracy of semantic segmentation. We also see that the efficient tree structures of STFs allow at least a five-fold increase in execution speed over competing techniques.
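
As a rough illustration of the idea of a tree that acts directly on pixels and yields a hierarchical bag of semantic textons, the following minimal Python sketch may help. It is not the chapter's implementation. The tree layout (`TREE`), the pixel-difference split test, the border clamping, and all function names are assumptions made for this example; the abstract only states that the trees operate on raw pixels and that every node, not just the leaves, acts as a texton.

```python
from collections import Counter

# Hypothetical hand-built tree (a trained forest would learn these tests).
# node id -> (offset_a, offset_b, threshold, left_id, right_id), or None for a leaf.
TREE = {
    0: ((0, 0), (0, 1), 0, 1, 2),  # root: pixel minus its right neighbour
    1: ((0, 0), (1, 0), 0, 3, 4),  # pixel minus its lower neighbour
    2: None,
    3: None,
    4: None,
}


def pixel(image, r, c):
    """Read a pixel value, clamping coordinates to the image border."""
    r = min(max(r, 0), len(image) - 1)
    c = min(max(c, 0), len(image[0]) - 1)
    return image[r][c]


def path_for_pixel(image, r, c, tree=TREE):
    """Return the ids of all nodes visited from root to leaf for pixel (r, c)."""
    node, path = 0, [0]
    while tree[node] is not None:
        (dra, dca), (drb, dcb), thresh, left, right = tree[node]
        # Assumed split test: difference of two raw pixel values near (r, c).
        diff = pixel(image, r + dra, c + dca) - pixel(image, r + drb, c + dcb)
        node = left if diff < thresh else right
        path.append(node)
    return path


def hierarchical_bag(image, tree=TREE):
    """Histogram over every node (leaf and internal) visited by any pixel."""
    hist = Counter()
    for r in range(len(image)):
        for c in range(len(image[0])):
            hist.update(path_for_pixel(image, r, c, tree))
    return hist


if __name__ == "__main__":
    toy_image = [[1, 5], [3, 2]]  # made-up grey-level values
    print(hierarchical_bag(toy_image))
```

Because every pixel passes through the root, the root count always equals the number of pixels, and counts at deeper nodes describe progressively finer clusters. No filter bank or descriptor is computed at any point, which is the source of the speed advantage the abstract describes.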

This work was undertaken while the first two authors were at the University of Cambridge and Toshiba Corporate Research and Development Center, respectively.

Notes

1. At training time, we compute and store the distributions p_j(c) for all nodes j in the tree, not just for leaf nodes.
2. This effect may be due to segmentation forest (b) being over-confident: looking at the five most likely classes inferred for each pixel, (b) achieves 87.6% while (d) achieves a better 88.0%.
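
The two notes above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' code: `node_distributions` estimates a class distribution p_j(c) at every node j from the root-to-leaf paths of labelled training pixels, and `top_k_accuracy` computes the kind of "five most likely classes" score mentioned in the second note. The function names, the input layout (paths as lists of node ids, distributions as dictionaries), and the toy labels are all assumptions.

```python
from collections import Counter, defaultdict


def node_distributions(paths, labels):
    """Estimate p_j(c) for every node j visited by the training pixels.

    paths  : list of root-to-leaf node-id lists, one per training pixel
    labels : class label of each training pixel
    """
    counts = defaultdict(Counter)
    for path, label in zip(paths, labels):
        for node in path:  # update every node on the path, not just the leaf
            counts[node][label] += 1
    dists = {}
    for node, ctr in counts.items():
        total = sum(ctr.values())
        dists[node] = {c: n / total for c, n in ctr.items()}
    return dists


def top_k_accuracy(pixel_dists, true_labels, k=5):
    """Fraction of pixels whose true class is among the k most probable classes."""
    hits = 0
    for dist, truth in zip(pixel_dists, true_labels):
        ranked = sorted(dist, key=dist.get, reverse=True)[:k]
        hits += truth in ranked
    return hits / len(true_labels)


if __name__ == "__main__":
    # Made-up training paths through a three-node tree and their labels.
    paths = [[0, 1], [0, 1], [0, 2], [0, 2]]
    labels = ["grass", "grass", "sky", "grass"]
    print(node_distributions(paths, labels))
```

Storing distributions at internal nodes costs little extra memory, since each training pixel simply updates every node on its path. A top-k score is more forgiving than plain accuracy, which is why it can expose an over-confident classifier whose first guess is wrong but whose runner-up classes are sensible.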

Acknowledgements

We would like to thank J. Winn, B. Wenger, O. Yamaguchi, and V. Viitaniemi for helpful conversations and insights contributing to the work in this paper.

Author information

Authors and Affiliations

Unicorn Media, Tempe, USA
M. Johnson
Microsoft Research, Cambridge, UK
J. Shotton
University of Cambridge, Cambridge, UK
R. Cipolla

Editor information

Editors and Affiliations

Microsoft Research Ltd., 7 J.J. Thomson Avenue, Cambridge, CB3 0FB, United Kingdom
A. Criminisi
Microsoft Research Ltd., 7 J.J. Thomson Avenue, Cambridge, CB3 0FB, United Kingdom
J. Shotton

Cite this chapter

Johnson, M., Shotton, J., Cipolla, R. (2013). Semantic Texton Forests for Image Categorization and Segmentation. In: Criminisi, A., Shotton, J. (eds) Decision Forests for Computer Vision and Medical Image Analysis. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-4929-3_15

DOI: https://doi.org/10.1007/978-1-4471-4929-3_15
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4928-6
Online ISBN: 978-1-4471-4929-3
eBook Packages: Computer Science (R0)
