Semantic Texton Forests for Image Categorization and Segmentation

  • M. Johnson
  • J. Shotton
  • R. Cipolla
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

Semantic texton forests (STFs) are a form of random decision forest that can be employed to produce powerful low-level codewords for computer vision. Each decision tree acts directly on image pixels, resulting in a codebook that bypasses the expensive computation of filter-bank responses or local descriptors. Further, STFs are extremely fast to both train and test, especially when compared with k-means clustering and nearest-neighbor assignment of feature descriptors. The nodes in the STFs provide both an implicit hierarchical clustering into semantic textons and an explicit pixel-wise local classification estimate. In this chapter we (i) investigate STFs as learned visual dictionaries; (ii) show how STFs can be used for both image categorization and semantic segmentation by aggregating hierarchical bags of semantic textons; (iii) demonstrate that STFs allow us to exploit semantic context in segmentation; and (iv) show how a global image-level categorization can be used as a prior to improve the accuracy of semantic segmentation. We also show that the efficient tree structures of STFs allow at least a five-fold increase in execution speed over competing techniques.
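
To make the idea of trees acting directly on pixels more concrete, the following is a minimal sketch of a single toy tree, not the authors' implementation. It assumes a split test that thresholds the difference of two raw pixel values near the pixel of interest, and it builds a hierarchical histogram by counting every node visited on the way to a leaf. The names `Node`, `pixel`, `descend`, and `bag_of_textons`, the offsets, and the thresholds are all hypothetical and set by hand; in a real forest the tests would be learned from training data and several trees would be combined.

```python
from collections import Counter


class Node:
    """A tree node; leaves have no children. All parameters are hand-set here."""

    def __init__(self, node_id, off_a=None, off_b=None, thresh=0,
                 left=None, right=None):
        self.node_id = node_id
        self.off_a = off_a      # (row, col) offset of the first pixel probed
        self.off_b = off_b      # (row, col) offset of the second pixel probed
        self.thresh = thresh    # threshold on the pixel difference
        self.left = left
        self.right = right


def pixel(img, r, c):
    """Read a grey-level pixel, clamping coordinates to the image border."""
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def descend(node, img, r, c):
    """Return the ids of all nodes visited from root to leaf for pixel (r, c)."""
    path = [node.node_id]
    while node.left is not None:
        (dra, dca), (drb, dcb) = node.off_a, node.off_b
        diff = pixel(img, r + dra, c + dca) - pixel(img, r + drb, c + dcb)
        node = node.left if diff < node.thresh else node.right
        path.append(node.node_id)
    return path


def bag_of_textons(tree, img):
    """Count node visits over all pixels: a hierarchical bag of node ids."""
    counts = Counter()
    for r in range(len(img)):
        for c in range(len(img[0])):
            counts.update(descend(tree, img, r, c))
    return counts


# Toy tree: root 0 tests horizontal contrast, node 2 tests vertical contrast.
leaf1, leaf3, leaf4 = Node(1), Node(3), Node(4)
node2 = Node(2, off_a=(1, 0), off_b=(-1, 0), thresh=5, left=leaf3, right=leaf4)
root = Node(0, off_a=(0, 1), off_b=(0, -1), thresh=5, left=leaf1, right=node2)

image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [9, 9, 9, 9],
]

hist = bag_of_textons(root, image)
print(sorted(hist.items()))
```

Each pixel is assigned a leaf id using only a few pixel comparisons, with no filter bank or descriptor computed, which is the property the abstract credits for the speed of STFs. Counting internal nodes as well as leaves is one simple way to obtain a histogram that reflects the hierarchy of the tree.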

Keywords

Beach Pyramid 

Notes

Acknowledgements

We would like to thank J. Winn, B. Wenger, O. Yamaguchi, and V. Viitaniemi for helpful conversations and insights contributing to the work in this paper.

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • M. Johnson (1)
  • J. Shotton (2)
  • R. Cipolla (3)

  1. Unicorn Media, Temple, USA
  2. Microsoft Research, Cambridge, UK
  3. University of Cambridge, Cambridge, UK
