Visual Object Class Recognition

  • Michael Stark
  • Bernt Schiele
  • Aleš Leonardis
Part of the Springer Handbooks book series (SHB)

Abstract

Object class recognition is among the most fundamental problems in computer vision and has therefore been researched intensively over the years. This chapter is mostly concerned with the recognition and detection of basic-level object classes such as cars, persons, chairs, or dogs. We review the state of the art and, in particular, discuss the most promising methods available today.

Keywords

Object Class, Convolutional Neural Network, Multiple Kernel Learning, Probabilistic Graphical Model, Fisher Vector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Abbreviations

2-D

two-dimensional

3-D

three-dimensional

ANN

artificial neural network

BoW

bag-of-words

CAD

computer-aided design

CNN

convolutional neural network

COCO

common objects in context

DoG

difference of Gaussians

DPM

deformable part model

EM

expectation maximization

GMM

Gaussian mixture model

HOG

histogram of oriented gradients

ISM

implicit shape model

LDA

latent Dirichlet allocation

LLC

locality-constrained linear coding

LSVM

latent support vector machine

MAP

maximum a posteriori

MDL

minimum description length

MKL

multiple kernel learning

MSER

maximally stable extremal region

PGM

probabilistic graphical model

PLSA

probabilistic latent semantic analysis

SC

sparse coding

SIFT

scale-invariant feature transform

SPM

spatial pyramid matching

SUN

scene understanding

SVM

support vector machine

VOC

visual object class

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Department of Computer Vision and Multimodal Computing, Max Planck Institute of Informatics, Saarbrücken, Germany
  2. Department of Computer Science, Saarland University, Saarbrücken, Germany
  3. Department of Computer Science, University of Birmingham, Birmingham, UK
