Computer Vision

Living Edition

Object Detection

  • Yali Amit
  • Pedro Felzenszwalb
  • Ross Girshick
Living reference work entry



Object detection involves detecting instances of objects from one or several classes in an image.


The goal of object detection is to detect all instances of objects from one or several known classes, such as people, cars, or faces, in an image. Typically only a small number of objects are present in the image, but they can occur at a very large number of possible locations and scales, all of which must somehow be explored.
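To give a sense of the size of this search space, the following minimal sketch counts the candidate windows that a naive multi-scale sliding-window search would visit. The function name, window size, stride, and scale step are illustrative assumptions, not values taken from this entry.

```python
def count_windows(img_w, img_h, win=64, stride=8, scale_step=1.2):
    """Count square windows visited by a naive multi-scale sliding-window search.

    All parameter defaults are hypothetical choices for illustration only.
    """
    total = 0
    w, h = float(img_w), float(img_h)
    # Repeatedly shrink the image by scale_step until the window no longer fits.
    while w >= win and h >= win:
        nx = (int(w) - win) // stride + 1  # horizontal window positions at this scale
        ny = (int(h) - win) // stride + 1  # vertical window positions at this scale
        total += nx * ny
        w /= scale_step
        h /= scale_step
    return total


if __name__ == "__main__":
    # A modest 640x480 image already yields thousands of candidate windows.
    print(count_windows(640, 480))
```

Even with a coarse stride, the count grows quickly with image size, which is why the number of locations and scales to examine is a central concern.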

Each detection is reported with some form of pose information. This could be as simple as the location of the object, a location and scale, a bounding box, or a segmentation mask. In other situations the pose information is more detailed and contains the parameters of a linear or nonlinear transformation. For example, a face detector may compute the locations of the eyes, nose, and mouth, in addition to the bounding box of the face. An example of a bicycle detection that...
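One way to picture the range of pose information described above is as a record with a mandatory bounding box and optional finer detail. The sketch below is only an illustration: the class name `Detection` and all of its fields are hypothetical, and a segmentation mask or transformation parameters are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Detection:
    """One reported detection; all field names here are hypothetical."""
    label: str                               # object class, e.g. "face"
    score: float                             # detector confidence
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    # Optional finer pose: landmark name -> (x, y), e.g. eyes, nose, mouth
    landmarks: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def center(self) -> Tuple[float, float]:
        """Location only: the center of the bounding box."""
        x0, y0, x1, y1 = self.box
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

    def size(self) -> Tuple[float, float]:
        """Scale: width and height of the bounding box."""
        x0, y0, x1, y1 = self.box
        return (x1 - x0, y1 - y0)


if __name__ == "__main__":
    face = Detection("face", 0.9, (10, 20, 50, 80), {"nose": (30, 50)})
    print(face.center(), face.size(), face.landmarks)
```

The simpler forms of pose (location, or location and scale) can be derived from the box, while the landmark dictionary stands in for the more detailed pose a face detector might report.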



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science, University of Chicago, Chicago, USA
  2. School of Engineering, Brown University, Providence, USA
  3. Facebook AI Research, Menlo Park, USA

Section editors and affiliations

  • Lei Zhang (1)
  1. Microsoft, WA, USA