Ten Years of Pedestrian Detection, What Have We Learned?

  • Rodrigo Benenson
  • Mohamed Omran
  • Jan Hosang
  • Bernt Schiele
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8926)


Paper-by-paper results make it easy to miss the forest for the trees. We analyse the remarkable progress of the last decade by discussing the main ideas explored in the 40+ detectors currently present in the Caltech pedestrian detection benchmark. We observe that there exist three families of approaches, all currently reaching similar detection quality. Based on our analysis, we study the complementarity of the most promising ideas by combining multiple published strategies. This new decision forest detector achieves the current best known performance on the challenging Caltech-USA dataset.


Object Detection, Deep Learning, Convolutional Neural Network, Human Detection, Pedestrian Detection



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Rodrigo Benenson (1)
  • Mohamed Omran (1)
  • Jan Hosang (1)
  • Bernt Schiele (1)
  1. Max Planck Institute for Informatics, Saarbrücken, Germany
