Learning Discriminative Spatial Relations for Detector Dictionaries: An Application to Pedestrian Detection

  • Enver Sangineto
  • Marco Cristani
  • Alessio Del Bue
  • Vittorio Murino
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7573)


The recent availability of large-scale training sets, in conjunction with accurate classifiers (e.g., SVMs), makes it possible to build large sets of “simple” object detectors and to develop new classification approaches in which dictionaries of visual features are replaced by dictionaries of object detectors. The responses of this collection of detectors can then be used as a high-level image representation. In this work, we go a step further in this direction by modeling the spatial relations among different detector responses. We use Random Forests to discriminatively select spatial relations that represent frequent co-occurrences of detector responses. We demonstrate our idea in the specific context of people detection, a challenging classification task due to the variability of human body articulation and appearance, and we use the recently proposed poselets as our basic object dictionary. Poselets are not the only possible choice, however: the proposed method applies more generally, since it makes few assumptions about the basic object detector. The results show sharp improvements over both the original poselet-based people detection method and other state-of-the-art approaches on two difficult benchmark datasets.
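The abstract does not specify the exact form of the spatial relations or of the forest. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes (hypothetically) that each detector activation is summarized by a dictionary index, a box center, a box scale and a score. It also assumes that a "spatial relation" is a pair of detector ids together with their quantized, scale-normalized relative offset. The names `Activation`, `relation_features`, `entropy` and `best_random_split`, and the `bin_size` and `min_score` parameters, are all illustrative. The second function shows only one randomized split node of a Random Forest (a random subset of candidate relations, chosen by information gain), which is one plausible reading of "discriminatively select"; it is not a full forest.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


# Hypothetical record for one detector activation (e.g. one poselet firing).
@dataclass(frozen=True)
class Activation:
    kind: int           # index of the detector in the dictionary
    x: float            # box center, image coordinates
    y: float
    scale: float        # box size, used to make offsets scale-invariant
    score: float = 1.0  # detector confidence


# Assumed relation key: (detector id a, detector id b, dx bin, dy bin).
Relation = Tuple[int, int, int, int]


def relation_features(acts: Sequence[Activation], bin_size: float = 0.5,
                      min_score: float = 0.0) -> Dict[Relation, int]:
    """Count co-occurring activation pairs, keyed by their detector ids and
    their binned, scale-normalized relative offset."""
    # Sorting fixes a canonical order, so each unordered pair yields one key.
    kept = sorted((a for a in acts if a.score >= min_score),
                  key=lambda a: (a.kind, a.x, a.y))
    counts: Counter = Counter()
    for i, p in enumerate(kept):
        for q in kept[i + 1:]:
            s = (p.scale + q.scale) / 2.0  # mean scale of the two boxes
            dx = (q.x - p.x) / s
            dy = (q.y - p.y) / s
            key = (p.kind, q.kind,
                   math.floor(dx / bin_size), math.floor(dy / bin_size))
            counts[key] += 1
    return dict(counts)


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_random_split(samples: List[Dict[Relation, int]], labels: List[int],
                      num_candidates: int,
                      rng: random.Random) -> Tuple[Optional[Relation], float]:
    """One randomized decision-tree node: test a random subset of relations
    for presence/absence and keep the one with the highest information gain."""
    vocabulary = sorted({r for s in samples for r in s})
    candidates = rng.sample(vocabulary, min(num_candidates, len(vocabulary)))
    n, base = len(labels), entropy(labels)
    best, best_gain = None, -1.0
    for r in candidates:
        left = [y for s, y in zip(samples, labels) if s.get(r, 0) > 0]
        right = [y for s, y in zip(samples, labels) if s.get(r, 0) == 0]
        gain = (base - len(left) / n * entropy(left)
                - len(right) / n * entropy(right))
        if gain > best_gain:
            best, best_gain = r, gain
    return best, best_gain


if __name__ == "__main__":
    # Toy data: in positives, detector 1 fires one scale unit below detector 0.
    pos = [[Activation(0, 100, 50, 40), Activation(1, 100, 90, 40)],
           [Activation(0, 300, 200, 80), Activation(1, 300, 280, 80)]]
    neg = [[Activation(0, 100, 90, 40), Activation(1, 100, 50, 40)],
           [Activation(0, 100, 70, 40), Activation(1, 220, 70, 40)]]
    samples = [relation_features(a) for a in pos + neg]
    labels = [1, 1, 0, 0]
    print(best_random_split(samples, labels, 3, random.Random(0)))
    # -> ((0, 1, 0, 2), 1.0)
```

Offsets are normalized by the mean box scale, so the same "detector 1 below detector 0" configuration maps to a single relation key at any image scale. In a full Random Forest, many such nodes would be grown recursively, typically on bootstrap samples of the training set. The relations chosen by the trees would then form the selected discriminative subset.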


Keywords (machine-generated): Random Forest, Object Detector, Spatial Relation, Pedestrian Detection, Pictorial Structure

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Enver Sangineto (1)
  • Marco Cristani (1)
  • Alessio Del Bue (1)
  • Vittorio Murino (1)

  1. Pattern Analysis and Computer Vision (PAVIS), Istituto Italiano di Tecnologia (IIT), Genova, Italy
