Human Detection in Crowded Situations by Combining Stereo Depth and Deeply-Learned Models

  • Csaba Beleznai
  • Daniel Steininger
  • Elisabeth Broneder
Part of the Studies in Computational Intelligence book series (SCI, volume 810)


Human detection in crowded situations is a challenging task in many practically relevant scenarios. In this paper we propose a human detection scheme based on passive stereo depth, employing a hierarchically structured tree of learned shape templates to delineate clusters corresponding to humans. To enhance the specificity of the depth-based detection approach towards humans, we also incorporate a visual object recognition modality in the form of a deeply trained model. We propose a simple way to combine the depth and appearance modalities to better cope with complex effects such as heavy occlusion, small-sized humans, and clutter. The obtained results are analyzed in terms of the improvements and shortcomings introduced by the individual detection modalities. Our proposed combination achieves good accuracy at a reasonable computational speed in difficult, crowded scenarios. Hence, in our view, the presented concepts represent a detection scheme of practical relevance.
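The abstract does not spell out how the depth and appearance modalities are combined, so the following is only an illustrative sketch of one simple late-fusion rule and not the scheme used in the chapter. It assumes that the depth-based detector yields boxes with a confidence in [0, 1] and that the deeply trained model yields a per-pixel "person" probability map; the names `Detection`, `mean_person_probability`, `fuse`, and the parameters `weight` and `threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    # Axis-aligned box in pixel coordinates: columns [x0, x1), rows [y0, y1).
    # Assumed to lie fully inside the probability map.
    x0: int
    y0: int
    x1: int
    y1: int
    depth_score: float  # hypothetical depth-detector confidence in [0, 1]


def mean_person_probability(prob_map: Sequence[Sequence[float]],
                            det: Detection) -> float:
    """Average per-pixel 'person' probability inside the detection box."""
    values = [prob_map[y][x]
              for y in range(det.y0, det.y1)
              for x in range(det.x0, det.x1)]
    return sum(values) / len(values) if values else 0.0


def fuse(detections: List[Detection],
         prob_map: Sequence[Sequence[float]],
         weight: float = 0.5,
         threshold: float = 0.5) -> List[Detection]:
    """Keep detections whose weighted depth/appearance score passes a threshold.

    This weighted average is an assumed fusion rule chosen for illustration.
    """
    kept = []
    for det in detections:
        appearance = mean_person_probability(prob_map, det)
        combined = weight * det.depth_score + (1.0 - weight) * appearance
        if combined >= threshold:
            kept.append(det)
    return kept
```

With such a rule, a depth cluster that overlaps a region of low person probability (for example, a piece of luggage) would be suppressed, while a cluster of equal depth confidence over a high-probability region would be kept.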


Keywords: Human detection; Detection in a crowd; Prior shape model; Mean shift clustering; Semantic segmentation; Stereo vision; Occupancy map; Video surveillance



The authors thank both the Austrian Federal Ministry for Transport, Innovation and Technology and the Austrian Research Promotion Agency (FFG) for co-funding the research project “LEAL” (FFG Nr. 850218) within the National Research Development Programme KIRAS Austria.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Csaba Beleznai (1), corresponding author
  • Daniel Steininger (1)
  • Elisabeth Broneder (2)
  1. Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
  2. Center for Digital Safety & Security, AIT Austrian Institute of Technology, Vienna, Austria
