A new 3D descriptor for human classification: application for human detection in a multi-kinect system

  • Kyis Essmaeel
  • Cyrille Migniot
  • Albert Dipanda
  • Luigi Gallo
  • Ernesto Damiani
  • Giuseppe De Pietro


In this paper we present a new 3D descriptor for human classification and a human detection method based on this descriptor. The proposed 3D descriptor classifies an object represented by a point cloud as human or non-human. It is derived from the well-known Histogram of Oriented Gradients (HOG) by using surface normals instead of gradients. The object point cloud is first subdivided into blocks. These blocks model the spatial distribution of surface normal orientations over the different parts of the object, and this distribution is expressed as a histogram. In addition, we have set up a multi-Kinect acquisition system that provides Complete Point Clouds (CPCs), i.e. a 360° view of the object. CPCs allow more reliable processing, particularly in case of occlusions, and they make it possible to determine the frontal orientation of a human. Based on the proposed 3D descriptor, we have developed a human detection method that is applied to CPCs. First, we evaluated the 3D descriptor on a set of CPC candidates using a Support Vector Machine (SVM) classifier, trained on an original CPC database that we built. The results are very promising: the descriptor discriminates human from non-human candidates and estimates the frontal direction of humans with high precision. We also demonstrated that using CPCs significantly improves the classification results compared with Single Point Clouds (i.e. point clouds acquired with only one Kinect). Second, we compared our detection method with two others, namely the HOG detector on RGB images and a 3D HOG-based detection method applied to RGB-depth data. The results obtained in different situations show that the proposed human detection method performs very well and outperforms the other two detection methods.
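The descriptor idea summarized above (HOG-style histograms built from surface normals over blocks of the point cloud) can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `normal_orientation_descriptor`, the y-up axis convention, the regular bounding-box grid of blocks, the azimuth-only binning weighted by the length of the normal's horizontal component, and the per-block L2 normalization are all hypothetical choices made in the spirit of HOG. Normals are assumed to be precomputed unit vectors, one per point.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normal_orientation_descriptor(
    points: Sequence[Vec3],
    normals: Sequence[Vec3],
    grid: Tuple[int, int, int] = (4, 2, 2),  # hypothetical (n_y, n_x, n_z) blocks
    n_bins: int = 8,                          # hypothetical azimuth bin count
    eps: float = 1e-9,
) -> List[float]:
    """HOG-style descriptor built from surface normals instead of gradients (sketch)."""
    if not points or len(points) != len(normals):
        raise ValueError("need at least one point and exactly one normal per point")
    n_y, n_x, n_z = grid
    cells = (n_x, n_y, n_z)  # number of blocks along the x, y and z axes
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    hist = [[0.0] * n_bins for _ in range(n_x * n_y * n_z)]

    for p, (nx, _ny, nz) in zip(points, normals):
        # Assign the point to a block of the bounding-box grid.
        idx = []
        for a in range(3):
            extent = maxs[a] - mins[a]
            t = (p[a] - mins[a]) / extent if extent > 0 else 0.0
            idx.append(min(int(t * cells[a]), cells[a] - 1))
        ix, iy, iz = idx
        block = (iy * n_x + ix) * n_z + iz

        # Vote with the azimuth of the normal's horizontal (x-z) projection,
        # weighted by the projection length (the analogue of gradient magnitude).
        weight = math.hypot(nx, nz)
        if weight < eps:
            continue  # normals along the vertical axis have no defined azimuth
        azimuth = math.atan2(nz, nx) % (2 * math.pi)
        b = min(int(azimuth / (2 * math.pi) * n_bins), n_bins - 1)
        hist[block][b] += weight

    # L2-normalize each block histogram and concatenate them.
    descriptor: List[float] = []
    for h in hist:
        norm = math.sqrt(sum(v * v for v in h))
        descriptor.extend(v / norm if norm > eps else 0.0 for v in h)
    return descriptor
```

In a pipeline like the one the abstract describes, one such vector per candidate point cloud would be passed to an SVM; the grid size and bin count here are placeholders. Because the azimuth bins are measured against a fixed horizontal reference, rotating a person about the vertical axis shifts votes between bins, so a descriptor of this form is not rotation invariant and retains orientation information.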


Keywords: Human classification · 3D descriptor · Multi-Kinect

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. ImViA, EA 7535, Univ. Bourgogne Franche-Comté, Dijon, France
  2. ICAR-CNR, Naples, Italy
  3. Department of Computer Technology, University of Milan, Milan, Italy