Human Behavior Analysis from Depth Maps

  • Sergio Escalera
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7378)


Pose Recovery (PR) and Human Behavior Analysis (HBA) have been a main focus of interest since the beginnings of Computer Vision and Machine Learning. PR and HBA were originally addressed through the analysis of still images and image sequences. Later strategies relied on Motion Capture (MOCAP) technology, based on the synchronization of multiple cameras in controlled environments, and on the analysis of depth maps from Time-of-Flight (ToF) technology, based on range images recorded from distance sensor measurements. Recently, with the appearance of the multi-modal RGBD (RGB plus Depth) information provided by the low-cost Kinect™ sensor, classical methods for PR and HBA have been redefined and new strategies have been proposed. This paper reviews and discusses the recent contributions and future trends of multi-modal RGBD data analysis for PR and HBA.


Keywords: Pose Recovery, Human Behavior Analysis, Depth Maps, Kinect™





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sergio Escalera (1, 2)
  1. Dept. Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Barcelona, Spain
  2. Computer Vision Center, Bellaterra, Spain
