The Role of Manifold Learning in Human Motion Analysis

  • Ahmed Elgammal
  • Chan-Su Lee
Part of the Computational Imaging and Vision book series (CIVI, volume 36)

The human body is an articulated object with a high number of degrees of freedom. Despite the high dimensionality of the configuration space, many human motion activities lie intrinsically on low-dimensional manifolds. Although the intrinsic body configuration manifolds may be very low in dimensionality, the resulting appearance manifolds are challenging to model, because appearance also depends on factors such as the shape and appearance of the person performing the motion, the viewpoint, and the illumination. Our objective is to learn representations for the shape and the appearance of moving (dynamic) objects that support tasks such as synthesis, pose recovery, reconstruction, and tracking. We study various approaches for representing global deformation manifolds that preserve their geometric structure. Given such representations, we can learn generative models for dynamic shape and appearance. We also address the fundamental question of separating style and content on nonlinear manifolds representing dynamic objects. We learn factorized generative models that explicitly separate the intrinsic body configuration (content), which is a function of time, from the appearance and shape of the person performing the action (style factors), which are time-invariant parameters. We show results on pose recovery, body tracking, and gait recognition, as well as facial expression tracking and recognition.
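
As a rough illustration of the style/content separation described above, the following sketch factorizes a set of time-aligned periodic sequences into per-person style vectors and a shared basis. It is not the authors' implementation. It assumes that the content manifold of a periodic motion can be embedded as a unit circle, that each person's observations are a linear map of Gaussian RBF features of that embedding, and that the stacked per-person coefficients can be factorized with an SVD. The function names `rbf_features`, `fit_style_content`, and `synthesize`, and all parameter values, are hypothetical.

```python
import numpy as np

def circle_points(phase):
    """Embed phase angles as points on a unit circle (assumed content manifold)."""
    return np.stack([np.cos(phase), np.sin(phase)], axis=1)

def rbf_features(points, centers, width=0.5):
    """Gaussian RBF features of manifold points with respect to fixed centers."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_style_content(sequences, n_centers=8, n_styles_kept=None):
    """Factorize time-aligned sequences of shape (S, T, D).

    S is the number of people (styles), T the number of frames in one
    motion cycle, and D the observation dimension.
    Returns style vectors (S, J), a shared core (J, N*D), and RBF centers.
    """
    S, T, D = sequences.shape
    phase = 2.0 * np.pi * np.arange(T) / T
    centers = circle_points(2.0 * np.pi * np.arange(n_centers) / n_centers)
    psi = rbf_features(circle_points(phase), centers)          # (T, N)

    # One least-squares mapping per person: sequence ~ psi @ B_s.
    coeffs = np.stack([
        np.linalg.lstsq(psi, seq, rcond=None)[0].ravel()       # (N*D,)
        for seq in sequences
    ])                                                          # (S, N*D)

    # SVD splits the coefficients into person-specific and shared parts.
    U, s, Vt = np.linalg.svd(coeffs, full_matrices=False)
    J = n_styles_kept or S
    styles = U[:, :J]                                           # time-invariant
    core = s[:J, None] * Vt[:J]                                 # shared basis
    return styles, core, centers

def synthesize(style, core, centers, phase, obs_dim):
    """Generate observations for one style vector at the given phases."""
    psi = rbf_features(circle_points(np.asarray(phase)), centers)
    B = (style @ core).reshape(len(centers), obs_dim)
    return psi @ B
```

In this sketch the style vector stays fixed over time while the phase varies, so new frames for a given person are generated by sweeping the phase, and a new style can be approximated by interpolating between learned style vectors.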


Keywords (added by machine, not by the authors): Facial Expression; Human Motion; Configuration Space; Style Factor; Dynamic Object





Copyright information

© Springer 2008

Authors and Affiliations

  • Ahmed Elgammal (1)
  • Chan-Su Lee (1)
  1. Department of Computer Science, Rutgers University, Piscataway, USA
