Statistical Gesture Models for 3D Motion Capture from a Library of Gestures with Variants

  • Zhenbo Li
  • Patrick Horain
  • André-Marie Pez
  • Catherine Pelachaud
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5934)


A challenge for 3D motion capture by monocular vision is the ambiguity of 3D-to-2D projection, which can lead to incorrect poses during tracking. In this paper, we propose to improve 3D motion capture by learning human gesture models from a library of gestures with variants. The library was created with virtual human animations. Gestures are described as Gaussian Process Dynamical Models (GPDM), which are used as constraints for motion tracking. Given the raw input poses from the tracker, the gesture model helps to correct ambiguous poses. Results demonstrate the benefit of the proposed method.
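To make the idea of correcting a raw pose with a learned gesture model concrete, here is a minimal sketch. It is not the authors' implementation and not a full GPDM: it uses plain Gaussian process regression from a fixed, hand-chosen 2D latent phase to the pose, with no learned latent space and no dynamics. The synthetic 6-DOF gesture, the function names, the kernel length scale, and the noise level are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between two sets of latent points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gesture_poses(phase):
    """Hypothetical 6-DOF gesture: joint angles as smooth functions of phase."""
    return np.stack([np.sin(phase), np.cos(phase),
                     np.sin(2 * phase), np.cos(2 * phase),
                     np.sin(3 * phase), np.cos(3 * phase)], axis=-1)


def to_latent(phase):
    """Fixed 2D latent coordinates on a circle (a real GPDM would learn these)."""
    return np.stack([np.cos(phase), np.sin(phase)], axis=-1)


def fit_gesture_model(latent, poses, jitter=1e-4):
    """Solve (K + jitter * I) W = Y for the GP posterior-mean weights W."""
    gram = rbf_kernel(latent, latent) + jitter * np.eye(len(latent))
    return np.linalg.solve(gram, poses)


def predict_poses(latent_train, weights, latent_query):
    """GP posterior mean of the pose at each query latent point."""
    return rbf_kernel(latent_query, latent_train) @ weights


def correct_pose(raw_pose, latent_train, weights, candidates):
    """Replace a raw pose by the closest pose the gesture model can produce."""
    predicted = predict_poses(latent_train, weights, candidates)
    best = np.argmin(((predicted - raw_pose) ** 2).sum(axis=1))
    return predicted[best]


def demo(noise_std=0.2, seed=0):
    """Return (error of raw pose, error of corrected pose) on synthetic data."""
    rng = np.random.default_rng(seed)
    train_phase = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
    latent_train = to_latent(train_phase)
    weights = fit_gesture_model(latent_train, gesture_poses(train_phase))

    # Simulated tracker output: a true pose corrupted by Gaussian noise.
    true_pose = gesture_poses(np.array(1.0))
    raw_pose = true_pose + rng.normal(0.0, noise_std, size=true_pose.shape)

    # Dense grid of candidate latent points to search over.
    candidates = to_latent(np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False))
    corrected = correct_pose(raw_pose, latent_train, weights, candidates)
    return (np.linalg.norm(raw_pose - true_pose),
            np.linalg.norm(corrected - true_pose))


if __name__ == "__main__":
    raw_error, corrected_error = demo()
    print(f"raw error: {raw_error:.3f}, corrected error: {corrected_error:.3f}")
```

The grid search stands in for the optimization a tracker would run over the latent space. Snapping the raw pose to the nearest model prediction removes the part of the noise that points away from the learned gesture, which is the intuition behind using the gesture model as a constraint.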


Gaussian Process, 3D motion capture, gesture model, gesture library





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Zhenbo Li (1)
  • Patrick Horain (1)
  • André-Marie Pez (2)
  • Catherine Pelachaud (2, 3)
  1. Institut Telecom, Telecom SudParis, Evry Cedex, France
  2. Institut Telecom, Telecom ParisTech, Paris Cedex 13, France
  3. CNRS, LTCI, Paris Cedex 13, France
