Modelling Gesture

  • Shaogang Gong
  • Tao Xiang


Gesture, particularly hand gesture, is an important part of human body language: it conveys messages and reveals a person's intention and emotional state. Automatic interpretation of gesture provides a means for interaction and communication between humans and computers that goes beyond the conventional text and graphics based interface. Broadly speaking, a gesture can be composed of movements of any body part, although the most relevant parts are the face and hands; in this sense, facial expression is a special case of gesture. Facial expression and hand movement often act together to define a coherent gesture, and can be better understood if analysed jointly. This is especially true when interpreting a person's emotional state from visual observation of body language. A gesture is a dynamic process, typically characterised by the spatio-temporal trajectory of body motion and modelled as a trajectory in a multivariate feature space. In this chapter, we describe methods for tracking both individual body parts and overall body movement to construct trajectories for gesture representation. Unsupervised learning is explored for automatically segmenting a gesture sequence into atomic components and for discovering the number of distinct gesture classes. We also study supervised learning for modelling a gesture sequence as a stochastic process, with classifiers for different gesture processes learned from data. Finally, the problem of affective state recognition is considered by analysing facial expression and body gesture together to interpret a person's emotional state.
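The stochastic-process view of a gesture trajectory is commonly realised with hidden Markov models: each gesture class gets its own model, and a new trajectory, quantised into discrete motion symbols, is assigned to the class whose model scores it with the highest likelihood. The sketch below is illustrative only: the two-state models, the three-symbol alphabet, and the "wave"/"point" labels are invented for the example, not taken from the chapter.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in log space for stability."""
    alpha = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        # log-sum-exp over previous states, then transition and emit
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ A) + np.log(B[:, o])
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

# Two illustrative 2-state gesture models over 3 quantised motion symbols.
pi = np.array([0.9, 0.1])
A_wave  = np.array([[0.2, 0.8], [0.8, 0.2]])   # states tend to alternate
A_point = np.array([[0.9, 0.1], [0.1, 0.9]])   # states tend to persist
B = np.array([[0.7, 0.2, 0.1],                 # state 0 mostly emits symbol 0
              [0.1, 0.2, 0.7]])                # state 1 mostly emits symbol 2

seq = [0, 2, 0, 2, 0, 2]                       # oscillating, wave-like motion
ll_wave  = forward_log_likelihood(seq, pi, A_wave,  B)
ll_point = forward_log_likelihood(seq, pi, A_point, B)
label = "wave" if ll_wave > ll_point else "point"
```

Because the forward recursion sums over all state paths, the comparison of log-likelihoods is a proper Bayes-style classification of the whole trajectory, not just of its most likely state path.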


Keywords: Facial Expression · Canonical Correlation Analysis · Hand Gesture · Belief Revision · Atomic Action
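One of the keywords above, canonical correlation analysis, is a standard tool for relating two feature sets, such as facial-expression features and body-gesture features, by finding maximally correlated projections of each. The numpy-only sketch below computes the first canonical correlation; the "face"/"body" data and the shared latent signal are synthetic, purely for illustration.

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-6):
    """First canonical correlation between two views given as
    (n_samples x n_features) matrices, via whitening and SVD."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])   # regularised covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whiten each view; the singular values of the whitened
    # cross-covariance are the canonical correlations.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    s = np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)
    return s[0]

# Synthetic example: both "views" share one latent affect-like signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
face = np.hstack([z + 0.1 * rng.normal(size=(200, 1)),
                  rng.normal(size=(200, 2))])       # 1 shared + 2 noise dims
body = np.hstack([z + 0.1 * rng.normal(size=(200, 1)),
                  rng.normal(size=(200, 3))])       # 1 shared + 3 noise dims
rho = first_canonical_correlation(face, body)        # close to 1
```

The shared latent dimension makes the first canonical correlation approach 1, while the independent noise dimensions contribute nothing; the small ridge term keeps the covariance factorisations well conditioned.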



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
