Biological Motion of Speech

  • Gregor A. Kalberer
  • Pascal Müller
  • Luc Van Gool
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2525)


The paper discusses the detailed analysis of visual speech. As with other forms of biological motion, humans are known to be very sensitive to the realism of the ways the lips move. In order to determine the elements that come into play in the perceptual analysis of visual speech, it is important to have control over the data. The paper discusses the capture of detailed 3D deformations of faces when talking. The data are detailed in both a temporal and a spatial sense: the 3D positions of thousands of points on the face are determined at the temporal resolution of video. Such data have been decomposed into their basic modes using independent component analysis (ICA). It is noteworthy that this yielded better results than a mere principal component analysis (PCA), which results in modes that individually represent facial changes that are anatomically inconsistent. The independent components (ICs) better capture the underlying anatomical changes that the face undergoes. Different visemes are all based on the underlying joint action of the facial muscles. The IC modes do not reflect single muscles, but they nevertheless decompose the speech-related deformations into anatomically convincing modes, coined ‘pseudo-muscles’.
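The abstract does not specify the decomposition pipeline in code. The sketch below is a minimal, hypothetical illustration in Python/NumPy, assuming the captured data are stored as a matrix with one row per video frame and the flattened (x, y, z) coordinates of the N tracked face points as columns. It computes PCA modes by SVD, whitens the PCA weights, and then rotates them with symmetric FastICA (tanh contrast) to obtain independent modes. FastICA is a stand-in chosen for illustration: the abstract only states that ICA was used, not which algorithm. The function names `pca_modes`, `fastica_symmetric` and `ica_modes` are hypothetical.

```python
import numpy as np


def pca_modes(X, k):
    """PCA of X (frames x 3N, one flattened 3D face per row; assumed layout).

    Returns the mean face, k orthonormal modes (k x 3N) sorted by
    explained variance, and the per-frame mode weights (frames x k).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Economy SVD: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    modes = Vt[:k]
    weights = Xc @ modes.T
    return mean, modes, weights


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, i.e. the nearest orthogonal matrix."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fastica_symmetric(Z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh contrast on whitened data Z (frames x k).

    Returns an orthogonal unmixing matrix W (k x k); Z @ W.T are the sources.
    """
    n, k = Z.shape
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        # Fixed-point step for every row w: E[z g(w.z)] - E[g'(w.z)] w
        W_new = (G.T @ Z) / n - (1.0 - G**2).mean(axis=0)[:, None] * W
        W_new = _sym_decorrelate(W_new)
        # Converged when each row keeps its direction (up to sign).
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W


def ica_modes(X, k, seed=0):
    """PCA to k dimensions, whitening, then an ICA rotation of the PCA modes.

    Returns the mean face, k independent modes (k x 3N) and their
    per-frame activations (frames x k), so that X ~= mean + acts @ modes.
    """
    mean, modes, weights = pca_modes(X, k)
    std = weights.std(axis=0)
    Z = weights / std                     # whitened PCA weights
    W = fastica_symmetric(Z, seed=seed)
    acts = Z @ W.T                        # independent activations
    return mean, W @ (std[:, None] * modes), acts


# Synthetic demo (hypothetical data, not the authors' captures):
# two non-Gaussian "pseudo-muscle" signals mixed into 50 face points.
rng = np.random.default_rng(1)
frames, points = 2000, 50
S = np.column_stack([rng.uniform(-1, 1, frames), rng.laplace(size=frames)])
A = rng.standard_normal((2, 3 * points))          # spatial patterns
X = S @ A + rng.standard_normal(3 * points)       # plus a fixed mean face
mean, modes, acts = ica_modes(X, 2)
```

The synthetic data only check that the decomposition behaves as intended. PCA alone returns orthogonal modes that, for a generic mixing, each blend both sources; the ICA rotation of the same subspace recovers each source up to sign, scale and order, while `mean + acts @ modes` still reconstructs the input.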


Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Gregor A. Kalberer (1)
  • Pascal Müller (1)
  • Luc Van Gool (2)

  1. D-ITET/BIWI, ETH Zurich, Switzerland
  2. ESAT/PSI/Visics, KU Leuven, Belgium