Modeling Coarticulation in Synthetic Visual Speech

  • Conference paper
Models and Techniques in Computer Animation

Part of the book series: Computer Animation Series

Abstract

After describing the importance of visual information in speech perception and sketching the history of visual speech synthesis, we consider a number of theories of coarticulation in human speech. We then describe an implementation of Löfqvist’s (1990) gestural theory of speech production for visual speech synthesis, together with the graphically controlled development system. We conclude with some plans for future work.
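
The abstract refers to blending overlapping articulatory gestures rather than switching abruptly between per-phoneme mouth shapes. As a rough illustration only, the sketch below shows one common way such gestural blending can be written down: each segment contributes a target value and a dominance function for a single facial control parameter, and the animated track is the dominance-weighted average of the targets. The function shape, the names `dominance` and `blend_track`, and all numeric constants are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def dominance(t, center, alpha, theta, c=1.0):
    """Dominance of one segment at times t: peaks at the segment centre
    and falls off exponentially before and after it (illustrative shape)."""
    tau = np.abs(t - center)
    return alpha * np.exp(-theta * tau**c)

def blend_track(t, segments):
    """Dominance-weighted average of segment targets for one control
    parameter (e.g. lip protrusion).  `segments` is a list of dicts with
    keys 'target', 'center', 'alpha', 'theta' -- all hypothetical names."""
    num = np.zeros_like(t, dtype=float)
    den = np.zeros_like(t, dtype=float)
    for seg in segments:
        d = dominance(t, seg["center"], seg["alpha"], seg["theta"])
        num += d * seg["target"]
        den += d
    return num / np.maximum(den, 1e-9)  # guard against near-zero total dominance

if __name__ == "__main__":
    # Unrounded vowel followed by a rounded vowel: lip rounding should
    # begin before the second segment's centre (anticipatory coarticulation).
    t = np.linspace(0.0, 0.6, 121)  # seconds
    segments = [
        {"target": 0.1, "center": 0.15, "alpha": 1.0, "theta": 12.0},
        {"target": 0.9, "center": 0.40, "alpha": 1.0, "theta": 12.0},
    ]
    print(np.round(blend_track(t, segments), 2)[::20])
```

Because each dominance function extends beyond its own segment, the blended track begins moving toward an upcoming target early and relaxes gradually afterward, which is the anticipatory and carryover behaviour that distinguishes coarticulated visual speech from a sequence of fixed visemes.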


References

  • Abry, C. & Lallouache, T. (1991) Audibility and Stability of Articulatory Movements: Deciphering two experiments on anticipatory rounding in French. Proc. of the 12th Int. Congress of Phonetic Sciences, Aix-en-Provence, France, Vol. 1, 220–225.

    Google Scholar 

  • Allen, J., Hunnicutt, M. S., and Klatt, D. (1987) From text to speech: The MITalk system. Cambridge, MA: Cambridge University Press.

    Google Scholar 

  • Bell-Berti, F. & Harris K. S. (1979) Anticipatory coarticulation: Some implications from a study of lip rounding. Journal of the Acoustical Society of America, 65, 1268–1270.

    Article  Google Scholar 

  • Bell-Berti, F. & Harris K. S. (1982) Temporal patterns of coarticulation: Lip rounding. Journal of the Acoustical Society of America, 71, 449–459.

    Article  Google Scholar 

  • Benguerel, A. P. & Cowan, H. A. (1974) Coarticulation of upper lip protrusion in French. Phonetica, 30, 41–55.

    Article  Google Scholar 

  • Benguerel A. P. & Pichora-Fuller M. K. (1982) Coarticulation effects in lipreading. Journal of Speech and Hearing Research, 25, 600–607.

    Google Scholar 

  • Bernstein, L.E. & Eberhardt, S. P. (1986) Johns Hopkins lipreading corpus I-II: Disc I. [Videodisc]. Baltimore: The Johns Hopkins University.

    Google Scholar 

  • Bladon, R. A. & Al-Bamerni, A. (1976) Coarticulation resistance of English /1/. Journal of Phonetics, 4, 135–150.

    Google Scholar 

  • Bladon, R. A. & Al-Bamemi, A. (1982) One stage and two-stage temporal patterns of velar coarticulation. Journal of the Acoustical Society of America, 72, S 104 (A).

    Google Scholar 

  • Boyce, S. E. (1990) Coarticulatory organization for lip rounding in Turkish and English. Journal of the Acoustical Society of America, 88, 2584–2595.

    Article  Google Scholar 

  • Breeuwer, M., & Plomp, R. (1985) Speechreading supplemented with formant-frequency information for voiced speech. Journal of the Acoustical Society of America, 77, 314–317.

    Article  Google Scholar 

  • Brooke, N. M. & Summerfield, A. Q. (1983) Analysis, synthesis, and perception of visible articulatory movements. Journal of Phonetics, 11, 63–76.

    Google Scholar 

  • Bnrnswik, E. (1955) Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193–217.

    Article  Google Scholar 

  • Cathiard, M. A., Tiberghien, G., Cirot-Tseva, A., Lallouache, M.-T., & Escudier, P. (1991) Visual perception of anticipatory rounding during acoustic pauses: A cross-language study. Proc. of the 12th Int. Congress of.’honetic Sciences, Aix-en-Provence, France.

    Google Scholar 

  • Cohen, M. M. & Massaro, D. W. (1990) Synthesis of visible speech. Behavioral Research Methods and Instrumentation, 22, 260–263.

    Article  Google Scholar 

  • DECtalk (1985) Programmers Reference Manual Maynard, MA: Digital Equipment Corporation.

    Google Scholar 

  • Eckman, P. & Friesen, W. V. (1977) Manual for the Facial Action Coding System Palo Alto: Consulting Psychologists Press.

    Google Scholar 

  • Elovitz, H. S., Johnson, R. W., McHugh, A., & Shore, J. E. (1976) Automatic translation of English text to phonetics by means of letter-to-sound rules. NRL Report 7948, document AD/A021 929. Washington, DC: NTIS.

    Google Scholar 

  • Elson, M. (1990) Displacement facial animation techniques. SIGGRAPH Facial Animation Course Notes, 21–42.

    Google Scholar 

  • Erber, N. P. & De Filippo, C. L. (1978) Voice-mouth synthesis of /pa, ba, ma/. Journal of the Acoustical Society of America, 64, 1015–1019.

    Article  Google Scholar 

  • Finn, K. E. (1986) An Investigation of Visible Lip Information to be Used in Automated Speech Recognition. Ph.D. thesis, Georgetown University.

    Google Scholar 

  • Flanagan, J. L., Ishizaka, K. & Shipley, K. L. (1975) Synthesis of speech from a dynamic model of the vocal cords and vocal tract. Bell System Technology Journal, 54, 485–506.

    Article  Google Scholar 

  • Fujimura, O. (1961) Bilabial stop and nasal consonants: A motion picture study and its acoustical implications. Journal of Speech and Hearing Research, 4, 232–247.

    Google Scholar 

  • Gay, T. & Hirose, H. (1973) Effect of speaking rate on labial consonant production. Phonetica, 27, 44–56.

    Article  Google Scholar 

  • Gelfer, C. E., Bell-Berti, F. & Harris K. S. (1989) Determining the extent of coarticulation: Effects of experimental design. Journal of the Acoustical Society of America, 86, 2443–2445.

    Article  Google Scholar 

  • Gouraud, H. (1971) Computer display of curved surfaces, IEEE transactions, C-20(6), 623.

    Google Scholar 

  • Henke, W. L. (1967) Preliminaries to speech synthesis based on an articulatory model. Proceedings of the IEEE Speech Conference, Boston, 170–171.

    Google Scholar 

  • Hill, D. R., Pearce, A., & Wyvill, B. (1986) Animating speech: An automated approach using speech synthesized by rules. The Visual Computer, 3, 277–289.

    Article  Google Scholar 

  • Kent, R. D. (1970) A Cinefluorographic-Spectrographic Investigation of the Consonant Gestures in Lingual Articulation. Ph.D. thesis, University of Iowa.

    Google Scholar 

  • Kent, R. D. (1972) Some considerations in the cinefluorographic analysis of tongue movements during speech. Phonetica, 26, 16–32.

    Article  Google Scholar 

  • Kent, R. D. (1983) The Segmental Organization of Speech. in P. F. MacNeilage (Ed.) The Production of Speech. New York: Springer-Verlag.

    Google Scholar 

  • Kent, R. D. & Minifie, F. D. (1977) Coarticulation in recent speech production models. Journal of Phonetics, 5, 115–133.

    Google Scholar 

  • Kent, R. D. & Moll, K. L. (1972) Tongue body articulation during vocal and diphthong gestures. Folia Phoniatrica, 24, 286–300.

    Article  Google Scholar 

  • Klatt, D. (1979) Synthesis by rule of segmental durations in English sentences. in B. Lindblom and S. Oilman (Eds.) Frontiers of Speech Communication Research. London: Academic Press.

    Google Scholar 

  • Klatt, D. (1980) Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 67, 971–995.

    Article  Google Scholar 

  • Kozhevnikov, V. A. & Chistovich, L. A. (1965) Rech: Artikulatsiya i Vospriatatie (Moscow-Lenningrad). Trans. Speech: Articulation and Perception. Washington, DC: Joint Publication Research Service, No. 30, 543.

    Google Scholar 

  • Kuehn, D. P. & Moll, K. L. (1976) A cineradiographic study of VC and CV articulatory velocities. Journal of Phonetics, 4, 303–320.

    Google Scholar 

  • Lewis, J. P. & Parke, F. I. (1987) Automated lipsynch and speech synthesis for character animation. Proceedings CHI+CG ‘87, Toronto, 143–147.

    Google Scholar 

  • Löfqvist, A. (1990) Speech as audible gestures. In W.J. Hardcastle and A. Marchal (Eds.) Speech Production and Speech Modeling. Dordrecht: Kluwer Academic Publishers, 289–322.

    Chapter  Google Scholar 

  • Löfqvist, A. & Yoshika, H. (1981) Laryngeal activity in Icelandic obstruent production. Nordic Journal of Linguistics, 4, 1–18.

    Article  Google Scholar 

  • Lubker, J. & Gay, T. (1982) Anticipatory labial coarticulation: Experimental, biological, and linguistic variables. Journal of the Acoustical Society of America, 71, 437–448.

    Article  Google Scholar 

  • Massaro, D. W. (1987) Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry, Hillsdale, NJ: Lawrence Erlbaum Associates.

    Google Scholar 

  • Massaro, D. W. (1989) A precis of Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry. Behavioral and Brain Sciences, 12, 741–794.

    Article  Google Scholar 

  • Massaro, D. W. (1990) A Fuzzy logical Model of Speech Perception Proceedings of the XXIV International Congress of Psychology.

    Google Scholar 

  • Massaro, D. W., & Cohen, M. M. (1983) Evaluation and integration of visual and auditory information in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 9, 753–771.

    Article  Google Scholar 

  • Massaro, D. W. & Cohen, M. M. (1990) Perception of synthesized audible and visible speech. Psychological Science, 1, 55–63.

    Article  Google Scholar 

  • Montgomery, A. A. (1980) Development of a model for generating synthetic animated lip shapes. Journal of the Acoustical Society of America, 68, S58 (abstract)

    Article  Google Scholar 

  • Montgomery, A. A., & Jackson, P. L. (1983) Physical characteristics of the lips underlying vowel lipreading performance. Journal of the Acoustical Society of America, 73, 2134–2144.

    Article  Google Scholar 

  • Munhall, K. & Löfqvist, A. (1992) Gestural aggregation in speech: Laryngeal gestures. Journal of Phonetics, 20, 111–126.

    Google Scholar 

  • Nahas, M., Huitric, H., & Saintourens, M. (1988) Animation of a B-spline figure. The Visual Computer, 3, 272–276.

    Article  Google Scholar 

  • Öhman, S. (1966) Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151–168

    Article  Google Scholar 

  • Öhman, S. (1967) Numerical model of coarticulation. Journal of the Acoustical Society of America, 41, 310–320.

    Article  Google Scholar 

  • Overmars (1990) Forms Library. Dept. of Computer Science, Ultrecht University, Ultrecht, the Netherlands.

    Google Scholar 

  • Parke, F. I. (1974) A parametric model for human faces, Tech. Report UTEC-CSc-75–047. Salt Lake City: University of Utah

    Google Scholar 

  • Parke, F. I. (1975) A model for human faces that allows speech synchronized animation. Journal of Computers and Graphics, 1(1), 1–4.

    Google Scholar 

  • Parke, F. I. (1982) Parameterized models for facial animation, IEEE Computer Graphics, 2(9), 61–68.

    Google Scholar 

  • Parke, F. I. (1991) Control Parameterization for facial animation, in N. M. Thalmann and D. Thalmann (Eds.) Computer Animation ‘81. Tokyo: Springer-Verlag.

    Google Scholar 

  • Pelachaud, C., Badler, N. I., & Steedman, M. (1991) Linguistic issues in facial animation. in N. M. Thalmann and D. Thalmann (Eds.) Computer Animation ‘81. Tokyo: Springer-Verlag.

    Google Scholar 

  • Pearce, A., Wyvill, B., Wyvill, G., & Hill, D. (1986) Speech and expression: A computer solution to face animation. Graphics Interface ‘86.

    Google Scholar 

  • Perkell, J. S. (1969) Physiology of Speech Production: Results and Implications of a Cineradiographic Study. Cambridge, Massachusetts: MIT Press.

    Google Scholar 

  • Perkell, J. S. (1990) Testing theories of speech production: Implications of some detailed analysis of variable articulation rate. In W.J. Hardcastle and A. Marchal (Eds.) Speech Production and Speech Modeling. Dordrecht: Kluwer Academic Publishers, 262–288.

    Google Scholar 

  • Perkell, J. S. & Chiang, C. (1986) Preliminary support for a “hybrid model” of anticipatory coarticulation. Proceedings of the 12th International Conference of Acoustics, A3–6.

    Google Scholar 

  • Platt, S.M. & Badler, N. I. (1981) Animating Facial Expressions. Computer Graphics, 15(3), 245–252.

    Google Scholar 

  • Recasens, D. (1984) Vowel-to-vowel coarticulation in Catalan VCV sequences. Journal of the Acoustical Society of America, 76, 1624–1635.

    Article  Google Scholar 

  • Reynolds, C. W. (1985) Description and control of time and dynamics in computer animation. SIGGRAPH Advanced Computer Animation Course Notes, 21–42.

    Google Scholar 

  • Saltzman, E. L., Rubin, P. E., Goldstein, L. & Browman, C. P. (1987) Task-dynamic modeling of interarticulator coordination. Journal of the Acoustical Society of America, 82, S15.

    Article  Google Scholar 

  • Terzopoulous, D. & Waters K. (1990) Muscle parameter estimation from image sequences. SIGGRAPH Facial Animation Course Notes, 146–155.

    Google Scholar 

  • Terzopoulous, D. & Waters K. (1991) Techniques for realistic facial modeling and animation. in N. M. Thalmann and D. Thalmann (Eds.) Computer Animation ‘81. Tokyo: Springer-Verlag.

    Google Scholar 

  • VOTRAX (1981) User’s Manual Votrax, Div. of Federal Screw Works.

    Google Scholar 

  • Waters, K. (1987) A muscle model for animating three-dimensional facial expression. IEEE Computer Graphics, 21(4).

    Google Scholar 

  • Waters, K. (1990) Modeling 3D facial expressions. SIGGRAPH Facial Animation Course Notes, 109–129.

    Google Scholar 

  • Waters, K. & Terzopoulous, D. (1990) A physical model of facial tissue and muscle articulation. SIG-GRAPH Facial Animation Course Notes, 130–145.

    Google Scholar 

Copyright information

© 1993 Springer Japan

About this paper

Cite this paper

Cohen, M.M., Massaro, D.W. (1993). Modeling Coarticulation in Synthetic Visual Speech. In: Thalmann, N.M., Thalmann, D. (eds) Models and Techniques in Computer Animation. Computer Animation Series. Springer, Tokyo. https://doi.org/10.1007/978-4-431-66911-1_13

  • DOI: https://doi.org/10.1007/978-4-431-66911-1_13

  • Publisher Name: Springer, Tokyo

  • Print ISBN: 978-4-431-66913-5

  • Online ISBN: 978-4-431-66911-1

  • eBook Packages: Springer Book Archive
