Modeling Coarticulation in Synthetic Visual Speech

Cohen, Michael M.; Massaro, Dominic W.

doi:10.1007/978-4-431-66911-1_13

Part of the book series: Computer Animation Series

Abstract

After describing the importance of visual information in speech perception and sketching the history of visual speech synthesis, we consider a number of theories of coarticulation in human speech. An implementation of Löfqvist’s (1990) gestural theory of speech production is described for visual speech synthesis along with a description of the graphically controlled development system. We conclude with some plans for future work.

References

Abry, C. & Lallouache, T. (1991) Audibility and Stability of Articulatory Movements: Deciphering two experiments on anticipatory rounding in French. Proc. of the 12th Int. Congress of Phonetic Sciences, Aix-en-Provence, France, Vol. 1, 220–225.
Allen, J., Hunnicutt, M. S., and Klatt, D. (1987) From text to speech: The MITalk system. Cambridge, MA: Cambridge University Press.
Bell-Berti, F. & Harris, K. S. (1979) Anticipatory coarticulation: Some implications from a study of lip rounding. Journal of the Acoustical Society of America, 65, 1268–1270.
Bell-Berti, F. & Harris, K. S. (1982) Temporal patterns of coarticulation: Lip rounding. Journal of the Acoustical Society of America, 71, 449–459.
Benguerel, A. P. & Cowan, H. A. (1974) Coarticulation of upper lip protrusion in French. Phonetica, 30, 41–55.
Benguerel, A. P. & Pichora-Fuller, M. K. (1982) Coarticulation effects in lipreading. Journal of Speech and Hearing Research, 25, 600–607.
Bernstein, L.E. & Eberhardt, S. P. (1986) Johns Hopkins lipreading corpus I-II: Disc I. [Videodisc]. Baltimore: The Johns Hopkins University.
Bladon, R. A. & Al-Bamerni, A. (1976) Coarticulation resistance of English /l/. Journal of Phonetics, 4, 135–150.
Bladon, R. A. & Al-Bamerni, A. (1982) One-stage and two-stage temporal patterns of velar coarticulation. Journal of the Acoustical Society of America, 72, S104 (A).
Boyce, S. E. (1990) Coarticulatory organization for lip rounding in Turkish and English. Journal of the Acoustical Society of America, 88, 2584–2595.
Breeuwer, M., & Plomp, R. (1985) Speechreading supplemented with formant-frequency information for voiced speech. Journal of the Acoustical Society of America, 77, 314–317.
Brooke, N. M. & Summerfield, A. Q. (1983) Analysis, synthesis, and perception of visible articulatory movements. Journal of Phonetics, 11, 63–76.
Brunswik, E. (1955) Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193–217.
Cathiard, M. A., Tiberghien, G., Cirot-Tseva, A., Lallouache, M.-T., & Escudier, P. (1991) Visual perception of anticipatory rounding during acoustic pauses: A cross-language study. Proc. of the 12th Int. Congress of Phonetic Sciences, Aix-en-Provence, France.
Cohen, M. M. & Massaro, D. W. (1990) Synthesis of visible speech. Behavioral Research Methods and Instrumentation, 22, 260–263.
DECtalk (1985) Programmers Reference Manual. Maynard, MA: Digital Equipment Corporation.
Ekman, P. & Friesen, W. V. (1977) Manual for the Facial Action Coding System. Palo Alto: Consulting Psychologists Press.
Elovitz, H. S., Johnson, R. W., McHugh, A., & Shore, J. E. (1976) Automatic translation of English text to phonetics by means of letter-to-sound rules. NRL Report 7948, document AD/A021 929. Washington, DC: NTIS.
Elson, M. (1990) Displacement facial animation techniques. SIGGRAPH Facial Animation Course Notes, 21–42.
Erber, N. P. & De Filippo, C. L. (1978) Voice-mouth synthesis of /pa, ba, ma/. Journal of the Acoustical Society of America, 64, 1015–1019.
Finn, K. E. (1986) An Investigation of Visible Lip Information to be Used in Automated Speech Recognition. Ph.D. thesis, Georgetown University.
Flanagan, J. L., Ishizaka, K. & Shipley, K. L. (1975) Synthesis of speech from a dynamic model of the vocal cords and vocal tract. Bell System Technical Journal, 54, 485–506.
Fujimura, O. (1961) Bilabial stop and nasal consonants: A motion picture study and its acoustical implications. Journal of Speech and Hearing Research, 4, 232–247.
Gay, T. & Hirose, H. (1973) Effect of speaking rate on labial consonant production. Phonetica, 27, 44–56.
Gelfer, C. E., Bell-Berti, F. & Harris, K. S. (1989) Determining the extent of coarticulation: Effects of experimental design. Journal of the Acoustical Society of America, 86, 2443–2445.
Gouraud, H. (1971) Computer display of curved surfaces. IEEE Transactions on Computers, C-20(6), 623.
Henke, W. L. (1967) Preliminaries to speech synthesis based on an articulatory model. Proceedings of the IEEE Speech Conference, Boston, 170–171.
Hill, D. R., Pearce, A., & Wyvill, B. (1986) Animating speech: An automated approach using speech synthesized by rules. The Visual Computer, 3, 277–289.
Kent, R. D. (1970) A Cinefluorographic-Spectrographic Investigation of the Consonant Gestures in Lingual Articulation. Ph.D. thesis, University of Iowa.
Kent, R. D. (1972) Some considerations in the cinefluorographic analysis of tongue movements during speech. Phonetica, 26, 16–32.
Kent, R. D. (1983) The Segmental Organization of Speech. In P. F. MacNeilage (Ed.) The Production of Speech. New York: Springer-Verlag.
Kent, R. D. & Minifie, F. D. (1977) Coarticulation in recent speech production models. Journal of Phonetics, 5, 115–133.
Kent, R. D. & Moll, K. L. (1972) Tongue body articulation during vowel and diphthong gestures. Folia Phoniatrica, 24, 286–300.
Klatt, D. (1979) Synthesis by rule of segmental durations in English sentences. In B. Lindblom and S. Öhman (Eds.) Frontiers of Speech Communication Research. London: Academic Press.
Klatt, D. (1980) Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 67, 971–995.
Kozhevnikov, V. A. & Chistovich, L. A. (1965) Rech: Artikulatsiya i Vospriatatie (Moscow-Leningrad). Trans. Speech: Articulation and Perception. Washington, DC: Joint Publication Research Service, No. 30,543.
Kuehn, D. P. & Moll, K. L. (1976) A cineradiographic study of VC and CV articulatory velocities. Journal of Phonetics, 4, 303–320.
Lewis, J. P. & Parke, F. I. (1987) Automated lipsynch and speech synthesis for character animation. Proceedings CHI+CG ’87, Toronto, 143–147.
Löfqvist, A. (1990) Speech as audible gestures. In W.J. Hardcastle and A. Marchal (Eds.) Speech Production and Speech Modeling. Dordrecht: Kluwer Academic Publishers, 289–322.
Löfqvist, A. & Yoshioka, H. (1981) Laryngeal activity in Icelandic obstruent production. Nordic Journal of Linguistics, 4, 1–18.
Lubker, J. & Gay, T. (1982) Anticipatory labial coarticulation: Experimental, biological, and linguistic variables. Journal of the Acoustical Society of America, 71, 437–448.
Massaro, D. W. (1987) Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry, Hillsdale, NJ: Lawrence Erlbaum Associates.
Massaro, D. W. (1989) A precis of Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry. Behavioral and Brain Sciences, 12, 741–794.
Massaro, D. W. (1990) A fuzzy logical model of speech perception. Proceedings of the XXIV International Congress of Psychology.
Massaro, D. W., & Cohen, M. M. (1983) Evaluation and integration of visual and auditory information in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 9, 753–771.
Massaro, D. W. & Cohen, M. M. (1990) Perception of synthesized audible and visible speech. Psychological Science, 1, 55–63.
Montgomery, A. A. (1980) Development of a model for generating synthetic animated lip shapes. Journal of the Acoustical Society of America, 68, S58 (A).
Montgomery, A. A., & Jackson, P. L. (1983) Physical characteristics of the lips underlying vowel lipreading performance. Journal of the Acoustical Society of America, 73, 2134–2144.
Munhall, K. & Löfqvist, A. (1992) Gestural aggregation in speech: Laryngeal gestures. Journal of Phonetics, 20, 111–126.
Nahas, M., Huitric, H., & Saintourens, M. (1988) Animation of a B-spline figure. The Visual Computer, 3, 272–276.
Öhman, S. (1966) Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151–168.
Öhman, S. (1967) Numerical model of coarticulation. Journal of the Acoustical Society of America, 41, 310–320.
Overmars (1990) Forms Library. Dept. of Computer Science, Utrecht University, Utrecht, the Netherlands.
Parke, F. I. (1974) A parametric model for human faces. Tech. Report UTEC-CSc-75–047. Salt Lake City: University of Utah.
Parke, F. I. (1975) A model for human faces that allows speech synchronized animation. Journal of Computers and Graphics, 1(1), 1–4.
Parke, F. I. (1982) Parameterized models for facial animation, IEEE Computer Graphics, 2(9), 61–68.
Parke, F. I. (1991) Control parameterization for facial animation. In N. M. Thalmann and D. Thalmann (Eds.) Computer Animation ’91. Tokyo: Springer-Verlag.
Pelachaud, C., Badler, N. I., & Steedman, M. (1991) Linguistic issues in facial animation. In N. M. Thalmann and D. Thalmann (Eds.) Computer Animation ’91. Tokyo: Springer-Verlag.
Pearce, A., Wyvill, B., Wyvill, G., & Hill, D. (1986) Speech and expression: A computer solution to face animation. Graphics Interface ’86.
Perkell, J. S. (1969) Physiology of Speech Production: Results and Implications of a Cineradiographic Study. Cambridge, Massachusetts: MIT Press.
Perkell, J. S. (1990) Testing theories of speech production: Implications of some detailed analysis of variable articulation rate. In W.J. Hardcastle and A. Marchal (Eds.) Speech Production and Speech Modeling. Dordrecht: Kluwer Academic Publishers, 262–288.
Perkell, J. S. & Chiang, C. (1986) Preliminary support for a “hybrid model” of anticipatory coarticulation. Proceedings of the 12th International Conference of Acoustics, A3–6.
Platt, S.M. & Badler, N. I. (1981) Animating Facial Expressions. Computer Graphics, 15(3), 245–252.
Recasens, D. (1984) Vowel-to-vowel coarticulation in Catalan VCV sequences. Journal of the Acoustical Society of America, 76, 1624–1635.
Reynolds, C. W. (1985) Description and control of time and dynamics in computer animation. SIGGRAPH Advanced Computer Animation Course Notes, 21–42.
Saltzman, E. L., Rubin, P. E., Goldstein, L. & Browman, C. P. (1987) Task-dynamic modeling of interarticulator coordination. Journal of the Acoustical Society of America, 82, S15.
Terzopoulos, D. & Waters, K. (1990) Muscle parameter estimation from image sequences. SIGGRAPH Facial Animation Course Notes, 146–155.
Terzopoulos, D. & Waters, K. (1991) Techniques for realistic facial modeling and animation. In N. M. Thalmann and D. Thalmann (Eds.) Computer Animation ’91. Tokyo: Springer-Verlag.
VOTRAX (1981) User’s Manual. Votrax, Div. of Federal Screw Works.
Waters, K. (1987) A muscle model for animating three-dimensional facial expression. IEEE Computer Graphics, 21(4).
Waters, K. (1990) Modeling 3D facial expressions. SIGGRAPH Facial Animation Course Notes, 109–129.
Waters, K. & Terzopoulos, D. (1990) A physical model of facial tissue and muscle articulation. SIGGRAPH Facial Animation Course Notes, 130–145.

Author information

Authors and Affiliations

UC-Santa Cruz, 68 Clark Kerr Hall, Santa Cruz, CA, 95064, USA
Michael M. Cohen
UC-Santa Cruz, 433 Clark Kerr Hall, Santa Cruz, CA, 95064, USA
Dominic W. Massaro

Editor information

Editors and Affiliations

MIRALab, Centre Universitaire d’Informatique, University of Geneva, 24 rue du Général-Dufour, CH-1211, Geneva 4, Switzerland
Nadia Magnenat Thalmann
Computer Graphics Lab., Swiss Federal Institute of Technology, CH-1015, Lausanne, Switzerland
Daniel Thalmann

About this paper

Cite this paper

Cohen, M.M., Massaro, D.W. (1993). Modeling Coarticulation in Synthetic Visual Speech. In: Thalmann, N.M., Thalmann, D. (eds) Models and Techniques in Computer Animation. Computer Animation Series. Springer, Tokyo. https://doi.org/10.1007/978-4-431-66911-1_13

DOI: https://doi.org/10.1007/978-4-431-66911-1_13
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-66913-5
Online ISBN: 978-4-431-66911-1
eBook Packages: Springer Book Archive