Abstract
After describing the importance of visual information in speech perception and sketching the history of visual speech synthesis, we consider a number of theories of coarticulation in human speech. We then describe an implementation of Löfqvist’s (1990) gestural theory of speech production for visual speech synthesis, together with the graphically controlled development system. We conclude with some plans for future work.
References
Abry, C. & Lallouache, T. (1991) Audibility and Stability of Articulatory Movements: Deciphering two experiments on anticipatory rounding in French. Proc. of the 12th Int. Congress of Phonetic Sciences, Aix-en-Provence, France, Vol. 1, 220–225.
Allen, J., Hunnicutt, M. S., and Klatt, D. (1987) From text to speech: The MITalk system. Cambridge, MA: Cambridge University Press.
Bell-Berti, F. & Harris, K. S. (1979) Anticipatory coarticulation: Some implications from a study of lip rounding. Journal of the Acoustical Society of America, 65, 1268–1270.
Bell-Berti, F. & Harris, K. S. (1982) Temporal patterns of coarticulation: Lip rounding. Journal of the Acoustical Society of America, 71, 449–459.
Benguerel, A. P. & Cowan, H. A. (1974) Coarticulation of upper lip protrusion in French. Phonetica, 30, 41–55.
Benguerel, A. P. & Pichora-Fuller, M. K. (1982) Coarticulation effects in lipreading. Journal of Speech and Hearing Research, 25, 600–607.
Bernstein, L.E. & Eberhardt, S. P. (1986) Johns Hopkins lipreading corpus I-II: Disc I. [Videodisc]. Baltimore: The Johns Hopkins University.
Bladon, R. A. & Al-Bamerni, A. (1976) Coarticulation resistance of English /l/. Journal of Phonetics, 4, 135–150.
Bladon, R. A. & Al-Bamerni, A. (1982) One-stage and two-stage temporal patterns of velar coarticulation. Journal of the Acoustical Society of America, 72, S104 (A).
Boyce, S. E. (1990) Coarticulatory organization for lip rounding in Turkish and English. Journal of the Acoustical Society of America, 88, 2584–2595.
Breeuwer, M., & Plomp, R. (1985) Speechreading supplemented with formant-frequency information for voiced speech. Journal of the Acoustical Society of America, 77, 314–317.
Brooke, N. M. & Summerfield, A. Q. (1983) Analysis, synthesis, and perception of visible articulatory movements. Journal of Phonetics, 11, 63–76.
Brunswik, E. (1955) Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193–217.
Cathiard, M. A., Tiberghien, G., Cirot-Tseva, A., Lallouache, M.-T., & Escudier, P. (1991) Visual perception of anticipatory rounding during acoustic pauses: A cross-language study. Proc. of the 12th Int. Congress of Phonetic Sciences, Aix-en-Provence, France.
Cohen, M. M. & Massaro, D. W. (1990) Synthesis of visible speech. Behavioral Research Methods and Instrumentation, 22, 260–263.
DECtalk (1985) Programmers Reference Manual. Maynard, MA: Digital Equipment Corporation.
Ekman, P. & Friesen, W. V. (1977) Manual for the Facial Action Coding System. Palo Alto: Consulting Psychologists Press.
Elovitz, H. S., Johnson, R. W., McHugh, A., & Shore, J. E. (1976) Automatic translation of English text to phonetics by means of letter-to-sound rules. NRL Report 7948, document AD/A021 929. Washington, DC: NTIS.
Elson, M. (1990) Displacement facial animation techniques. SIGGRAPH Facial Animation Course Notes, 21–42.
Erber, N. P. & De Filippo, C. L. (1978) Voice-mouth synthesis of /pa, ba, ma/. Journal of the Acoustical Society of America, 64, 1015–1019.
Finn, K. E. (1986) An Investigation of Visible Lip Information to be Used in Automated Speech Recognition. Ph.D. thesis, Georgetown University.
Flanagan, J. L., Ishizaka, K. & Shipley, K. L. (1975) Synthesis of speech from a dynamic model of the vocal cords and vocal tract. Bell System Technical Journal, 54, 485–506.
Fujimura, O. (1961) Bilabial stop and nasal consonants: A motion picture study and its acoustical implications. Journal of Speech and Hearing Research, 4, 232–247.
Gay, T. & Hirose, H. (1973) Effect of speaking rate on labial consonant production. Phonetica, 27, 44–56.
Gelfer, C. E., Bell-Berti, F. & Harris, K. S. (1989) Determining the extent of coarticulation: Effects of experimental design. Journal of the Acoustical Society of America, 86, 2443–2445.
Gouraud, H. (1971) Computer display of curved surfaces. IEEE Transactions, C-20(6), 623.
Henke, W. L. (1967) Preliminaries to speech synthesis based on an articulatory model. Proceedings of the IEEE Speech Conference, Boston, 170–171.
Hill, D. R., Pearce, A., & Wyvill, B. (1986) Animating speech: An automated approach using speech synthesized by rules. The Visual Computer, 3, 277–289.
Kent, R. D. (1970) A Cinefluorographic-Spectrographic Investigation of the Consonant Gestures in Lingual Articulation. Ph.D. thesis, University of Iowa.
Kent, R. D. (1972) Some considerations in the cinefluorographic analysis of tongue movements during speech. Phonetica, 26, 16–32.
Kent, R. D. (1983) The Segmental Organization of Speech. In P. F. MacNeilage (Ed.) The Production of Speech. New York: Springer-Verlag.
Kent, R. D. & Minifie, F. D. (1977) Coarticulation in recent speech production models. Journal of Phonetics, 5, 115–133.
Kent, R. D. & Moll, K. L. (1972) Tongue body articulation during vowel and diphthong gestures. Folia Phoniatrica, 24, 286–300.
Klatt, D. (1979) Synthesis by rule of segmental durations in English sentences. In B. Lindblom and S. Öhman (Eds.) Frontiers of Speech Communication Research. London: Academic Press.
Klatt, D. (1980) Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 67, 971–995.
Kozhevnikov, V. A. & Chistovich, L. A. (1965) Rech: Artikulatsiya i Vospriyatie (Moscow-Leningrad). Trans. Speech: Articulation and Perception. Washington, DC: Joint Publication Research Service, No. 30,543.
Kuehn, D. P. & Moll, K. L. (1976) A cineradiographic study of VC and CV articulatory velocities. Journal of Phonetics, 4, 303–320.
Lewis, J. P. & Parke, F. I. (1987) Automated lipsynch and speech synthesis for character animation. Proceedings CHI+CG ’87, Toronto, 143–147.
Löfqvist, A. (1990) Speech as audible gestures. In W.J. Hardcastle and A. Marchal (Eds.) Speech Production and Speech Modeling. Dordrecht: Kluwer Academic Publishers, 289–322.
Löfqvist, A. & Yoshioka, H. (1981) Laryngeal activity in Icelandic obstruent production. Nordic Journal of Linguistics, 4, 1–18.
Lubker, J. & Gay, T. (1982) Anticipatory labial coarticulation: Experimental, biological, and linguistic variables. Journal of the Acoustical Society of America, 71, 437–448.
Massaro, D. W. (1987) Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry, Hillsdale, NJ: Lawrence Erlbaum Associates.
Massaro, D. W. (1989) A precis of Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry. Behavioral and Brain Sciences, 12, 741–794.
Massaro, D. W. (1990) A Fuzzy Logical Model of Speech Perception. Proceedings of the XXIV International Congress of Psychology.
Massaro, D. W., & Cohen, M. M. (1983) Evaluation and integration of visual and auditory information in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 9, 753–771.
Massaro, D. W. & Cohen, M. M. (1990) Perception of synthesized audible and visible speech. Psychological Science, 1, 55–63.
Montgomery, A. A. (1980) Development of a model for generating synthetic animated lip shapes. Journal of the Acoustical Society of America, 68, S58 (abstract).
Montgomery, A. A., & Jackson, P. L. (1983) Physical characteristics of the lips underlying vowel lipreading performance. Journal of the Acoustical Society of America, 73, 2134–2144.
Munhall, K. & Löfqvist, A. (1992) Gestural aggregation in speech: Laryngeal gestures. Journal of Phonetics, 20, 111–126.
Nahas, M., Huitric, H., & Saintourens, M. (1988) Animation of a B-spline figure. The Visual Computer, 3, 272–276.
Öhman, S. (1966) Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151–168.
Öhman, S. (1967) Numerical model of coarticulation. Journal of the Acoustical Society of America, 41, 310–320.
Overmars (1990) Forms Library. Dept. of Computer Science, Utrecht University, Utrecht, the Netherlands.
Parke, F. I. (1974) A parametric model for human faces. Tech. Report UTEC-CSc-75–047. Salt Lake City: University of Utah.
Parke, F. I. (1975) A model for human faces that allows speech synchronized animation. Journal of Computers and Graphics, 1(1), 1–4.
Parke, F. I. (1982) Parameterized models for facial animation, IEEE Computer Graphics, 2(9), 61–68.
Parke, F. I. (1991) Control parameterization for facial animation. In N. M. Thalmann and D. Thalmann (Eds.) Computer Animation ’91. Tokyo: Springer-Verlag.
Pelachaud, C., Badler, N. I., & Steedman, M. (1991) Linguistic issues in facial animation. In N. M. Thalmann and D. Thalmann (Eds.) Computer Animation ’91. Tokyo: Springer-Verlag.
Pearce, A., Wyvill, B., Wyvill, G., & Hill, D. (1986) Speech and expression: A computer solution to face animation. Graphics Interface ’86.
Perkell, J. S. (1969) Physiology of Speech Production: Results and Implications of a Cineradiographic Study. Cambridge, Massachusetts: MIT Press.
Perkell, J. S. (1990) Testing theories of speech production: Implications of some detailed analysis of variable articulation rate. In W.J. Hardcastle and A. Marchal (Eds.) Speech Production and Speech Modeling. Dordrecht: Kluwer Academic Publishers, 262–288.
Perkell, J. S. & Chiang, C. (1986) Preliminary support for a “hybrid model” of anticipatory coarticulation. Proceedings of the 12th International Conference of Acoustics, A3–6.
Platt, S.M. & Badler, N. I. (1981) Animating Facial Expressions. Computer Graphics, 15(3), 245–252.
Recasens, D. (1984) Vowel-to-vowel coarticulation in Catalan VCV sequences. Journal of the Acoustical Society of America, 76, 1624–1635.
Reynolds, C. W. (1985) Description and control of time and dynamics in computer animation. SIGGRAPH Advanced Computer Animation Course Notes, 21–42.
Saltzman, E. L., Rubin, P. E., Goldstein, L. & Browman, C. P. (1987) Task-dynamic modeling of interarticulator coordination. Journal of the Acoustical Society of America, 82, S15.
Terzopoulos, D. & Waters, K. (1990) Muscle parameter estimation from image sequences. SIGGRAPH Facial Animation Course Notes, 146–155.
Terzopoulos, D. & Waters, K. (1991) Techniques for realistic facial modeling and animation. In N. M. Thalmann and D. Thalmann (Eds.) Computer Animation ’91. Tokyo: Springer-Verlag.
VOTRAX (1981) User’s Manual. Votrax, Div. of Federal Screw Works.
Waters, K. (1987) A muscle model for animating three-dimensional facial expression. IEEE Computer Graphics, 21(4).
Waters, K. (1990) Modeling 3D facial expressions. SIGGRAPH Facial Animation Course Notes, 109–129.
Waters, K. & Terzopoulos, D. (1990) A physical model of facial tissue and muscle articulation. SIGGRAPH Facial Animation Course Notes, 130–145.
Copyright information
© 1993 Springer Japan
About this paper
Cite this paper
Cohen, M.M., Massaro, D.W. (1993). Modeling Coarticulation in Synthetic Visual Speech. In: Thalmann, N.M., Thalmann, D. (eds) Models and Techniques in Computer Animation. Computer Animation Series. Springer, Tokyo. https://doi.org/10.1007/978-4-431-66911-1_13
DOI: https://doi.org/10.1007/978-4-431-66911-1_13
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-66913-5
Online ISBN: 978-4-431-66911-1
eBook Packages: Springer Book Archive