Cross-linguistic comparisons in the integration of visual and auditory speech

Abstract

We examined how speakers of different languages perceive speech in face-to-face communication. Participants identified unimodal and bimodal speech syllables generated from synthetic five-step auditory and visual /ba/-/da/ continua. In the first experiment, Dutch speakers identified the test syllables as either /ba/ or /da/. To test the robustness of the results, Dutch and English speakers were then given a completely open-ended response task, whereas tasks in previous studies had always specified a set of alternatives. Similar results were found in the two-alternative and open-ended tasks: identification of the speech segments was influenced by both the auditory and the visual sources of information. The results falsified an auditory dominance model (ADM), which assumes that visible speech contributes only when the audible speech is of poor quality. They also falsified an additive model of perception (AMP), in which the auditory and visual sources are combined linearly. The fuzzy logical model of perception (FLMP) provided a good description of performance, supporting the claim that multiple sources of continuous information are evaluated and integrated in speech perception. These results replicate previous findings with English, Spanish, and Japanese speakers. Although there were significant performance differences, the model analyses indicated no differences in the nature of information processing across language groups. The performance differences across languages instead reflected information differences arising from the distinct phonologies of Dutch and English. These results suggest that the underlying mechanisms of speech perception are similar across languages.
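
To make the contrast among the candidate models concrete: in the two-alternative case the FLMP predicts P(/da/ | A_i, V_j) = a_i v_j / [a_i v_j + (1 - a_i)(1 - v_j)], where a_i and v_j are the degrees of auditory and visual support for /da/, whereas an additive model combines the sources linearly, for example (a_i + v_j) / 2. The short sketch below is a minimal illustration with hypothetical support values, not the parameters fitted in this study:

    # Sketch contrasting FLMP and additive (AMP) integration in the
    # two-alternative /ba/-/da/ task. The support values below are
    # hypothetical illustrations, not parameters fitted in the paper.

    def flmp(a, v):
        # Multiplicative integration, then relative-goodness normalization.
        return (a * v) / (a * v + (1.0 - a) * (1.0 - v))

    def amp(a, v):
        # Equal-weight linear combination of the two sources.
        return (a + v) / 2.0

    # Hypothetical degrees of support for /da/ along five-step continua.
    auditory = [0.05, 0.275, 0.50, 0.725, 0.95]
    visual = [0.05, 0.275, 0.50, 0.725, 0.95]

    # Print predicted P(/da/) for each auditory (rows) x visual (columns) cell.
    for name, model in (("FLMP", flmp), ("AMP", amp)):
        print(name)
        for a in auditory:
            print("  " + "  ".join(f"{model(a, v):.2f}" for v in visual))

Under the FLMP, the spread of responses across visual levels is widest when the auditory information is ambiguous (a_i = .5) and narrows toward the continuum endpoints, producing the fan-shaped family of curves characteristic of FLMP fits; under the additive model, the visual effect is the same size at every auditory level, and it is this constant-effect prediction that the identification data falsified.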

References

• Binnie, C. A., Montgomery, A. A., & Jackson, P. L. (1974). Auditory and visual contributions to the perception of selected English consonants for normally hearing and hearing-impaired listeners. In H. Birk Nielsen & E. Kampp (Eds.), Visual and audio-visual perception of speech (Scandinavian Audiology, 4[Suppl.], 181–209). Stockholm: Almquist & Wiksell.

• Breeuwer, M., & Plomp, R. (1984). Speechreading supplemented with frequency-selective sound-pressure information. Journal of the Acoustical Society of America, 76, 686–691.

• Campbell, R., & Dodd, B. (1980). Hearing by eye. Quarterly Journal of Experimental Psychology, 32, 85–99.

• Chandler, J. P. (1969). Subroutine STEPIT—Finds local minima of a smooth function of several parameters. Behavioral Science, 14, 81–82.

• Cohen, M. M. (1984). Processing of visual and auditory information in speech perception. Unpublished doctoral dissertation, University of California, Santa Cruz.

• Cohen, M. M., & Massaro, D. W. (1990). Synthesis of visible speech. Behavior Research Methods, Instruments, & Computers, 22, 260–263.

• Cutting, J. E., Bruno, N., Brady, N. P., & Moore, C. (1992). Selectivity, scope, and simplicity of models: A lesson from fitting judgments of perceived depth. Journal of Experimental Psychology: General, 121, 362–381.

• Gouraud, H. (1971). Continuous shading of curved surfaces. IEEE Transactions on Computers, C-20, 623–628.

• Green, K. P., & Kuhl, P. K. (1989). The role of visual information in the processing of place and manner features in speech perception. Perception & Psychophysics, 45, 34–42.

• Klatt, D. H. (1980). Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 67, 971–995.

• Lindau, M., & Ladefoged, P. (1986). Variability of feature specifications. In J. S. Perkell & D. H. Klatt (Eds.), Invariance and variability of speech processes (pp. 464–478). Hillsdale, NJ: Erlbaum.

• MacWhinney, B., & Bates, E. (Eds.) (1989). The crosslinguistic study of sentence processing. New York: Cambridge University Press.

• Maddieson, I. (1984). Patterns of sounds. Cambridge: Cambridge University Press.

• Massaro, D. W. (1987). Speech perception by ear and eye: A paradigm for psychological inquiry. Hillsdale, NJ: Erlbaum.

• Massaro, D. W. (1988). Ambiguity in perception and experimentation. Journal of Experimental Psychology: General, 117, 417–421.

• Massaro, D. W. (1989a). Multiple book review of Speech perception by ear and eye: A paradigm for psychological inquiry. Behavioral & Brain Sciences, 12, 741–794.

• Massaro, D. W. (1989b). Testing between the TRACE model and the fuzzy logical model of perception. Cognitive Psychology, 21, 398–421.

• Massaro, D. W. (1990). A fuzzy logical model of speech perception. In D. Vickers & P. L. Smith (Eds.), Human information processing: Measures, mechanisms, and models (pp. 367–379). Amsterdam: North-Holland.

• Massaro, D. W., & Cohen, M. M. (1983). Evaluation and integration of visual and auditory information in speech perception. Journal of Experimental Psychology: Human Perception & Performance, 9, 753–771.

• Massaro, D. W., & Cohen, M. M. (1990). Perception of synthesized audible and visible speech. Psychological Science, 1, 55–63.

• Massaro, D. W., & Cohen, M. M. (1993a). The paradigm and the fuzzy logical model of perception are alive and well. Journal of Experimental Psychology: General, 122, 115–124.

• Massaro, D. W., & Cohen, M. M. (1993b). Perceiving asynchronous bimodal speech in consonant-vowel and vowel syllables. Speech Communication, 13, 127–134.

• Massaro, D. W., & Friedman, D. (1990). Models of integration given multiple sources of information. Psychological Review, 97, 225–252.

• Massaro, D. W., Tsuzaki, M., Cohen, M. M., Gesi, A., & Heredia, R. (1993). Bimodal speech perception: An examination across languages. Journal of Phonetics, 21, 445–478.

• McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.

• Parke, F. I. (1974). A parametric model for human faces (Tech. Rep. UTEC-CSc-75-047). Salt Lake City: University of Utah, Department of Computer Science.

• Parke, F. I. (1975). A model for human faces that allows speech synchronized animation. Computers & Graphics Journal, 1, 1–4.

• Parke, F. I. (1982). Parameterized models for facial animation. IEEE Computer Graphics, 2(9), 61–68.

• Pearce, A., Wyvill, B., Wyvill, G., & Hill, D. (1986). Speech and expression: A computer solution to face animation. In Proceedings of Graphics Interface '86 (pp. 136–140).

• Platt, J. R. (1964). Strong inference. Science, 146, 347–353.

• Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 97–113). Hove, U.K.: Erlbaum.

• Sekiyama, K., & Tohkura, Y. (1991). McGurk effect in non-English listeners: Few visual effects for Japanese subjects hearing Japanese syllables of high auditory intelligibility. Journal of the Acoustical Society of America, 90, 1797–1805.

• Sekiyama, K., & Tohkura, Y. (1993). Inter-language differences in the influence of visual cues in speech perception. Journal of Phonetics, 21, 427–444.

• Smeele, P. M. T., & Sittig, A. C. (1991a). The contribution of vision to speech perception. In Proceedings of the 2nd European Conference on Speech Communication and Technology, Eurospeech 91 (pp. 1495–1497).

• Smeele, P. M. T., & Sittig, A. C. (1991b). Effects of desynchronization of vision and speech on the perception of speech: Preliminary results. In CCITT Brazil Conference Sept. '91 (Stgrp. XII, Wp. XII/2 and XII/3, Contribution D.81).

• Smeele, P. M. T., Sittig, A. C., & van Heuven, V. J. (1992). Intelligibility of audio-visually desynchronised speech: Asymmetrical effect of phoneme position. Proceedings of the International Conference on Spoken Language Processing 92, 1, 65–68.

• Studdert-Kennedy, M. (1989). Reading gestures by light and sound. In A. W. Young & H. D. Ellis (Eds.), Handbook of research on face processing (pp. 217–222). Amsterdam: North-Holland.

• Summerfield, A. Q. (1979). Use of visual information in phonetic perception. Phonetica, 36, 314–331.

• Thompson, L. A., & Massaro, D. W. (1989). Before you see it, you see its parts: Evidence for feature encoding and integration in preschool children and adults. Cognitive Psychology, 21, 334–362.

• Vroomen, J. H. M. (1992). Hearing voices and seeing lips: Investigations in the psychology of lipreading. Unpublished doctoral dissertation, Katholieke Universiteit Brabant.


Author information

Corresponding author

Correspondence to Dominic W. Massaro.

Additional information

The research reported in this paper and the writing of the paper were supported, in part, by grants from the Public Health Service (PHS R01 NS 20314), the National Science Foundation (BNS 8812728), and the graduate division of the University of California, Santa Cruz.

About this article

Cite this article

Massaro, D.W., Cohen, M.M. & Smeele, P.M.T. Cross-linguistic comparisons in the integration of visual and auditory speech. Memory & Cognition 23, 113–131 (1995). https://doi.org/10.3758/BF03210561

