Abstract
We examined how speakers of different languages perceive speech in face-to-face communication. Participants identified unimodal and bimodal speech syllables constructed from synthetic auditory and visual five-step /ba/-/da/ continua. In the first experiment, Dutch speakers identified the test syllables as either /ba/ or /da/. To test the robustness of the results, Dutch and English speakers were then given a completely open-ended response task, whereas tasks in previous studies had always specified a set of alternatives. Similar results were found in the two-alternative and open-ended tasks: identification of the speech segments was influenced by both the auditory and the visual sources of information. The results falsified an auditory dominance model (ADM), which assumes that visible speech contributes only when the audible speech is of poor quality. The results also falsified an additive model of perception (AMP), in which the auditory and visual sources are combined linearly. The fuzzy logical model of perception (FLMP) provided a good description of performance, supporting the claim that multiple sources of continuous information are evaluated and integrated in speech perception. These results replicate previous findings with English, Spanish, and Japanese speakers. Although there were significant performance differences across language groups, the model analyses indicated no differences in the nature of information processing. The performance differences were instead due to information differences arising from the different phonologies of Dutch and English. These results suggest that the underlying mechanisms of speech perception are similar across languages.
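The abstract contrasts an additive model (AMP) with the multiplicative FLMP. As a minimal sketch of the two decision rules (the function names and all parameter values below are illustrative assumptions, not the paper's fitted model), the contrast can be written as:

```python
# Illustrative sketch of the two integration models contrasted in the
# abstract (FLMP vs. AMP). Parameter values are made up for illustration;
# they are not the paper's fitted values.

def flmp(a, v):
    """Fuzzy logical model of perception: the auditory (a) and visual (v)
    degrees of support for /da/ (each in 0..1) are combined
    multiplicatively, then normalized by a relative-goodness rule."""
    return (a * v) / (a * v + (1 - a) * (1 - v))

def amp(a, v):
    """Additive model of perception: the two sources are combined
    linearly (here with equal weights)."""
    return (a + v) / 2

if __name__ == "__main__":
    # Agreeing sources: FLMP amplifies the shared evidence; AMP merely
    # averages. Conflicting sources: FLMP lets the less ambiguous source
    # dominate, while a neutral source (0.5) leaves the other unchanged.
    for a, v in [(0.7, 0.7), (0.9, 0.3), (0.5, 0.8)]:
        print(f"a={a}, v={v}: FLMP={flmp(a, v):.3f}, AMP={amp(a, v):.3f}")
```

With agreeing sources (a = v = 0.7), the FLMP predicts a stronger /da/ response (about 0.84) than the additive average (0.70); such diverging predictions are one way identification data can discriminate the two models.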
Additional information
The research reported in this paper and the writing of the paper were supported, in part, by grants from the Public Health Service (PHS R01 NS 20314), the National Science Foundation (BNS 8812728), and the graduate division of the University of California, Santa Cruz.
Cite this article
Massaro, D.W., Cohen, M.M. & Smeele, P.M.T. Cross-linguistic comparisons in the integration of visual and auditory speech. Memory & Cognition 23, 113–131 (1995). https://doi.org/10.3758/BF03210561