Audio-Visual Speech Perception Without Speech Cues: A First Report

  • Chapter
Speechreading by Humans and Machines

Part of the book series: NATO ASI Series ((NATO ASI F,volume 150))

Abstract

A primary objective of a theory of audio-visual speech perception is to describe the process of audio-visual integration and the form of the auditory and visual streams of information. An experiment was conducted in which listeners transcribed audio-visual sentences. The visual component of each stimulus was the face of a male talker. The acoustic component was one of: (1) natural speech, (2) envelope-shaped noise, which preserved the duration and amplitude of the original speech waveform, or (3) one of several types of sinewave speech signals, which preserved different aspects of the time-varying spectrum of the original speech signal. Sinewave speech is a skeletonized version of a natural utterance that retains the frequency and amplitude variation of the formants but lacks the fine-grained acoustic structure of speech. When all three formants are represented in this form (T1+T2+T3) and listeners are told they are listening to speech, the intelligibility of sentences is relatively high (above 75%) (Remez, Rubin, Pisoni, & Carrell, 1981). However, when listeners are presented with only a single tone (T1, T2, or T3), performance falls to almost zero. Preliminary results reported here indicate that the intelligibility of sinewave sentences is greatly increased when visual information is combined with the auditory signal. We predicted that the increase in intelligibility for sinewave speech with an added video display would be greater than the gain observed with envelope-shaped noise. This prediction is based on the assumption that the phonetic properties of spoken utterances are retained in the audio-visual stream of the sinewave condition. The results demonstrate that visual information significantly increases the intelligibility of the tonal analog of the second formant, but not that of the tonal analog of the first formant or of the bit-flipped noise, suggesting that the information contained in the T2 analog is useful for audio-visual integration. Thus, the dynamic time-varying properties of the vocal tract transfer function that are encoded in both the optical and acoustic signals play an important role in speech intelligibility, and therefore need to be incorporated in theoretical accounts of audio-visual speech perception.
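
The two degraded acoustic conditions described in the abstract can be illustrated with a minimal sketch. This is not the authors' stimulus-generation procedure: the formant tracks below are invented linear glides rather than measurements from real speech, the function names (`glide`, `sinewave_analog`, `mix`, `envelope_shaped_noise`) and the 8 kHz sample rate are hypothetical, and random sign-flipping is assumed here as one simple way to obtain noise that keeps the duration and sample-by-sample amplitude of a waveform.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; an assumed value for illustration only


def glide(start, end, n):
    """Linear track from start to end over n samples (a stand-in for a measured formant track)."""
    return [start + (end - start) * i / (n - 1) for i in range(n)]


def sinewave_analog(freqs_hz, amps, sample_rate=SAMPLE_RATE):
    """Synthesize one tone whose frequency and amplitude follow per-sample tracks."""
    phase = 0.0
    out = []
    for f, a in zip(freqs_hz, amps):
        out.append(a * math.sin(phase))
        # Accumulate phase so the frequency can change smoothly over time.
        phase += 2.0 * math.pi * f / sample_rate
    return out


def mix(*signals):
    """Sum several equal-length signals sample by sample."""
    return [sum(samples) for samples in zip(*signals)]


def envelope_shaped_noise(waveform, seed=0):
    """Randomly flip the sign of each sample.

    Duration and sample magnitudes are kept; the spectral structure is scrambled.
    """
    rng = random.Random(seed)
    return [x if rng.random() < 0.5 else -x for x in waveform]


n = SAMPLE_RATE // 2  # half a second of signal

# Hypothetical formant tracks (Hz) and constant amplitudes.
t1 = sinewave_analog(glide(300.0, 700.0, n), [0.5] * n)
t2 = sinewave_analog(glide(2200.0, 1100.0, n), [0.3] * n)
t3 = sinewave_analog([2500.0] * n, [0.2] * n)

three_tone = mix(t1, t2, t3)               # analogous to the T1+T2+T3 condition
single_tone = t2                           # analogous to a single-tone (T2) condition
noise = envelope_shaped_noise(three_tone)  # analogous to the noise condition
```

In this sketch each tone carries only the frequency and amplitude track of one formant, which is why a single tone on its own conveys so little of the original spectrum, while the sign-flipped signal keeps the amplitude contour but none of the formant structure.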

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Saldaña, H.M., Pisoni, D.B., Fellowes, J.M., Remez, R.E. (1996). Audio-Visual Speech Perception Without Speech Cues: A First Report. In: Stork, D.G., Hennecke, M.E. (eds) Speechreading by Humans and Machines. NATO ASI Series, vol 150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-13015-5_10

  • DOI: https://doi.org/10.1007/978-3-662-13015-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-08252-8

  • Online ISBN: 978-3-662-13015-5

  • eBook Packages: Springer Book Archive