Intelligibility of Time-Compressed Speech with Periodic and Aperiodic Insertions of Silence: Evidence for Endogenous Brain Rhythms in Speech Perception?

  • Conference paper
The Neurophysiological Bases of Auditory Perception

Abstract

This study was motivated by the hypothesis that low-frequency cortical oscillations help the brain decode the speech signal. The intelligibility (in terms of word error rate) of natural-sounding, synthetically generated sentences was measured using a paradigm that alters speech-energy rhythm over a range of modulation frequencies. The material comprised 96 semantically unpredictable sentences (SUS), each approximately 2 s long, generated by a high-quality text-to-speech (TTS) synthesis engine. The TTS waveform was time-compressed by a factor of 3, yielding a signal whose syllable rhythm is three times faster than the original and whose intelligibility is poor (<50% words correct). A waveform with an artificial rhythm was produced by segmenting the time-compressed waveform into consecutive 40-ms fragments, each followed by a silent interval. The parameters varied were the length of the silent interval (0 to 160 ms) and whether the intervals of silence were equal (“periodic”) or not (“aperiodic”). The performance curve (word error rate as a function of mean duration of silence) was U-shaped. The lowest word error rate occurred when the silence was 80 ms long and inserted periodically. This was also the condition for which word error rate increased when the silence was inserted aperiodically. These data are consistent with a model (“TEMPO”) in which low-frequency brain rhythms influence the ability to decode the speech signal. In TEMPO, optimum intelligibility is achieved when the syllable rhythm is within the range of the high-theta frequency brain rhythms (6-12 Hz), comparable to the rate at which segments and syllables are spoken in conversational speech.
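The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' stimulus-generation code: the function names, the `jitter` parameter, and the uniform-jitter rule for the aperiodic case are all assumptions made here for illustration, since the abstract states only that the silent intervals were unequal. The time-compression step itself is not shown.

```python
import random


def insert_silence(samples, sample_rate, fragment_ms=40.0, silence_ms=80.0,
                   periodic=True, jitter=0.5, seed=0):
    """Cut `samples` into consecutive fragments, each followed by silence.

    `samples` is a list of floats (an already time-compressed waveform).
    `jitter` and the uniform draw below are hypothetical; the abstract
    does not say how the aperiodic intervals were chosen.
    """
    frag_len = int(round(sample_rate * fragment_ms / 1000.0))
    mean_gap = int(round(sample_rate * silence_ms / 1000.0))
    if frag_len <= 0:
        raise ValueError("fragment length must be at least one sample")

    rng = random.Random(seed)
    out = []
    for start in range(0, len(samples), frag_len):
        # Copy one fragment of speech unchanged.
        out.extend(samples[start:start + frag_len])
        if periodic:
            gap = mean_gap
        else:
            # Assumed rule: gap drawn uniformly around the mean duration.
            low = int(round(mean_gap * (1.0 - jitter)))
            high = int(round(mean_gap * (1.0 + jitter)))
            gap = rng.randint(low, high)
        # Follow the fragment with `gap` samples of silence.
        out.extend([0.0] * gap)
    return out


def packet_rate_hz(fragment_ms, silence_ms):
    """Rate of fragment-plus-silence cycles per second (periodic case)."""
    return 1000.0 / (fragment_ms + silence_ms)


if __name__ == "__main__":
    for silence in (0, 40, 80, 120, 160):
        print(silence, "ms silence ->", round(packet_rate_hz(40, silence), 2), "Hz")
```

As plain arithmetic, a 40-ms fragment followed by 80 ms of silence gives a 120-ms cycle, or about 8.3 cycles per second, which lies inside the 6-12 Hz range quoted in the abstract. The two extremes of the silence range, 0 ms and 160 ms, give 25 and 5 cycles per second respectively.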

Acknowledgments

This study was funded by a research grant from the Air Force Office of Scientific Research. We thank Dr. Willard Larkin for his encouragement and valuable discussion as well as Professor Tim Bunnell for providing the SUSGEN sentence list. We also thank Dr. Ann Syrdal of AT&T who gave valuable advice about generating the stimuli and Dr. Udi Ghitza who provided valuable assistance with the statistical analyses.

Author information

Corresponding author

Correspondence to Oded Ghitza.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this paper

Cite this paper

Ghitza, O., Greenberg, S. (2010). Intelligibility of Time-Compressed Speech with Periodic and Aperiodic Insertions of Silence: Evidence for Endogenous Brain Rhythms in Speech Perception? In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_37
