Intelligibility of Time-Compressed Speech with Periodic and Aperiodic Insertions of Silence: Evidence for Endogenous Brain Rhythms in Speech Perception?

Ghitza, Oded; Greenberg, Steven

doi:10.1007/978-1-4419-5686-6_37

Abstract

This study was motivated by the hypothesis that low-frequency cortical oscillations help the brain decode the speech signal. The intelligibility (in terms of word error rate) of natural-sounding, synthetically generated sentences was measured using a paradigm that alters the speech-energy rhythm over a range of modulation frequencies. The material comprised 96 semantically unpredictable sentences (SUS), each approximately 2 s long, generated by a high-quality text-to-speech (TTS) synthesis engine. The TTS waveform was time-compressed by a factor of 3, creating a signal with a syllable rhythm three times faster than the original and with poor intelligibility (<50% words correct). A waveform with an artificial rhythm was produced by segmenting the time-compressed waveform into consecutive 40-ms fragments, each followed by a silent interval. The parameters varied were the length of the silent interval (0–160 ms) and whether the intervals of silence were equal (“periodic”) or not (“aperiodic”). The performance curve (word error rate as a function of mean duration of silence) was U-shaped. The lowest word error rate occurred when the silence was 80 ms long and inserted periodically. This was also the condition for which word error rate increased when the silence was inserted aperiodically. These data are consistent with a model (“TEMPO”) in which low-frequency brain rhythms influence the ability to decode the speech signal. In TEMPO, optimum intelligibility is achieved when the syllable rhythm is within the range of the high-theta frequency brain rhythms (6–12 Hz), comparable to the rate at which segments and syllables are spoken in conversational speech.
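
The stimulus manipulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `jitter` parameter, and the uniform-jitter scheme for the aperiodic condition are assumptions made for illustration, since the abstract states only that aperiodic silences were unequal and were characterised by their mean duration. Only the 40-ms fragment length and the 0–160 ms silence range come from the text.

```python
import random

FRAGMENT_MS = 40  # fragment length stated in the abstract


def packaging_rate_hz(silence_ms, fragment_ms=FRAGMENT_MS):
    """Rate (Hz) at which fragment-plus-silence units recur."""
    return 1000.0 / (fragment_ms + silence_ms)


def insert_silence(samples, sample_rate, mean_silence_ms,
                   periodic=True, jitter=0.5, seed=0):
    """Cut `samples` into 40-ms fragments, each followed by a silent gap.

    periodic=True : every gap lasts exactly mean_silence_ms.
    periodic=False: each gap is drawn uniformly from
                    mean * (1 - jitter) to mean * (1 + jitter).
                    This jitter scheme is an assumption, not the
                    procedure used in the study.
    """
    rng = random.Random(seed)
    frag_len = int(round(sample_rate * FRAGMENT_MS / 1000.0))
    out = []
    for start in range(0, len(samples), frag_len):
        # Copy one fragment of the time-compressed waveform unchanged.
        out.extend(samples[start:start + frag_len])
        if periodic:
            gap_ms = mean_silence_ms
        else:
            gap_ms = rng.uniform(mean_silence_ms * (1 - jitter),
                                 mean_silence_ms * (1 + jitter))
        # Follow the fragment with silence (zero-valued samples).
        out.extend([0.0] * int(round(sample_rate * gap_ms / 1000.0)))
    return out


# Toy example: 120 ms of "speech" at a 1 kHz sample rate.
compressed = [1.0] * 120
periodic_80 = insert_silence(compressed, 1000, 80, periodic=True)
aperiodic_80 = insert_silence(compressed, 1000, 80, periodic=False)
rate_80 = packaging_rate_hz(80)
```

With 80 ms of silence, each fragment-plus-silence unit lasts 120 ms, so units recur at about 8.3 Hz, which lies inside the 6–12 Hz range the abstract associates with optimum intelligibility. A 120-ms unit is also three times the 40-ms fragment, so under this sketch the periodic 80-ms condition restores the overall duration of the speech before threefold compression.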

Acknowledgments

This study was funded by a research grant from the Air Force Office of Scientific Research. We thank Dr. Willard Larkin for his encouragement and valuable discussion as well as Professor Tim Bunnell for providing the SUSGEN sentence list. We also thank Dr. Ann Syrdal of AT&T who gave valuable advice about generating the stimuli and Dr. Udi Ghitza who provided valuable assistance with the statistical analyses.

Author information

Authors and Affiliations

Sensimetrics Corporation and Boston University, Boston, MA, USA
Oded Ghitza

Corresponding author

Correspondence to Oded Ghitza.

Editor information

Editors and Affiliations

Inst. Neurociencias de Castilla y León, Universidad de Salamanca, Av. Alfonso X El Sabio s/n, Salamanca, 37007, Spain
Enrique A. Lopez-Poveda
MRC Inst. of Hearing Research, University Park, Nottingham, NG7 2RD, United Kingdom
Alan R. Palmer
University of Essex, Wivenhoe Park, Colchester, Essex, CO4 3SQ, United Kingdom
Ray Meddis

About this paper

Cite this paper

Ghitza, O., Greenberg, S. (2010). Intelligibility of Time-Compressed Speech with Periodic and Aperiodic Insertions of Silence: Evidence for Endogenous Brain Rhythms in Speech Perception? In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_37

DOI: https://doi.org/10.1007/978-1-4419-5686-6_37
Published: 16 February 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5685-9
Online ISBN: 978-1-4419-5686-6
eBook Packages: Biomedical and Life Sciences (R0)