Abstract
This study was motivated by the hypothesis that low-frequency cortical oscillations help the brain decode the speech signal. The intelligibility (measured as word error rate) of natural-sounding, synthetically generated sentences was assessed using a paradigm that alters the rhythm of speech energy over a range of modulation frequencies. The material comprised 96 semantically unpredictable sentences (SUS), each approximately 2 s long, generated by a high-quality text-to-speech (TTS) synthesis engine. The TTS waveform was time-compressed by a factor of 3, yielding a signal whose syllable rhythm is three times faster than the original and whose intelligibility is poor (<50% words correct). An artificial rhythm was then imposed by segmenting the time-compressed waveform into consecutive 40-ms fragments, each followed by a silent interval. Two parameters were varied: the duration of the silent interval (0-160 ms) and whether the silent intervals were equal in duration (“periodic”) or unequal (“aperiodic”). The performance curve (word error rate as a function of mean silence duration) was U-shaped. The lowest word error rate occurred when 80-ms silences were inserted periodically, and this was also the condition in which inserting the silences aperiodically increased the word error rate. These data are consistent with a model (“TEMPO”) in which low-frequency brain rhythms influence the ability to decode the speech signal. In TEMPO, intelligibility is optimal when the syllable rhythm falls within the range of high-theta brain rhythms (6-12 Hz), comparable to the rate at which segments and syllables are spoken in conversational speech.
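The silence-insertion manipulation described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' stimulus code: the input is assumed to be an already time-compressed mono waveform (the compression step itself is not reproduced), the function names `insert_silences` and `fragment_rate_hz` are hypothetical, the 16-kHz sampling rate and random stand-in signal are illustrative only, and because the abstract does not say how the aperiodic silences were drawn, the aperiodic branch assumes durations uniformly distributed between 0 and twice the mean.

```python
import numpy as np


def insert_silences(x, fs, frag_ms=40.0, gap_ms=80.0, periodic=True, rng=None):
    """Cut x into consecutive frag_ms fragments and follow each with silence.

    Hypothetical helper (not the authors' implementation). x is assumed to be
    an already time-compressed mono waveform sampled at fs Hz.
    periodic=True  -> every silent interval lasts gap_ms.
    periodic=False -> ASSUMPTION: each interval is drawn uniformly from
                      [0, 2 * gap_ms], so the expected silence is still gap_ms.
    """
    x = np.asarray(x)
    if x.size == 0:
        return x.copy()
    rng = np.random.default_rng() if rng is None else rng
    # Samples per speech fragment (at least 1 so the loop always advances).
    frag_len = max(1, int(round(frag_ms * fs / 1000.0)))
    pieces = []
    for start in range(0, len(x), frag_len):
        pieces.append(x[start:start + frag_len])  # last fragment may be shorter
        dur_ms = gap_ms if periodic else rng.uniform(0.0, 2.0 * gap_ms)
        n_zero = int(round(dur_ms * fs / 1000.0))  # samples of inserted silence
        pieces.append(np.zeros(n_zero, dtype=x.dtype))
    return np.concatenate(pieces)


def fragment_rate_hz(frag_ms=40.0, gap_ms=80.0):
    """Rate of the imposed rhythm: one fragment per (fragment + silence) cycle."""
    return 1000.0 / (frag_ms + gap_ms)


if __name__ == "__main__":
    fs = 16000  # illustrative sampling rate, not taken from the study
    # Stand-in for a 2-s sentence compressed by a factor of 3 (about 0.67 s).
    speech = np.random.default_rng(0).standard_normal(2 * fs // 3)
    for gap in (0, 40, 80, 120, 160):
        y = insert_silences(speech, fs, gap_ms=gap)
        print(f"silence {gap:3d} ms -> {fragment_rate_hz(gap_ms=gap):5.2f} Hz, "
              f"{len(y) / fs:.2f} s")
```

With 40-ms fragments, an 80-ms silence gives a 120-ms cycle, or about 8.3 fragments per second, which lies inside the 6-12 Hz high-theta range cited above; the 0-ms and 160-ms extremes correspond to 25 Hz and 5 Hz. Under the assumed uniform distribution, periodic and aperiodic versions with the same `gap_ms` share the same expected silence duration, the variable against which word error rate is expressed in the performance curve.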
Acknowledgments
This study was funded by a research grant from the Air Force Office of Scientific Research. We thank Dr. Willard Larkin for his encouragement and valuable discussion, as well as Professor Tim Bunnell for providing the SUSGEN sentence list. We also thank Dr. Ann Syrdal of AT&T, who gave valuable advice about generating the stimuli, and Dr. Udi Ghitza, who provided valuable assistance with the statistical analyses.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this paper
Cite this paper
Ghitza, O., Greenberg, S. (2010). Intelligibility of Time-Compressed Speech with Periodic and Aperiodic Insertions of Silence: Evidence for Endogenous Brain Rhythms in Speech Perception?. In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_37
DOI: https://doi.org/10.1007/978-1-4419-5686-6_37
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5685-9
Online ISBN: 978-1-4419-5686-6
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)