Encyclopedia of Computational Neuroscience

Living Edition
Editors: Dieter Jaeger, Ranu Jung

Electrophysiological Indices of Speech Processing

Sonja A. Kotz
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7320-6_518-1



Some Core Facts About Event-Related Brain Potentials (ERPs)

Speech and its acoustic and linguistic properties essentially unfold in time (Kotz and Schwartze 2010). High-temporal-resolution methods such as electroencephalography (EEG) and magnetoencephalography (MEG) are therefore well suited to trace the temporal unfolding of the speech signal. In particular, ERPs derived from the EEG allow the brain's response to speech to be monitored with millisecond resolution. By time-locking the EEG to a large number of specific and similar speech events and averaging across them, the responses to the acoustic and linguistic properties of these events can tell us how an event is perceived and understood as it occurs. The resulting waveform consists of an alternation of positive and negative peaks. When compared with a control condition, these peaks give rise to components that are defined by their polarity (positive or negative), by their delay after the onset of an event of interest (latency), and by their distribution across the scalp (topography). For example, the N400 component is a negative deflection (hence “N”) that peaks around 400 ms after stimulus onset. Note that although the N400 has a relatively broad scalp distribution in the auditory modality, its scalp topography does not allow us to relate the cognitive processes it reflects to specific underlying neural sources in the brain. The reason is that the electrical activity picked up by the EEG is generated by sources oriented orthogonally to the folded (sulcated) cortical surface rather than to the skull surface. Consequently, ERP patterns recorded at the scalp likely reflect the summation of a very large number of differently oriented sources (generators).
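The time-locking and averaging procedure described above can be illustrated with a minimal Python sketch. It is not the analysis pipeline of any study cited in this entry; it assumes a single EEG channel stored as a list of samples, event onsets given as sample indices, and an epoch running from -100 to +800 ms. The helpers `average_erp` and `difference_wave` are hypothetical names introduced for illustration.

```python
def average_erp(eeg, onsets, srate, tmin=-0.1, tmax=0.8):
    """Time-lock, baseline-correct, and average epochs of one EEG channel.

    Hypothetical helper: `eeg` is a list of samples, `onsets` are event sample
    indices, `srate` is the sampling rate in Hz, and tmin/tmax bound the epoch
    in seconds. Returns (times_ms, erp).
    """
    start = round(tmin * srate)  # first epoch sample relative to onset (negative)
    stop = round(tmax * srate)   # last epoch sample relative to onset
    if start >= 0:
        raise ValueError("tmin must be negative to provide a prestimulus baseline")
    n_samples = stop - start + 1
    total = [0.0] * n_samples
    n_epochs = 0
    for onset in onsets:
        if onset + start < 0 or onset + stop >= len(eeg):
            continue  # skip events whose epoch would extend past the recording
        epoch = eeg[onset + start : onset + stop + 1]
        baseline = sum(epoch[:-start]) / -start  # mean of the prestimulus samples
        for i, value in enumerate(epoch):
            total[i] += value - baseline
        n_epochs += 1
    if n_epochs == 0:
        raise ValueError("no complete epochs to average")
    erp = [value / n_epochs for value in total]
    times_ms = [1000.0 * (start + i) / srate for i in range(n_samples)]
    return times_ms, erp


def difference_wave(erp_condition, erp_control):
    """Sample-by-sample difference between a condition ERP and a control ERP."""
    return [a - b for a, b in zip(erp_condition, erp_control)]
```

Averaging works because activity that is not time-locked to the event tends to cancel across trials while the event-related response is preserved; subtracting the control-condition ERP then isolates the condition-specific part of the response.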

What Can ERPs Tell Us About Successful Speech Comprehension?

When listening to speech, our primary goal is to understand what is said. This requires the integration of sensory, perceptual, and cognitive processes (Obleser and Kotz 2011). While ample research has identified the neural sources of such integration in neutral speech (for a review, see Scott and Johnsrude 2003) and in emotional speech (for a review, see Schirmer and Kotz 2006), evidence from ERP studies is scarcer. This is surprising, given that ERPs track the brain's response to the unfolding speech signal from the sensory to the cognitive stage with millisecond resolution and may therefore reveal not only how these processes unfold but also how they are integrated over time. The following sections therefore review a number of exemplary ERP studies that have explored the sensory-cognitive interface in speech processing.

A good way to test how early sensory and later cognitive processes interact in speech processing is to study speech in noise. In principle, at least three sources of noise can impact successful speech comprehension: noise (i) in the environment, (ii) in the hearing system, and (iii) in the speech signal itself. All three types of noise are thought to disrupt the acoustic/phonetic interface and, consequently, successful lexico-semantic integration (e.g., Boulenger et al. 2011). A number of ERP investigations have therefore systematically manipulated speech signal quality to examine its consequences for lexico-semantic integration and successful speech comprehension. Two ERP components feature prominently in this research: the N100 and the N400.
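For the first source of noise, environmental noise, one simple way to manipulate signal quality is to add noise to a recording at a chosen signal-to-noise ratio (SNR). The sketch below assumes additive noise and an RMS-based SNR definition; the function names are hypothetical, and neither the signals nor the parameters come from the studies reviewed here.

```python
import math
import random


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the speech-to-noise ratio equals snr_db.

    SNR is defined on RMS amplitude: 20 * log10(rms(speech) / rms(scaled_noise)).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


# Usage with synthetic signals (a 220-Hz tone standing in for speech).
srate = 16000
speech = [math.sin(2 * math.pi * 220 * i / srate) for i in range(srate // 2)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(len(speech))]
mixed = mix_at_snr(speech, noise, snr_db=0.0)
```

Lower SNR values make the noise relatively louder; at 0 dB, speech and noise have equal RMS amplitude.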

Two Components of Interest: N100 and N400

Early ERP components such as the N100 have been linked to the physical properties of a stimulus and are considered to reflect early obligatory sensory processing (e.g., Luck 2005; Steinschneider and Dunn 2002). In audition, the N100 is thought to encode a sound’s onset (e.g., Martin et al. 2008) and to index early percept formation (e.g., Näätänen 2001). It is also sensitive to task demands in speech processing (e.g., Bonte et al. 2009; Obleser et al. 2004; see Poeppel et al. 1996 for magnetoencephalographic N100m evidence) and to degradation of the sound signal (Miettinen et al. 2010).
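Components such as the N100 are often quantified by their peak amplitude and peak latency within a search window. The following sketch picks the most negative (or most positive) sample in a window; the 80–150 ms window used in the example is an assumption for illustration, and `peak_in_window` is a hypothetical helper rather than a function of any ERP toolbox.

```python
def peak_in_window(times_ms, erp, t_start, t_end, polarity=-1):
    """Return (latency_ms, amplitude) of the extreme sample in [t_start, t_end].

    polarity=-1 finds the most negative sample (e.g., N100, N400);
    polarity=+1 finds the most positive sample.
    """
    window = [(t, v) for t, v in zip(times_ms, erp) if t_start <= t <= t_end]
    if not window:
        raise ValueError("no samples inside the search window")
    return max(window, key=lambda tv: polarity * tv[1])


# Hypothetical ERP sampled every 10 ms from 0 to 300 ms.
times_ms = list(range(0, 301, 10))
erp = [0.0] * len(times_ms)
erp[10] = -4.0  # negative peak at 100 ms
erp[25] = -6.0  # larger negativity at 250 ms, outside the assumed N100 window
latency, amplitude = peak_in_window(times_ms, erp, 80, 150)
# latency == 100, amplitude == -4.0
```

Restricting the search to a window keeps a later, larger deflection from being mistaken for the component of interest.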

N400 research has a long tradition. Studies have tested semantic anomalies (whether or not a word fits into a sentence context; e.g., Kutas and Hillyard 1980), semantic cloze probability (how likely a word is to complete a given sentence context; e.g., Kutas and Hillyard 1984), context manipulations (how expected a word is in a sentence context; e.g., Van Petten and Kutas 1990), and semantic priming (whether a preceding word facilitates the processing of a following word; e.g., Holcomb 1988) in word-list and sentence paradigms in both the auditory and visual modalities. A common view is that N400 amplitude reflects the effort required for lexico-semantic integration: the harder a word is to integrate, the larger the N400 (Chwilla et al. 1995; Van Berkum et al. 1999; Friederici 2002; for recent comprehensive reviews, see Lau et al. 2008; Kutas and Federmeier 2011). As such, any modulation of N400 amplitude, latency, or topography as a function of speech quality is a sensitive marker of how alterations of acoustic/phonetic speech properties affect lexico-semantic integration and, consequently, successful speech comprehension.
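Cloze probability is conventionally estimated from a sentence-completion norming study: it is the proportion of respondents who complete a sentence frame with a given word. The minimal sketch below computes it from hypothetical norming responses; the function name and the data are illustrative only.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Proportion of respondents producing each completion of a sentence frame.

    `completions` holds one response per participant in a (hypothetical)
    sentence-completion norming study; responses are case- and space-normalised.
    """
    if not completions:
        raise ValueError("no completions")
    normalised = [word.strip().lower() for word in completions]
    counts = Counter(normalised)
    return {word: count / len(normalised) for word, count in counts.items()}


# Hypothetical norms for the frame "He spread the warm bread with ...".
norms = ["butter"] * 8 + ["Jam", "honey "]
probs = cloze_probabilities(norms)
# probs["butter"] == 0.8, probs["jam"] == 0.1; a word nobody produced has cloze 0
```

In this toy example, "butter" would count as a high-cloze completion and "jam" as a low-cloze one; where exactly a study draws that line is a design decision not specified here.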

Some Exemplar ERP Investigations of Speech Comprehension

Aydelott and colleagues (2006) acoustically degraded the speech signal with a 1-kHz low-pass filter, thereby changing it perceptually. They tested the effect of this degradation on lexico-semantic integration using target words that either semantically matched or mismatched a given sentence context. Thus, the ease of lexico-semantic integration was tested by (i) contextually driving semantic expectancy and (ii) manipulating the signal quality of the sentence context. The N100 response to target words was enhanced in a degraded speech context. In addition, the N400 effect (i.e., the difference in amplitude between matching and mismatching target words in a sentence context) was reduced in degraded compared with non-degraded speech. The authors concluded that acoustic degradation of the speech signal affects both early perceptual (N100) and later lexico-semantic integration processes (N400). However, they did not further address how the early and late degradation effects relate to each other.

In an attempt to mimic real-life noise situations, Sivonen and colleagues (2006) altered the beginning of sentence-final high- and low-cloze probability words by overlaying half of these words with a cough (short or long) that reduced their phonetic information. Long but not short coughs elicited a larger N100 response to high-cloze probability words in continuous speech. Even though the coughs disrupted early phoneme detection, the target words were still recognized. While there was no N400 amplitude difference for low-cloze compared with high-cloze probability words presented in the noise condition, the onset of the N400 was delayed, indicating some difficulty in lexico-semantic integration. The authors suggested that when the semantic context is strong enough, listeners can compensate for incomplete phonemic information (e.g., phonemes masked by a cough).
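The 1-kHz low-pass filtering used by Aydelott and colleagues (2006) can be approximated with a standard windowed-sinc FIR filter. The filter design actually used in that study is not described here, so the Hamming window, the 101 taps, and the 16-kHz sampling rate below are assumptions; `lowpass_fir` and `apply_fir` are hypothetical helpers written in plain Python for clarity rather than speed.

```python
import math


def lowpass_fir(cutoff_hz, srate, n_taps=101):
    """Hamming-windowed sinc low-pass FIR taps, normalised to unit gain at 0 Hz."""
    if n_taps % 2 == 0:
        raise ValueError("use an odd number of taps for a symmetric filter")
    fc = cutoff_hz / srate  # cutoff in cycles per sample
    mid = (n_taps - 1) / 2
    taps = []
    for i in range(n_taps):
        k = i - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Filter with symmetric taps; output is centred (no delay), zero-padded at the edges."""
    half = len(taps) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = n + half - j
            if 0 <= idx < len(signal):
                acc += t * signal[idx]
        out.append(acc)
    return out


# Hypothetical usage: low-pass a 16-kHz recording at 1 kHz.
srate = 16000
taps = lowpass_fir(1000, srate)
```

Frequencies well below the cutoff pass almost unchanged, while frequencies well above it (for example, much of the energy that distinguishes fricatives) are strongly attenuated.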

Obleser and Kotz (2011) used spectrally degraded speech (see also Obleser and Kotz 2010), created with a noise-band vocoding algorithm that alters the degree of speech intelligibility (Shannon et al. 1995). By varying the cloze probability of sentence-final words, they tested how small semantic changes in a sentence context affect speech comprehension when the speech signal is compromised. In addition to the expected modulation of the N400, the authors reported an early, sensory-driven N100 response to speech signal quality at sentence onset. Both N100 amplitude and latency were sensitive to the strength of signal degradation (strong speech signal degradation led to the largest N100 rise and peak). Most importantly, the degree of speech intelligibility significantly affected lexico-semantic integration of low-cloze probability final words: the more intelligible the speech signal, the better the response to unexpected target words in a given sentence context. These results therefore indicate that early sensory-driven effects interact with lexico-semantic integration (for more information on induced oscillatory brain activity, see Obleser and Kotz 2011).
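The principle of noise-band vocoding (Shannon et al. 1995) is to split the speech signal into frequency bands, extract each band's slowly varying amplitude envelope, use that envelope to modulate noise restricted to the same band, and sum the bands. The sketch below follows that principle only; the band edges, filter lengths, 30-Hz envelope cutoff, and one-pole envelope smoother are illustrative assumptions, not the parameters of Shannon et al. (1995) or Obleser and Kotz (2011), and all function names are hypothetical. It is plain Python and slow, so it suits only short signals.

```python
import math
import random


def fir_lowpass(cutoff_hz, srate, n_taps):
    """Hamming-windowed sinc low-pass taps (odd n_taps, not gain-normalised)."""
    fc = cutoff_hz / srate
    mid = (n_taps - 1) / 2
    taps = []
    for i in range(n_taps):
        k = i - mid
        h = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        taps.append(h * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))))
    return taps


def fir_bandpass(lo_hz, hi_hz, srate, n_taps):
    """Band-pass taps as the difference of two windowed-sinc low-pass filters."""
    high = fir_lowpass(hi_hz, srate, n_taps)
    low = fir_lowpass(lo_hz, srate, n_taps)
    return [a - b for a, b in zip(high, low)]


def fir_filter(x, taps):
    """Centred ('same'-length) FIR filtering for symmetric taps, zero-padded edges."""
    half, n = len(taps) // 2, len(x)
    return [sum(t * x[i + half - j] for j, t in enumerate(taps) if 0 <= i + half - j < n)
            for i in range(n)]


def envelope(band, srate, cutoff_hz):
    """Full-wave rectification followed by a one-pole low-pass smoother."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / srate)
    env, y = [], 0.0
    for v in band:
        y += alpha * (abs(v) - y)
        env.append(y)
    return env


def noise_vocode(speech, srate, edges_hz, env_cutoff_hz=30.0, n_taps=129, seed=0):
    """Replace each band's fine structure with envelope-modulated band-limited noise."""
    rng = random.Random(seed)
    out = [0.0] * len(speech)
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        taps = fir_bandpass(lo, hi, srate, n_taps)
        env = envelope(fir_filter(speech, taps), srate, env_cutoff_hz)
        carrier = fir_filter([rng.gauss(0.0, 1.0) for _ in speech], taps)
        for i in range(len(out)):
            out[i] += env[i] * carrier[i]
    return out
```

The number of bands is the main lever on intelligibility in such vocoders: fewer bands preserve less spectral detail, and intelligibility generally drops accordingly.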

A further study by Boulenger and colleagues (2011) tested whether temporally reversing a sentence-final word (i.e., altering the temporal properties of the speech signal while keeping its spectral properties intact) affects how a high- or low-cloze probability word is integrated into a sentence context. Again, the core question was how alterations of the speech signal affect lexico-semantic integration. The authors reported an early fronto-central negativity in response to target words, independent of the degree of temporal reversal (i.e., how much of the speech signal was temporally altered) and of cloze probability. This early negativity was spatiotemporally comparable to a mismatch negativity (MMN) recorded in the same participants in an auditory oddball paradigm, in which temporally reversed (deviant) syllables occurred in a stream of temporally non-reversed (standard) syllables. In line with the view that the MMN reflects an automatic and finely tuned auditory discrimination mechanism (e.g., Näätänen 2001), the authors considered this interpretation also valid for auditory speech comprehension, that is, the MMN response is similar for sinusoidal sounds and speech sounds. Regarding the effects of time reversal on lexico-semantic integration, the authors observed an interaction: a low rather than a high degree of temporal reversal reduced the N400 amplitude to low-cloze probability words at fronto-central electrode sites. Similar to previous accounts, the authors concluded that early acoustic/phonetic information interacts with lexico-semantic integration to ensure successful speech comprehension.
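One common way to disrupt the temporal structure of speech while approximately preserving its long-term spectrum is local time reversal: the signal is cut into consecutive windows and the samples within each window are played backwards. Whether Boulenger and colleagues (2011) operationalized the degree of reversal exactly this way is an assumption here; in the sketch, the window length stands in for how much of the signal is temporally altered, and `locally_time_reverse` is a hypothetical helper.

```python
def locally_time_reverse(signal, srate, window_ms):
    """Reverse samples within consecutive, non-overlapping windows of `window_ms`.

    Longer windows alter more of the signal's temporal structure; the final
    window may be shorter than the others. No smoothing is applied at window
    boundaries.
    """
    win = max(1, round(window_ms * srate / 1000))
    out = []
    for start in range(0, len(signal), win):
        out.extend(reversed(signal[start:start + win]))
    return out


# Toy example: 10 samples at 1 kHz, reversed in 3-ms (3-sample) windows.
reversed_signal = locally_time_reverse(list(range(10)), 1000, 3)
# [2, 1, 0, 5, 4, 3, 8, 7, 6, 9]
```

Because every sample is kept and only its order within a window changes, the manipulation targets timing rather than the overall frequency content of the signal.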

Strauß and colleagues (2013) extended the investigation of speech intelligibility (three levels of degradation) and lexico-semantic expectancy. They asked the critical question of which of the two factors supporting successful speech comprehension drives the interaction: is it the bottom-up sensory information or the top-down lexico-semantic information? Lexico-semantic expectancy was varied by manipulating both the strength of the sentence context and the cloze probability of the critical word. When speech was intelligible, the N400 responses to high- and low-cloze probability words in a strong sentence context did not differ. However, when speech was moderately degraded, only high-cloze probability words in a strong sentence context showed a decrease in N400 amplitude. This indicates that perceptual constraints may critically affect how context can influence lexico-semantic integration.
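The logic of such comparisons can be made concrete with a minimal sketch that computes, for each intelligibility level, an N400 cloze effect as the mean amplitude for low-cloze words minus that for high-cloze words in an assumed 300–500 ms window. The window, the condition labels, the helper names, and the numbers are all hypothetical toy values, not data or analyses from Strauß et al. (2013).

```python
def mean_amplitude(times_ms, erp, t_start, t_end):
    """Mean ERP amplitude over samples with t_start <= time <= t_end (ms)."""
    values = [v for t, v in zip(times_ms, erp) if t_start <= t <= t_end]
    if not values:
        raise ValueError("no samples inside the window")
    return sum(values) / len(values)


def cloze_effects(erps_by_level, times_ms, window=(300, 500)):
    """N400 cloze effect per intelligibility level: mean(low cloze) - mean(high cloze).

    `erps_by_level` maps a level label to {"low": erp, "high": erp}; labels,
    window, and data layout are hypothetical.
    """
    effects = {}
    for level, erps in erps_by_level.items():
        low = mean_amplitude(times_ms, erps["low"], *window)
        high = mean_amplitude(times_ms, erps["high"], *window)
        effects[level] = low - high
    return effects


# Toy grand averages sampled every 100 ms from 0 to 800 ms (not real data).
times_ms = list(range(0, 801, 100))
data = {
    "clear": {"low": [0, 0, 0, -2, -2, -2, 0, 0, 0],
              "high": [0, 0, 0, -2, -2, -2, 0, 0, 0]},
    "degraded": {"low": [0, 0, 0, -3, -3, -3, 0, 0, 0],
                 "high": [0, 0, 0, -1, -1, -1, 0, 0, 0]},
}
effects = cloze_effects(data, times_ms)
# {'clear': 0.0, 'degraded': -2.0}
```

A negative value means a larger (more negative) N400 for low-cloze words; a difference between levels in this effect is what an intelligibility-by-expectancy interaction expresses.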

In summary, current evidence suggests that the interaction of early sensory and late cognitive processes in successful speech comprehension depends on two processing streams: a bottom-up sensory stream and a top-down contextual stream. Future research will have to show how task- and/or stimulus-driven demands engage these bottom-up and top-down streams during successful lexico-semantic integration in speech comprehension.

References

  1. Aydelott J, Dick F, Mills DL (2006) Effects of acoustic distortion and semantic context on event-related potentials to spoken words. Psychophysiology 43(5):454–464
  2. Bonte M, Valente G, Formisano E (2009) Dynamic and task-dependent encoding of speech and voice by phase reorganization of cortical oscillations. J Neurosci 29:1699–1706
  3. Boulenger V, Hoen M, Jacquier C, Meunier F (2011) Interplay between acoustic/phonetic and semantic processes during spoken sentence comprehension: an ERP study. Brain Lang 116(2):51–63
  4. Chwilla DJ, Brown CM, Hagoort P (1995) The N400 as a function of the level of processing. Psychophysiology 32(3):274–285
  5. Friederici AD (2002) Towards a neural basis of auditory sentence processing. Trends Cogn Sci 6(2):78–84
  6. Holcomb PJ (1988) Automatic and attentional processing: an event-related brain potential analysis of semantic priming. Brain Lang 35(1):66–85
  7. Kotz SA, Schwartze M (2010) Cortical speech processing unplugged: a timely subcortico-cortical framework. Trends Cogn Sci 14(9):392–399
  8. Kutas M, Federmeier KD (2011) Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu Rev Psychol 62:621–647
  9. Kutas M, Hillyard SA (1980) Event-related brain potentials to semantically inappropriate and surprisingly large words. Biol Psychol 11(2):99–116
  10. Kutas M, Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature 307:161–163
  11. Lau EF, Phillips C, Poeppel D (2008) A cortical network for semantics: (de-)constructing the N400. Nat Rev Neurosci 9:920–933
  12. Luck SJ (2005) An introduction to the event-related potential technique. MIT Press, Cambridge, MA
  13. Martin BA, Tremblay KL, Korczak P (2008) Speech evoked potentials: from the laboratory to the clinic. Ear Hearing 29:285–313
  14. Miettinen I, Tiitinen H, Alku P, May PJ (2010) Sensitivity of the human auditory cortex to acoustic degradation of speech and non-speech sounds. BMC Neurosci 11:24
  15. Näätänen R (2001) The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology 38:1–21
  16. Obleser J, Kotz SA (2010) Expectancy constraints in degraded speech modulate the language comprehension network. Cereb Cortex 20(3):633–640
  17. Obleser J, Kotz SA (2011) Multiple brain electric signatures of semantic constraints in degraded speech. Neuroimage 55(2):713–723
  18. Obleser J, Elbert T, Eulitz C (2004) Attentional influences on functional mapping of speech sounds in human auditory cortex. BMC Neurosci 5:24
  19. Poeppel D, Yellin E, Phillips C, Roberts TP, Rowley HA, Wexler K, Marantz A (1996) Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Res Cogn Brain Res 4:231–242
  20. Schirmer A, Kotz SA (2006) Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends Cogn Sci 10(1):24–30
  21. Scott SK, Johnsrude IS (2003) The neuroanatomical and functional organization of speech perception. Trends Neurosci 26(2):100–107
  22. Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M (1995) Speech recognition with primarily temporal cues. Science 270:303–304
  23. Sivonen P, Maess B, Lattner S, Friederici AD (2006) Phonemic restoration in a sentence context: evidence from early and late ERP effects. Brain Res 1121:177–189
  24. Steinschneider M, Dunn M (2002) Electrophysiology in developmental neuropsychology, Chap. 5. In: Segalowitz S, Rapin I (eds) Handbook of neuropsychology, 2nd edn, vol 8, part 1. Elsevier, Amsterdam, pp 91–146
  25. Strauß A, Kotz SA, Obleser J (2013) Narrowed expectancies under degraded speech: revisiting the N400. J Cogn Neurosci 25(8):1383–1395
  26. Van Berkum JJ, Hagoort P, Brown CM (1999) Semantic integration in sentences and discourse: evidence from the N400. J Cogn Neurosci 11(6):657–671
  27. Van Petten C, Kutas M (1990) Interactions between sentence context and word frequency in event-related potentials. Mem Cognition 18(4):380–393

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. School of Psychological Sciences, University of Manchester, Manchester, UK
  2. Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany