Electrophysiological Indices of Speech Processing
Keywords: Target Word, Speech Signal, N400 Amplitude, Sentence Context, Speech Intelligibility
Some Core Facts About Event-Related Brain Potentials (ERPs)
Speech and its acoustic and linguistic properties essentially evolve in time (Kotz and Schwartze 2010). Therefore, high-temporal-resolution methods such as electroencephalography (EEG) or magnetoencephalography (MEG) are well suited to trace the temporal unfolding of the speech signal. In particular, ERPs derived from the EEG permit speech to be monitored with millisecond resolution. By time-locking the EEG to a large number of specific and similar speech events and averaging across trials, the resulting responses to the acoustic and linguistic properties of these events can tell us how an event is perceived and understood as it occurs. The resulting wavelike pattern consists of an alternation of positive and negative peaks that, when compared to a control condition, reveals components defined by their polarity (positive or negative), their delay after the onset of an event of interest (latency), and their distribution across the scalp (topography). For example, the N400 component is a negative deflection (hence “N”) that peaks around 400 ms post-stimulus onset. Note that while the N400 has a relatively global scalp distribution in the auditory modality, we cannot relate cognitive processes tied to the N400 scalp topography to specific underlying neural sources in the brain. The reason is that EEG activity is oriented orthogonally to the folded cortical surface rather than to the skull surface. Consequently, ERP patterns recorded at the scalp likely reflect the summation of an infinitely large number of differently oriented sources (generators).
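The time-locking-and-averaging logic described above can be sketched numerically: averaging many epochs aligned to stimulus onset cancels background activity that is not phase-locked to the event, so a small evoked deflection emerges from much larger noise. All parameters below (sampling rate, amplitudes, noise level, trial count) are illustrative assumptions, not values from the literature.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 500                              # sampling rate in Hz (illustrative)
t = np.arange(-0.2, 0.8, 1 / fs)     # epoch: -200 to 800 ms around stimulus onset

# Simulated evoked response: a negative deflection peaking near 400 ms,
# i.e., an N400-like component (amplitude and width are assumptions)
evoked = -4e-6 * np.exp(-((t - 0.4) ** 2) / (2 * 0.05 ** 2))

# Each trial buries the tiny evoked signal in much larger background EEG noise
n_trials = 200
trials = evoked + rng.normal(0.0, 10e-6, size=(n_trials, t.size))

# Time-locked averaging: non-phase-locked noise cancels, the component emerges
erp = trials.mean(axis=0)

peak_latency_ms = 1000 * t[np.argmin(erp)]
print(f"Negative peak at ~{peak_latency_ms:.0f} ms post-onset")
```

A single trial here has a signal-to-noise ratio far below one; only the average across many trials reveals the component, which is why ERP studies require a large number of similar events.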
What Can ERPs Tell Us About Successful Speech Comprehension?
When listening to speech, our primary goal is to understand what is said. This necessitates the integration of sensory, perceptual, and cognitive processes to ensure successful speech comprehension (Obleser and Kotz 2011). While there has been ample research identifying the neural sources of such integration in neutral (for a review, see Scott and Johnsrude 2003) and emotional speech (for a review, see Schirmer and Kotz 2006), there is less evidence from ERP studies. This is surprising given that ERPs monitor the unfolding speech signal from the sensory to the cognitive stage at millisecond resolution and may thus give more precise insight not only into how these processes unfold but also into how they are integrated. Therefore, a number of exemplar ERP studies that have explored the sensory-cognitive interface in speech processing are reviewed in the following.
A good way to test how early sensory and later cognitive processes in speech processing interact is to study speech in noise. In principle there are at least three sources of noise that can impact successful speech comprehension: noise (i) in the environment, (ii) in the hearing system, and (iii) in the speech signal itself. All three types of noise are considered to disrupt the acoustic/phonetic interface and consequently successful lexico-semantic integration (e.g., Boulenger et al. 2011). A number of ERP investigations have therefore systematically manipulated speech signal quality to look at its consequences on lexico-semantic integration and its impact on successful speech comprehension. In this research two ERP components feature prominently: the N100 and the N400.
Two Components of Interest: N100 and N400
Early ERP components such as the N100 have been linked to the physical properties of a stimulus and are considered to reflect early obligatory sensory processing (e.g., Luck 2005; Steinschneider and Dunn 2002). In audition the N100 is considered to encode a sound’s onset (e.g., Martin et al. 2008) and to index early percept formation (e.g., Näätänen 2001). It is also attuned to task demands in speech processing (e.g., Bonte et al. 2009; Obleser et al. 2004; see Poeppel et al. 1996 for N100m magnetoencephalographic evidence) and to degradation of the sound signal (Miettinen et al. 2010).
N400 research looks back on a long tradition and has tested semantic anomalies (whether a word fits into a sentence context or not; e.g., Kutas and Hillyard 1980), semantic cloze probability (how likely a word fits a sentence context; e.g., Kutas and Hillyard 1984), context manipulation (how expected a word is in a sentence context; e.g., Van Petten and Kutas 1990), and semantic priming (whether a preceding word facilitates the processing of a following word; e.g., Holcomb 1988) in word lists and sentence paradigms in both the auditory and visual modalities. A common viewpoint is that a rise in N400 amplitude reflects the effort of lexico-semantic integration, that is, how successfully integration has taken place (Chwilla et al. 1995; Van Berkum et al. 1999; Friederici 2002; for recent comprehensive reviews, see Lau et al. 2008; Kutas and Federmeier 2011). As such, any modulation of N400 amplitude, latency, or topography as a function of speech quality is an excellent marker of how alterations of acoustic/phonetic speech properties affect the outcome of lexico-semantic integration and, consequently, successful speech comprehension.
Some Exemplar ERP Investigations of Speech Comprehension
Aydelott and colleagues (2006) acoustically degraded the speech signal with a 1-kHz low-pass filter, thereby changing it perceptually. The effect of degradation on lexico-semantic integration was then tested on target words that either semantically matched or mismatched a given sentence context. Thus, the ease of lexico-semantic integration was probed by (i) contextually driving semantic expectancy and (ii) manipulating the signal quality of the sentence context. Results revealed that the N100 response to target words in a degraded speech context was enhanced. In addition, the N400 effect (i.e., the amplitude difference between matching and mismatching target words in a sentence context) was reduced in degraded compared to non-degraded speech. The authors concluded that acoustic degradation of the speech signal affects both early perceptual (N100) and later lexico-semantic integration processes (N400); however, they did not further address how the early and late degradation effects relate to each other. In an attempt to mimic real-life noise situations, Sivonen and colleagues (2006) altered the beginning of sentence-final high- and low-cloze probability words by overlaying half of the sentence-final words with a cough (short or long) that reduced the phonetic information of the word. Long but not short coughs elicited a larger N100 response to high-cloze probability words in continuous speech. Even though coughs disrupted early phoneme detection, target words were recognized. While there was no N400 amplitude difference between low- and high-cloze probability words presented in the noise condition, the onset of the N400 was delayed, indicating some difficulty in lexico-semantic integration. The authors suggested that when semantic context is strong enough, incomplete phonemic information (e.g., a cough overlaying phonemes) can be compensated for.
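A 1-kHz low-pass degradation like the one used by Aydelott and colleagues can be approximated with a standard Butterworth filter. The filter order, sampling rate, and toy test signal below are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000                            # sampling rate in Hz (illustrative)
t = np.arange(0, 0.5, 1 / fs)

# Toy "speech" signal: a low-frequency component plus high-frequency energy
low = np.sin(2 * np.pi * 300 * t)     # within the passband, should survive
high = np.sin(2 * np.pi * 3000 * t)   # above the cutoff, should be removed
signal = low + high

# 1-kHz low-pass filter (4th-order Butterworth; the order is an assumption)
sos = butter(4, 1000, btype="low", fs=fs, output="sos")
degraded = sosfiltfilt(sos, signal)   # zero-phase filtering

def band_rms(x):
    """Root-mean-square amplitude of a signal."""
    return np.sqrt(np.mean(x ** 2))

print(f"RMS before: {band_rms(signal):.3f}, after: {band_rms(degraded):.3f}")
```

The degraded signal retains the low-frequency component nearly unchanged while the energy above 1 kHz, which carries much of the phonetic detail in real speech, is strongly attenuated.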
Obleser and Kotz (2011) used spectrally degraded speech (Obleser and Kotz 2010) by applying a noise-band vocoding algorithm that varies the degree of speech intelligibility (Shannon et al. 1995). By varying the cloze probability of sentence-final words, they tested how small semantic changes in a sentence context affect speech comprehension when the speech signal is compromised. In addition to the expected modulation of the N400, the authors reported an early sensory-driven N100 response to speech signal quality at sentence onset. Both N100 amplitude and latency responded to the strength of degradation of the speech signal (strong degradation led to the largest and latest N100 peak). Most importantly, the degree of speech intelligibility significantly impacted the degree of lexico-semantic integration in the case of low-cloze probability final words: the more intelligible the speech signal, the better the response to unexpected target words in a given sentence context. These results therefore clearly show that early sensory-driven effects interact with lexico-semantic integration (for more information on induced oscillatory brain activity, see Obleser and Kotz 2011).
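Noise-band vocoding in the spirit of Shannon et al. (1995) can be sketched as follows: the signal is split into frequency bands, each band's amplitude envelope is extracted and used to modulate band-limited noise, and the modulated bands are summed. Fewer bands yield less intelligible speech. The band edges, filter order, and Hilbert-transform envelope extraction below are simplifying assumptions rather than the published algorithm.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_bands, f_lo=100.0, f_hi=4000.0):
    """Sketch of a noise-band vocoder: keep each band's amplitude envelope,
    replace its fine structure with band-limited noise. Parameters are
    illustrative assumptions, not the published settings."""
    rng = np.random.default_rng(1)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # log-spaced band edges
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))          # amplitude envelope of the band
        carrier = sosfiltfilt(sos, rng.standard_normal(x.size))
        out += env * carrier                 # envelope-modulated noise carrier
    return out

fs = 16000
t = np.arange(0, 0.3, 1 / fs)
# Toy amplitude-modulated tone standing in for a speech signal
speech_like = np.sin(2 * np.pi * 500 * t) * (1 + np.sin(2 * np.pi * 4 * t))
degraded = noise_vocode(speech_like, fs, n_bands=4)
```

Because only the slow envelopes survive while the spectral fine structure is replaced by noise, varying `n_bands` gives a graded intelligibility manipulation of the kind used in this line of research.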
A further study by Boulenger and colleagues (2011) tested whether temporally reversing the signal of a sentence-final word (altering the temporal properties of speech while keeping spectral properties intact) affects how a high- or low-cloze probability word is integrated into a sentence context. Again, the core of the investigation was to find out how alterations of the speech signal affect lexico-semantic integration. The authors reported an early fronto-central negativity in response to target words independent of the degree of temporal reversal (i.e., how much of the speech signal was temporally altered) and of cloze probability. This early negativity was spatiotemporally comparable to a mismatch negativity (MMN) recorded in the same participants in an auditory oddball paradigm (detection of deviant syllables of temporally reversed speech in a stream of standard, non-reversed syllables). In line with an interpretation that links the MMN to an automatic and finely tuned auditory discrimination response mechanism (e.g., Näätänen 2001), the authors consider this interpretation to hold in auditory speech comprehension as well, that is, the MMN response is comparable for sinusoidal and speech sounds. When looking at the effects of time reversal on lexico-semantic integration, the authors observed an interaction: a low rather than high degree of temporal reversal reduced the N400 amplitude response to low-cloze probability words at fronto-central electrode sites. Similar to previous accounts, the authors conclude that early acoustic/phonetic information interacts with lexico-semantic integration to ensure successful speech comprehension.
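Local time reversal of this kind can be sketched by flipping the waveform within consecutive fixed-length windows: short windows leave speech largely intelligible, while longer windows degrade it progressively. The window length and toy waveform below are illustrative assumptions, not the authors' stimulus parameters.

```python
import numpy as np

def locally_reverse(x, fs, window_ms):
    """Reverse the signal within consecutive windows of length window_ms.
    The degree of temporal reversal is controlled by the window length
    (an illustrative parameter, not a published value)."""
    w = max(1, int(fs * window_ms / 1000))
    out = np.copy(x)
    for start in range(0, len(x), w):
        out[start:start + w] = x[start:start + w][::-1]
    return out

fs = 16000
x = np.arange(8, dtype=float)   # toy "waveform" to make the effect visible
print(locally_reverse(x, fs, window_ms=0.25))  # w = 4 samples per window
# → [3. 2. 1. 0. 7. 6. 5. 4.]
```

Note that the operation is its own inverse: applying it twice with the same window length restores the original signal, which makes the manipulation easy to verify.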
Strauß and colleagues (2013) pushed the investigation of speech intelligibility (three levels of degradation) and lexico-semantic expectancy further. The authors asked the critical question of which of the two factors supporting successful speech comprehension drives the interaction: is it the bottom-up sensory information or the top-down lexico-semantic information? The extent of lexico-semantic expectancy in sentence context was varied by manipulating the context strength as well as the cloze probability of the critical word. The results showed that the N400 response to high- and low-cloze probability words in a strong sentence context did not differ when speech was intelligible. However, when speech was moderately degraded, only high-cloze probability words in a strong sentence context revealed a decrease in N400 amplitude. This indicates that perceptual constraints may critically impact how context can influence lexico-semantic integration.
In summary, the current evidence on how early sensory and late cognitive processes interact to ensure successful speech comprehension appears to depend upon two processing streams, that is, a bottom-up sensory one and a top-down contextual one. Clearly future research will have to show how task- and/or stimulus-driven demands engage bottom-up or top-down processing streams during successful lexico-semantic integration in speech comprehension.
- Luck SJ (2005) An introduction to the event-related potential technique. MIT Press, Cambridge, MA
- Steinschneider M, Dunn M (2002) Electrophysiology in developmental neuropsychology, Chap. 5. In: Segalowitz S, Rapin I (eds) Handbook of neuropsychology, 2nd edn, vol 8, part 1. Elsevier, Amsterdam, pp 91–146