Background

Substantial data have been collected on the neural substrates of auditory speech perception and production. Lesion data as well as imaging studies have demonstrated that auditory information is processed in a bilateral neural network located in the perisylvian and inferior frontal regions of the brain [1–3]. However, it remains to be resolved how specific language functions are segregated within this linguistic macro-network and how these functions map onto specific anatomical areas. Recent functional neuroimaging studies of speech perception have begun to specify some of these functional subdivisions by demonstrating how specific anatomical regions are modulated by different types of information (phonological, prosodic, and semantic). In particular, these studies have drawn attention to several distinct auditory processing streams, which originate in primary auditory cortex. Firstly, there is evidence for lateral neural projections within the superior temporal sulcus (STS) which are involved in the analysis of complex acoustic features [1, 4, 5]. Secondly, there is evidence for an anterior-posterior projection axis with two main neural nodes. One node, located within the lateral superior temporal gyrus (STG), mainly within the STS anterior to Heschl's gyrus (HG), responds to speech-specific stimuli [6]. The other node has been found in the posterior STG and STS, primarily in the left hemisphere, and responds to the presence of auditory phonetic cues. Recent imaging studies of the inferior frontal gyrus (IFG) have also provided evidence of subsystems for word frequency, naming vs. discrimination, and syntactic difficulty [7]. Finally, neuroimaging studies have shown simultaneous activations in the IFG and in the STG/STS during semantic, phonetic, and verbal-emotional categorization and discrimination tasks, as well as verbal working memory tasks. Thus, there is a close link between perisylvian and frontal brain areas during auditory speech perception.

The present study was designed to further examine the link between the perisylvian and frontal brain areas during auditory speech perception. Specifically, we investigated the responses of this temporal-frontal network to a stimulus manipulation designed to modulate a bottom-up process and a task manipulation designed to modulate top-down processes. We use the term "bottom-up" to denote the information output by the early, automatic mechanisms that encode the physical properties of sensory inputs. One example of a bottom-up process is the "rate-effect": when auditory stimuli are presented at different rates, several auditory areas show activations that are positively correlated with the presentation rate (for the underlying neurophysiological mechanisms, see [10]). The few studies that have examined this auditory rate-effect have yielded inconsistent results. For example, three studies reported rate-effects bilaterally in primary and secondary auditory cortices [11–13], while another study found no rate-effect within the left posterior STG [14]. However, methodological differences between these studies with respect to both the tasks employed (e.g. active discrimination vs. passive listening [14]) and the imaging techniques used (e.g. PET vs. fMRI) limit the value of cross-study comparisons (see [15] for a discussion of PET/fMRI differences). The present study was designed to re-evaluate the influence of the presentation rate of auditory stimuli while a second, top-down factor, the need to semantically categorize the stimuli, was varied. Whereas effects of presentation rate have usually been found in superior temporal areas, semantic categorization has been repeatedly shown to modulate activity in inferior frontal cortex (e.g. Fiez [2]). By varying both factors, we hoped to characterize more generally how top-down and bottom-up factors interact in the human linguistic system (see Methods).

Results

Performance

Subjects reported that they could understand all of the stimulus words and performed the semantic categorization task well, with an average accuracy (percentage of correct detections) of 95.8%. False alarms occurred on less than 1% of trials. Accuracy did not differ significantly across the three presentation rates (repeated measures ANOVA: F(2,8) = 0.327, p > 0.05).
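For readers who wish to reproduce this kind of behavioural check, the following is a minimal sketch of a repeated measures ANOVA on accuracy with presentation rate as the within-subject factor, using statsmodels; the accuracy values are illustrative placeholders, not the study's data.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rates = [0.25, 0.5, 1.0]                         # presentation rates in Hz
rows = [{"subject": s, "rate": r,
         "accuracy": 95.8 + rng.normal(0, 2.0)}  # placeholder accuracies (%)
        for s in range(1, 11) for r in rates]

# One accuracy value per subject and rate; "rate" is the within-subject factor
res = AnovaRM(pd.DataFrame(rows), depvar="accuracy",
              subject="subject", within=["rate"]).fit()
print(res)  # F and p for the effect of presentation rate
```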

Hemodynamic responses for the "Rate Effect"

Parametric analysis of word presentation rate revealed strong positive linear correlations between word presentation rate and hemodynamic responses bilaterally in the superior temporal gyrus (STG) (see Fig. 1 and Fig. 2). Within these clusters there were three activation peaks in each hemisphere (Table 1). One pair was located at (x,y,z): -60, -4, 4 and 56, -8, -4, anterior to Heschl's gyrus (HG) in the planum polare (PP), a region identified by Penhune's [16] probabilistic map as probable (25–50%) auditory cortex. A second pair of peaks was found posterior to the first, at x,y,z: -64, -16, 8 and 64, -20, 12, within the 75–100% contour of Penhune's probabilistic map for HG. The third pair of peaks was found at x,y,z: -44, -28, 12 and 56, -28, 16, in a posterior part of the STG within the 50–75% contour of Westbury's probability map of the planum temporale (PT). In the right hemisphere, only voxels within the STG were modulated by the presentation rate (Fig. 1). In the left hemisphere, the region of activated voxels was larger, encompassing the entire STG and extending into the STS, anterior insula, and posterior middle temporal gyrus (MTG).

Table 1 Local maxima within activated clusters for the "Rate Effect".
Figure 1

Effect of word presentation rate (PR) overlaid on horizontal slices of the group's mean brain (neurological convention). The number at the bottom left of each slice denotes the z-coordinate in MNI space.

Figure 2

Bar graphs indicating local maxima of hemodynamic responses for brain regions significantly correlated with word presentation rate (see also Table 1). x-axis: word presentation rate in Hz; y-axis: effect size (and standard error) in arbitrary units. Abbreviations: HG = Heschl's gyrus (primary auditory area), PP = planum polare, PT = planum temporale.

Hemodynamic responses for the "Semantic Categorization"

The main effect of semantic categorization (categorization > passive listening) was characterized by strong bilateral activation in the IFG. In the left IFG the peak activation was located at (x,y,z): -44, 24, -4, ventrally and medially adjacent to the anterior insula. An additional local maximum was found in the ventral part of the middle frontal gyrus (MFG) at x,y,z: -48, 52, -8 (see Table 2 and Figure 3). The active cluster also covered parts of the left dorsal IFG (dIFG). The IFG activation in the right hemisphere comprised two clusters, one ventral with a peak at x,y,z: 40, 24, -8 and the other dorsal with a peak at x,y,z: 24, 56, 20. The dorsal cluster extended into the posterior MFG with a peak at x,y,z: 36, 48, 16.

Figure 3

Effects of semantic categorization (SC) overlaid on coronal slices of the group's mean brain (neurological convention). The number at the bottom left of each slice denotes the y-coordinate in MNI space. Encircled areas indicate the locations of local maxima (see Table 2). Anatomical locations of local maxima: 1/2 = inferior frontal gyrus (IFG); 3 = IFG; 4 = IFG; 5 = middle frontal gyrus (MFG); 6 = MFG; 7 = MFG.

Table 2 Local maxima within activated clusters and cluster size for the "Semantic categorization effect".

Interactions between "Presentation Rate" and "Categorization"

We found significant interactions between word presentation rate and semantic categorization within the right medial HG, right anterior PT, left PP, and bilaterally in the insula (Figure 4A). As can be seen in Figures 4B and 5, in the insular cortex this interaction results from a negative correlation between hemodynamic response strength and word presentation rate in the categorization condition, coupled with a positive correlation in the passive listening condition (69% of modulated voxels, green areas in Fig. 4B; post hoc test, p < 0.05, uncorrected). A further 23% of the voxels showed only a positive correlation between hemodynamic response strength and word presentation rate in the passive listening condition (Fig. 4B, red areas). Those voxels were located in the right and left HG, with activation peaks at 60, -16, 8 and 52, -24, 16 (see Table 4 and Figure 5).

Figure 4

A) Effect 1 (i1) of the interaction between presentation rate and semantic categorization overlaid on horizontal slices of the group's mean brain (neurological convention). The number at the bottom left of each slice denotes the z-coordinate in MNI space. Statistical maps thresholded at p < 0.005, corrected for cluster size. B) Post-hoc test for the interaction effect within significant clusters (see panel A). Areas of increased response for passive listening with increased presentation rate (red), areas showing decreased response with increased presentation rate during semantic categorization (blue), and their overlap (green) (see Table 4).

Figure 5

Bar graphs show the interaction effect for local maxima within each activated cluster at increasing presentation rates (local maxima > 8 mm apart, see Table 3). The blue line indicates the passive listening condition, the red line the semantic categorization condition. x-axis: rate in Hz; y-axis: effect size (and standard error) in arbitrary units. Abbreviations: INS = posterior insula, HG = Heschl's gyrus (primary auditory area), PP = planum polare, PT = planum temporale.

The inverse interaction pattern occurred in one cluster within the left dorsal IFG (Fig. 6 and Fig. 7). This interaction was mediated by a positive correlation between hemodynamic responses and word presentation rate in the categorization condition, coupled with a negative correlation in the passive listening condition, in 65% of the modulated voxels (green areas, Fig. 6B). The remaining voxels showed only a positive correlation with word presentation rate in the categorization condition (Table 5). A similar but non-significant cluster was present in the left ventral IFG. Thus, we found inverse interaction patterns in temporal and frontal structures (see Tables 3, 4, and 5).

Figure 6

A) Main effect of categorization > listening (red) and Effect 2 (i2) of the interaction between presentation rate and word processing (blue) overlaid on coronal slices of the group's mean brain (neurological convention). Statistical maps thresholded at p < 0.05 (see Table 3). B) Post-hoc test for the interaction effect within significant clusters (blue cluster in panel A). Effect of increased response for semantic categorization with increased presentation rate (red) and its overlap (green) with areas also showing decreased response with increased presentation rate during passive listening (see Table 5).

Figure 7

Bar graphs show the interaction effect for local maxima within each activated cluster at increasing presentation rates (local maxima > 8 mm apart, see Table 3). The blue line indicates the passive listening condition, the red line the semantic categorization condition. x-axis: rate in Hz; y-axis: effect size (and standard error) in arbitrary units. Abbreviation: IFG = inferior frontal gyrus.

Figure 8

Upper: Region of interest (ROI) centered in the left posterior superior temporal gyrus (-60, -36, 12) superimposed on an SPM 'glass' brain. Within this spherical ROI (radius 12 mm), 84 voxels showed a word presentation rate effect at p < 0.005. Lower: Bar graph shows the effect increasing with increasing presentation rate in the ROI (x-axis: presentation rate; y-axis: mean effect size averaged across all voxels within the ROI, in arbitrary units).

Table 3 Local maxima within activated clusters and cluster size for the "Interaction effects" between word presentation rate and semantic categorization.
Table 4 Local maxima and cluster size for post hoc tests of interaction effect 1 (i1) (p(voxel) < 0.05, uncorrected).
Table 5 Local maxima and cluster size for post hoc tests of interaction effect 2 (i2) (p(voxel) < 0.05, uncorrected).

Discussion

This experiment was designed to evaluate the effect of auditory word presentation rate on hemodynamic responses within speech-related brain areas. Extending previous research, we introduced a top-down variable (semantic categorization vs. passive listening) to study whether the bottom-up effects of presentation rate would interact with the top-down control of semantic processing. In accord with previous studies, we found strong rate-effects bilaterally in HG, the PT, and the PP. Also in accord with previous studies, semantic categorization, relative to passive listening, evoked increased hemodynamic responses bilaterally in the ventral and dorsal IFG, extending into the ventral part of the MFG. In addition, we observed a right-sided cluster of activation in the IFG and MFG, located more dorsally than the left-sided IFG activation. Finally, we found a surprising pattern of interactions between semantic categorization and presentation rate bilaterally in the HG, the posterior insula, and the left dorsal IFG. We discuss these effects separately below.

Presentation rate and the auditory cortex

The current results are consistent with previous studies demonstrating that hemodynamic responses in the primary and secondary auditory cortex increase with increasing rates of word presentation [18]. In accordance with Dhankhar et al. [12], we found rate-effects bilaterally (equally strong in both hemispheres) across the whole auditory cortex, including HG, the PT, and the dorsal bank of the STS. These findings contrast with those of Price et al. [14]. Using PET, these investigators reported a linear increase in signal from 0 to 90 words per minute (wpm) in all regions examined except Wernicke's area (-58, -34, 12), leading them to speculate that the left PT (Wernicke's area) works in a time-invariant mode to subserve comprehension. However, we did not replicate this finding; in our study, the left posterior auditory cortex behaved similarly to the other auditory areas (see Figure 8). In general, our observations indicate that with each word presentation the auditory word processing nodes in HG, the PT, and the PP were automatically activated, irrespective of whether high-level (categorization) processing was required.

Semantic categorization and frontal brain areas

When semantic categorization was explicitly required, there were stronger bilateral hemodynamic responses in the IFG extending into the MFG. However, the left-sided activations were located more ventrally than those on the right, and additional right-sided activations were found in dorsal positions. Previous studies have shown that ventral parts of the left IFG are preferentially active during tasks requiring semantic as opposed to phonological processing, the latter function being associated with activations in a more posterior and dorsal part of the IFG [2, 3, 19–24]. Additional studies have argued that this area is responsible for response selection in the context of semantic operations [25]. Placing these findings in a more global context, Gabrieli et al. [19] concluded that "activations in left inferior prefrontal cortex reflect a domain-specific semantic working memory capacity that is invoked more for semantic than nonsemantic analyses regardless of stimulus modality, more for initial than for repeated semantic analysis of a word or picture, more when a response must be selected from among many than few legitimate alternatives, and that yields superior later explicit memory for experiences".

The findings of the present study support this conclusion by showing that the left ventral IFG is active during semantic categorization. Interestingly, hemodynamic responses in most parts of this area were independent of word presentation rate, suggesting that the left ventral IFG operates mainly in a task-dependent rather than stimulus-dependent fashion. Only a small region of the ventral IFG was influenced by word presentation rate during semantic categorization, and this effect only reached the p = 0.07 level. In contrast, the dorsal part of the left IFG was significantly influenced by varying word presentation rates: hemodynamic responses in this area increased with increasing presentation rates during semantic categorization, whereas there was an opposite trend during passive listening. Therefore, the dorsal part of the left IFG is also involved in semantic categorization, but in a different way from the ventral IFG. Following the argument made by Gabrieli et al. [19] and Fiez [2] that the left dorsal IFG is involved in phonological processing, one might suggest that the dorsal part analyzes the phonological features of words that subsequently enter semantic analysis. Since each stimulus word comprises several phonological features, increasing word presentation rates will increase the processing demands placed on this neural processor.

Although most studies have found bilateral activations in the IFG during semantic tasks, activations in the dorsal right IFG have also been reported for other tasks, among them pattern encoding [26], processing of unusual semantic relationships [27], identification of emotional prosody [8], demanding working memory tasks [28], visually and auditorily guided finger movements [29], and learning to associate sensory cues with particular movements according to arbitrary rules [30]. It is difficult to infer from the present data whether any of these processes contributes to the right-sided IFG activation found in our study. However, a study by Rypma et al. [28] might support our findings. These authors found that IFG activations during verbal working memory tasks showed a left-sided dominance during easier tasks but a right-sided dominance during difficult tasks, suggesting that the right IFG becomes increasingly active with increasing processing demands. Presumably, our semantic categorization task was more demanding than passive listening, producing the observed right-sided activation in the IFG.

Interaction between "Presentation Rate" and "Semantic categorization" in temporal and frontal lobes

We found strong interactions between "Presentation Rate" and "Semantic Categorization", with peak activations located bilaterally in the posterior insula (extending into the right HG/PT and the left PP). These interactions were characterized by a strong negative rate effect in the semantic categorization condition (a decreasing hemodynamic response with increasing word presentation rate) and a positive rate effect (an increasing hemodynamic response with increasing word presentation rate) in the passive listening condition. Since the insula is strongly interconnected with temporal and frontal structures [31], this brain region may play a role in linking together the different neural networks involved in auditory processing. However, although there are substantial data on the anatomical connections between the insula and other brain regions in primates, relatively little is known about the precise function of the human insula in auditory processing.

Cytoarchitectonic studies of the human post-mortem brain have revealed profiles for parts of the posterior insula (area PIA) that may correspond to early auditory areas at an intermediate level between the primary auditory area (A1) and the posterior supratemporal plane (area STA) [32]. Thus, there is some degree of cytoarchitectonic similarity between the auditory cortex and the posterior insula. However, functional and lesion studies in humans are rare.

A few lesion studies lend credence to the idea that the insula is involved in aspects of language [33, 34]. It has also been noted that injury to the insula can cause aphasia [35, 36]. Some brain imaging studies have likewise demonstrated insular involvement during word generation [37], verbal memory tasks [38], auditory-vocal integration in the context of singing [39], the perception of moving sound [40], speech perception [41], and automatic word processing [42]. A recent study [43], which compared word vs. non-word repetition in literate and illiterate subjects, reported stronger connectivity between temporal areas and the posterior insula in illiterate subjects and concluded that the posterior insula might subserve phonological processing. However, other studies have reported insular activations to non-auditory or non-verbal stimuli [44–46].

Most of the aforementioned studies found insular activations in the context of different verbal or auditory-verbal tasks. Thus, there seems to be a link between verbal-auditory processing and insular function. However, the exact function of the insula in auditory and verbal processing remains unknown. Our data lead us to speculate that the posterior insula might be involved in specific modulation processes. The negative correlation between word presentation rate and hemodynamic response during semantic categorization could be explained by a down-regulation of early auditory areas (including HG and the insula) in situations in which specific targets have to be semantically segregated from a stream of auditory stimuli. This selection process might require specific tuning of the auditory networks in HG and the insula, whereas such tuning might not be necessary for passive listening. Additionally, there might be a close functional connection between the insula and the dorsal IFG. In situations where the dorsal IFG is engaged in demanding processing, as is the case during semantic categorization at high presentation rates, the inhibitory influence from the insula might decrease.

Conclusions

The bottom-up factor "word presentation rate" modulated hemodynamic responses bilaterally in the primary and secondary auditory cortices of the superior temporal lobe; these areas thus operate in a stimulus-dependent fashion. The top-down factor "semantic categorization" modulated hemodynamic responses in the left and right ventral IFG and in the right dorsal IFG extending into the MFG, supporting earlier studies using semantic tasks. Interactions between these factors were found bilaterally in the medial HG, the adjacent posterior insula, and in the left dorsal IFG. The interaction in the left dorsal IFG may indicate that phonological processing is controlled in this area, whereas the interaction effects in the insula and HG can be seen in the context of modulatory functions. Taken together, this study demonstrates that examining interaction effects between top-down and bottom-up factors helps to disentangle the functions of language-related neural networks.

Methods

Subjects

Ten healthy right-handed male subjects (age: 21–27 years) participated. Subjects gave their written informed consent according to the guidelines of the Research Center Juelich before participating in the study. Hand preference was assessed with the 12-item questionnaire of Annett [47], which allowed us to select subjects who were consistently right-handed (CRH). CRH was defined as performance of all 12 tasks with the right hand, with up to two "either" preferences being acceptable. Female subjects were not examined in this study because it has been reported that they may have a more bilateral language representation than males [48].

FMRI measurements

Blood oxygenation level dependent (BOLD) functional magnetic resonance images were obtained using a 1.5 T Siemens Magnetom Vision system (Siemens, Erlangen) with echo planar imaging capabilities and a radiofrequency (RF) head coil (gradient echo EPI, TR = 6 s, TE = 66 ms, FOV = 200 × 200 mm², flip angle = 90°, matrix size = 64 × 64, in-plane resolution 3.125 × 3.125 mm², slice thickness = 3 mm, interslice gap 0.3 mm, 16 slices oriented parallel to the AC-PC line, specified with a midsagittal scout image). Additionally, a high-resolution anatomical image was acquired (MP-RAGE, T1-weighted, gradient-echo pulse sequence, TR = 11.4 ms, TE = 4.4 ms, flip angle = 15°, FOV = 250 mm, matrix size = 256 × 256, 128 sagittal slices, in-plane resolution 0.98 × 0.98 mm², slice thickness = 1.25 mm).

Experimental set-up

After the structural scan was obtained, subjects performed 6 consecutive fMRI runs lasting 8.4 min each. A 2 × 3 factorial block design was employed, with 2 levels of word processing (passive listening and active semantic categorization) and 3 levels of word presentation rate (0.25 Hz, 0.5 Hz, 1.0 Hz). Each run began with a 0.4 min interval to allow the fMRI signal to stabilize, followed by alternating cycles of a 1 min rest (off) period and a 1 min activation (on) period; each run contained 4 of these 2 min off-on cycles, as sketched below. In the passive listening runs, subjects were asked to listen carefully to the presented stimuli, but no response was required. In the semantic categorization runs, subjects were asked to press a response key when they heard animal words, which occurred in 20% of the presentations. For each of these word processing conditions, the order of the presentation-rate runs was counterbalanced across subjects. However, all 3 levels of the passive listening condition were always run before the 3 levels of the semantic categorization condition. This order was employed so that prior exposure to the categorization condition would not prompt subjects to engage in semantic categorization during the passive listening task.
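As an illustration, the following minimal Python sketch reconstructs one run's off-on timing from the numbers above (TR, stabilization interval, and cycle lengths are taken from the text; the rest is illustrative).

```python
import numpy as np

TR = 6.0                          # seconds per volume
t = np.arange(0, 8.4 * 60, TR)    # acquisition times for one 8.4 min run

# 0.4 min (24 s) stabilization interval, then four 2 min off-on cycles,
# each consisting of 60 s rest followed by 60 s of word presentation.
boxcar = np.zeros_like(t)
for cycle in range(4):
    on_start = 24 + cycle * 120 + 60
    boxcar[(t >= on_start) & (t < on_start + 60)] = 1.0

print(f"{t.size} volumes per run, {int(boxcar.sum())} during 'on' periods")
```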

Stimulus material consisted of 840 German one- or two-syllable concrete nouns. Word frequency differences between target and non-target words were not significant (p = 0.21). The words were spoken by a trained reader, had durations of <0.9 s, and were presented binaurally using an audio playback system ending in piezo-electric headphones. The words were presented in random order during periods of 1 min duration, and the presentation rate was adjusted by varying the inter-word interval (ISI), as illustrated below. The intensity of the stimuli was approximately 85 dB SPL. Scanner noise was approximately 90–100 dB, but the tightly fitting headphones attenuated this ambient noise by at least 20 dB. With a TR of 6 s and an acquisition time of less than 2 s per volume, subjects were able to listen to approximately 70% of the stimulus words without any masking by the scanner noise. Together with the noise attenuation provided by the headphones, this made it possible for subjects to understand all stimuli, as confirmed in a post-session interview.
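A small worked example of the rate-to-gap relationship, assuming the 0.9 s upper bound on word duration from the text (per-word gaps are not reported; these are illustrative bounds):

```python
# With onset-to-onset period 1/rate and word duration at most 0.9 s,
# the silent inter-word gap is at least 1/rate - 0.9 s.
WORD_DUR = 0.9  # s, upper bound from the text

for rate in (0.25, 0.5, 1.0):   # presentation rates in Hz
    period = 1.0 / rate
    print(f"{rate:4.2f} Hz: onset period {period:4.1f} s, "
          f"minimum gap {period - WORD_DUR:4.2f} s")
```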

Image analysis

Image analysis was performed on a PC workstation using MATLAB (MathWorks Inc., Natick, MA, USA) and SPM99 software (http://www.fil.ion.ucl.ac.uk/spm). All images were realigned to the fifth volume, corrected for motion artefacts, co-registered with the subject's corresponding anatomical (T1-weighted) image, resliced and normalised (4 mm³ voxels) into standard stereotaxic space using the template provided by the Montreal Neurological Institute [49], and smoothed using an 8 mm full-width-at-half-maximum Gaussian kernel. The data were analyzed by statistical parametric mapping in the context of the general linear model approach of SPM99. The effect of global differences in scan intensity was removed by scaling each scan in proportion to its global intensity.
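The original pipeline ran in SPM99 under MATLAB. As a hedged modern analogue (not the authors' code), the final two steps, proportional global-intensity scaling and 8 mm FWHM smoothing, can be sketched with nibabel and nilearn; the file names are hypothetical.

```python
import nibabel as nib
from nilearn.image import smooth_img

img = nib.load("normalized_func.nii.gz")   # hypothetical normalised 4D series
data = img.get_fdata()

# Proportional scaling: divide each volume by its global mean intensity
global_means = data.reshape(-1, data.shape[-1]).mean(axis=0)
scaled = nib.Nifti1Image(data / global_means, img.affine)

# 8 mm full-width-at-half-maximum Gaussian smoothing, as in the text
smooth_img(scaled, fwhm=8).to_filename("smoothed_func.nii.gz")
```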

The statistical analysis corresponds to a random effects analysis whose results can be generalised to the population as a whole. This was implemented in a two-stage procedure: first, the subject-specific contrasts of interest were estimated for each condition (semantic categorization and passive listening at 3 different presentation rates, resulting in 6 contrast images per subject); these contrast images were then entered into a second-level analysis to produce parametric maps of the T statistic. The contrasts at the first level contain parameter estimates pertaining to each of the six conditions. These six conditions were modelled with box-car stimulus functions convolved with a hemodynamic response function [50].
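To make the first-level model concrete, here is a minimal sketch of a box-car stimulus function convolved with a canonical double-gamma HRF; the HRF parameters follow common SPM-style defaults and are illustrative rather than the exact SPM99 basis function.

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(t):
    """Canonical double-gamma HRF shape: peak near 6 s, undershoot near 16 s.
    Parameters are common SPM-style defaults, used here illustratively."""
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

TR = 6.0
t_scan = np.arange(0, 504, TR)                 # one 8.4 min run, 84 volumes
# Box-car: 24 s stabilization, then four 120 s cycles of 60 s off / 60 s on
boxcar = ((t_scan >= 24) & ((t_scan - 24) % 120 >= 60)).astype(float)

hrf = double_gamma_hrf(np.arange(0, 32, TR))     # HRF sampled at the TR
regressor = np.convolve(boxcar, hrf)[:t_scan.size]  # first-level regressor
```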

Significant activations were analyzed in a repeated measures ANOVA with the main effects of presentation rate (PR) and semantic categorization (SC). Voxels showing significant interaction effects (p < 0.005, uncorrected for the entire brain volume unless otherwise mentioned) were excluded from the analysis of main effects. The resulting sets of voxel values for both main effects and the interaction constitute statistical parametric maps of the T statistic (SPM(T)), which were then transformed to the unit normal distribution (SPM(Z)). Significant activations were thresholded at t = 2.69 (p = 0.005) with a spatial extent criterion of p < 0.05, corrected for multiple comparisons. Because of the remarkably high between-subject variability in the anatomy and cytoarchitecture of frontal brain regions [51, 52], and given our prespecified statistical hypotheses regarding activations in the inferior frontal areas, we applied a more lenient statistical threshold for these regions (height threshold of p = 0.05 and an extent criterion of p < 0.05, uncorrected for multiple comparisons).
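As a small illustration of how the quoted height threshold maps onto a p-value, the one-tailed t cutoff for p = 0.005 can be computed with scipy for a range of error degrees of freedom (the df values below are illustrative; the actual df follow from the second-level design).

```python
# Hedged sketch: one-tailed t cutoffs for p = 0.005 at illustrative df.
from scipy.stats import t

for df in (9, 27, 54):          # illustrative degrees of freedom
    print(f"df = {df:2d}: t cutoff = {t.ppf(1 - 0.005, df):.2f}")
```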