Abstract
The sensorimotor cortex is responsible for the generation of movements, and interest in the ability to use this area for decoding speech by brain–computer interfaces has increased recently. Speech decoding is challenging, however, since the relationship between neural activity and motor actions is not completely understood. Non-linearity between neural activity and movement has been found, for instance, for simple finger movements. Despite equal motor output, neural activity amplitudes are affected by preceding movements and the time between movements. It is unknown if neural activity is also affected by preceding motor actions during speech. We addressed this issue, using electrocorticographic high frequency band (HFB; 75–135 Hz) power changes in the sensorimotor cortex during discrete vowel generation. Three subjects with temporarily implanted electrode grids produced the /i/ vowel at repetition rates of 1, 1.33 and 1.66 Hz. For every repetition, the HFB power amplitude was determined. During the first utterance, most electrodes showed a large HFB power peak, which decreased for subsequent utterances. This result could not be explained by differences in performance. With increasing duration between utterances, more electrodes showed an equal response to all repetitions, suggesting that the duration between vowel productions influences the effect of previous productions on sensorimotor cortex activity. Our findings correspond with previous studies for finger movements and bear relevance for the development of brain–computer interfaces that employ speech decoding based on brain signals, in that past utterances will need to be taken into account for these systems to work accurately.
Introduction
The execution of everyday voluntary body movements generally occurs without effort and is the result of the concerted action of different neural processes and brain areas. The sensorimotor cortex is known to play a central role in the different aspects of the generation of movement, such as the control of body part positions, the velocity and direction of movements, applied force and the planning of motor actions (Tanji and Evarts 1976; Georgopoulos et al. 1982, 1992; Donoghue et al. 1998; Moran and Schwartz 1999; Wang et al. 2007; Truccolo et al. 2008). However, for subjects suffering from severe forms of paralysis, even the most common forms of movements, such as those involved in speech and communication can sometimes be completely absent (American Congress of Rehabilitation Medicine 1995; Smith and Delargy 2005; Posner et al. 2007). To restore communication in these subjects, brain-computer-interface (BCI) systems are being developed (Wolpaw et al. 2002). These systems may convert neural activity into written or spoken computer output, and sensorimotor cortex activity related to speech has been shown useful in an attempt to identify, from the neural signals, which sound or word a user may want to communicate (Kellis et al. 2010; Mugler et al. 2014; Herff et al. 2015; Ramsey et al. 2017). These attempts usually rely on the assumption that each specific sound or word is associated with a unique neural signature. Imaging and patient studies, however, have shown that repeating a movement in a discrete way (with short pauses between each movement) may involve different brain areas than performing the same movements in a continuous way (without short pauses between each movement; Kennerley et al. 2002; Spencer et al. 2003; Miall and Ivry 2004; Schaal et al. 2004), even though the movements are almost identical. Moreover, there is evidence for a non-linear relationship between movement-performance and neural activity in the sensorimotor cortex. 
Various studies have suggested that previous actions influence the neural activity associated with subsequent actions, if spaced close enough together (Miezin et al. 2000; Soltysik et al. 2004). Indeed, during repeated finger movements, the amplitude of sensorimotor neural activity, as measured with fMRI and electrocorticography (ECoG), was shown to decline over repetitions, despite equal movement output (Hermes et al. 2012b; Siero et al. 2013; for a comparison between BOLD and ECoG see: Logothetis et al. 2001; Hermes et al. 2012a; Siero et al. 2014).
Importantly, the studies mentioned above focused on hand and finger movements and it remains to be determined whether the observed complex and non-linear relationship between movement and underlying neural activity is a general feature of the sensorimotor cortex, or whether it is specific to the areas involved in hand movement. Especially relevant in this respect is our previous finding that different parts of the sensorimotor cortex show different response profiles to the same speech movement. Some cortical foci show sustained neural activity during a sustained motor speech action whereas in other locations responses are transient during the same movement (Salari et al. 2018). This finding indicates that the relationship between neural activity and overt speech behaviour differs between subareas of the sensorimotor cortex. It could be speculated that the presence, or absence, of a non-linear relationship between neural responses and behavioural output during repeated movements is specific for cortical foci as well.
With the current study, we aimed to obtain a better understanding of the link between speech pronunciation and underlying sensorimotor cortex activity. This is of interest for BCIs that employ neural signal changes related to (attempted) speech. If the neural signal associated with a specific (attempted) pronunciation is affected by previous speech actions, the same word or sound may be related to a diversity of neural signatures, which have to be taken into account for sensorimotor speech BCIs to function accurately.
In this study, we investigated the relationship between repeated orofacial movements during speech, and sensorimotor brain activity. We recorded neural signals in three subjects while they pronounced the same vowel multiple times, at different repetition rates. Neural activity was recorded with subdural ECoG electrodes, which allows for recording at high temporal resolution and with high spatial specificity (Siero et al. 2014). We focused on frequencies in the range of 75–135 Hz, which are known to have a spatially specific relationship with (speech and articulator) movements (Crone et al. 1998; Miller et al. 2007; Bouchard et al. 2013), and which are thought to reflect underlying neural population firing (Manning et al. 2009; Miller et al. 2009; Ray and Maunsell 2011). We focused mostly on the ventral parts of the sensorimotor cortex as this area has previously been shown to be responsible for the generation of speech movements (Penfield and Boldrey 1937; Crone et al. 2001; Towle et al. 2008; Pei et al. 2011; Bouchard et al. 2013) and has been the focus of BCI-studies for the classification of speech sounds (see for instance Kellis et al. 2010; Mugler et al. 2014; Herff et al. 2015; Ramsey et al. 2017) and articulator movements (Bleichner et al. 2015).
Method
Subjects
Subjects included in this study (n = 3, 2 females, 19, 41 and 30 years old respectively) were implanted with subdural clinical ECoG electrodes for epilepsy treatment at the University Medical Center Utrecht. All subjects had an additional high-density (HD) electrode grid placed over the sensorimotor cortex (SMC; left for subject A & B and right for subject C). These grids were exclusively placed for research purposes with the subject’s consent, over an area that was not clinically relevant. For subject A & B, the inter-electrode distance of the HD grid was 4 mm with an exposed electrode diameter of 1 mm for subject A and 1.17 mm for subject B. For subject C, the inter-electrode distance was 3 mm with an exposed electrode diameter of 1 mm. Only the HD electrodes were used for the current analysis.
This research was approved by the ethics committee of the University Medical Center Utrecht. All participants gave written informed consent in accordance with the Declaration of Helsinki (2013).
Task
Participants were asked to produce the /i/ vowel repeatedly at different rates (see below), guided by instructions that were visually presented on a computer screen placed at a distance of approximately 1 m from the participant. A trial started with an indication of the production speed by a visual cue. Subsequently, to guide the participants in producing the sound at the correct speed, the letters ‘ie’, corresponding in Dutch to the /i/ sound, were repeatedly visually presented for 300 ms at a rate of 5, 4, or 3 times in 3 s (1.66, 1.33 and 1 Hz). These repetition rates were chosen as they were relatively easy to perform (not too slow or too fast) and because previous research for finger movements has shown that repetition effects are mostly apparent at rates of 1 Hz or higher (Hermes et al. 2012b). During the inter-trial interval (1800 ms), a fixation cross was presented. Trials of different rates were randomized and each rate was repeated 26 times, divided over two recording sessions. Any trial for which the number of pronunciations was incorrect was excluded from the analyses.
Data Acquisition & Preprocessing
Brain data was recorded and preprocessed as described previously (Salari et al. 2018). In short, ECoG data was recorded (number of electrodes: 64 for subject A and 128 for subject B & C) at a sampling frequency of 512 Hz or 2048 Hz (subject A; Micromed, Treviso, Italy), or 2000 Hz (subject B & C; Blackrock Microsystems LLC, Salt Lake City, USA). Different sampling frequencies were used due to the availability of different clinical and research recording setups and the possibility, or not, to choose the most optimal sampling frequency for the current study. For subject A, the data obtained at the highest sampling frequency was downsampled such that the sampling frequencies of all datasets of subject A were the same. Electrodes in the region of interest (sensorimotor cortex) were identified by visual inspection of the electrode positions (as determined by using a post-implantation CT scan) plotted over a 3D rendering of a presurgical MRI scan (Hermes et al. 2010; Branco et al. 2018b). Sensorimotor cortex electrodes with noisy or flat signal were removed from further analysis. For the remaining electrodes, line noise (50 Hz) and harmonics thereof were removed and common average re-referencing was applied. Audio recordings of the subject’s pronunciation were made during the task, to identify the voice onsets and offsets and to be able to correct for possible differences in behavioral performance (see below). Voice onset and offset were determined for each vowel pronunciation, as described previously (Salari et al. 2018). Briefly, these time points were first automatically determined using a vowel detection algorithm (Hermes 1990), which was adjusted by Hermes to also detect vowel offsets. Subsequently, we corrected the on- & offsets if necessary (due to background noise for instance) using Praat software (Boersma 2002).
Matlab software (The Mathworks, Inc., Natick, MA, USA) was used for data analysis, unless specified otherwise. For all sensorimotor electrodes, the high frequency band (75–135 Hz) power was computed per sample point using a Gabor wavelet (Bruns 2004) for all frequencies between 75 and 135 Hz in bins of 1 Hz with a full width half maximum (fwhm) of four wavelets per frequency. Subsequently, a log transformation (10 × log10) was applied and these results were then averaged (over frequencies) to create the HFB power signal. These signals were normalized and subsequently smoothed with a moving average window (centered around the sample point) of 0.1 s. This smoothing setting has been shown to be within the optimal range for accurate classification of phonemes (Branco et al. 2018a), and we used it to preserve the individual peaks per repetition in the data while reducing noise. The data from the two runs were concatenated.
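The HFB computation described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab code: the function name `hfb_power`, the interpretation of "four wavelets per frequency" as a temporal Gaussian FWHM of four cycles, the wavelet normalization and the use of a z-score for the normalization step are all assumptions made for illustration.

```python
import numpy as np

def hfb_power(signal, fs, freqs=range(75, 136), cycles_fwhm=4.0, smooth_s=0.1):
    """Sketch of an HFB power signal (hypothetical helper, assumptions noted inline)."""
    signal = np.asarray(signal, dtype=float)
    log_power = np.zeros((len(freqs), signal.size))
    for i, f in enumerate(freqs):
        # Assumption: the Gaussian envelope has a temporal FWHM of 4 cycles at f.
        fwhm_s = cycles_fwhm / f
        sigma = fwhm_s / (2.0 * np.sqrt(2.0 * np.log(2.0)))
        half = int(np.ceil(3 * sigma * fs))
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * f * t)
        wavelet /= np.abs(wavelet).sum()
        analytic = np.convolve(signal, wavelet, mode="same")
        # Log transformation (10 x log10) of the power per frequency bin.
        log_power[i] = 10.0 * np.log10(np.abs(analytic) ** 2 + 1e-20)
    hfb = log_power.mean(axis=0)            # average over 1 Hz bins
    hfb = (hfb - hfb.mean()) / hfb.std()    # assumption: z-score normalization
    width = max(1, int(round(smooth_s * fs)))
    kernel = np.ones(width) / width         # centered moving average of 0.1 s
    return np.convolve(hfb, kernel, mode="same")
```

Zero padding at the signal edges is a simplification of this sketch; a real pipeline would typically discard or pad the first and last samples more carefully.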
Analysis of ECoG data was conducted in two steps. First, electrodes were identified and selected for further analysis based on their response to the task. Then signals from these electrodes were interrogated for vowel repetition effects.
Electrode Selection
For each electrode, we determined whether it was responsive to the task. To that end, we modeled the neural signal by performing a regression analysis on the whole time series. Five predictors were used in this study, each representing a transient response to one of the possible repetition numbers (max 5). Predictor 1 represents the response to all first pronunciations, the second predictor to all second pronunciations etcetera. The fourth and fifth predictors had only (predicted) responses during the trials in which there actually was a fourth and/or fifth pronunciation (Fig. 1). The predictors were created by convolving a Gaussian function with an impulse function that indicated when a vowel was spoken. The full width at half maximum of the Gaussians was determined for each subject separately, as follows. First, for each electrode that had, in the trials of the slowest repetition rate, a maximum peak response higher than 1 standard deviation above the mean of the signal, we estimated the fwhm of that peak. The mean fwhm over electrodes was then used as the fwhm for the Gaussian peak of the model for the slowest production rate. For the two faster production rates, this value was adjusted to match those repetition rates by dividing it by the repetition frequency. We used the slowest repetition rate for the fwhm estimation under the assumption that this is least ‘contaminated’ with activity of other utterances. The fwhm values of the three subjects were, respectively, 0.59, 0.51 and 0.65 s for the 1 Hz repetition rate (leading to a 0.35, 0.30 and 0.39 s fwhm for the 1.66 Hz repetition rate and 0.44, 0.38 and 0.49 s fwhm for the 1.33 Hz repetition rate). Visual inspection showed that using these Gaussian widths, the neural activity could be accurately modeled for all subjects (Fig. 2). Subsequently, since it is known that the HFB response onset of different areas in the brain can occur at different time-points relative to the overt motor action (Crone et al. 1998; Coudé et al. 2011; Hermes et al. 2012b; Bouchard et al. 2013), we shifted the timing of the model peaks and repeated the regression until an optimal fit was found with the data. Timing shifts ranged from 0.5 s before voice onset time to 0.5 s after voice onset, in 0.1 s increments. An electrode was considered significantly responsive to the task if it was significantly explained by the best fitting model and if the average response over trials was an increase in power associated with at least the first vowel production (for all three repetition rates). For subject A–C, a total number of 14, 28 and 59 electrodes were significantly active, respectively. All other electrodes were considered not-responsive (NR). Statistical analysis was conducted using analysis of variance (ANOVA; α = 0.05, false discovery rate corrected), similar to Salari et al. (2018). For this analysis, normally, each sample point is assumed to be independent and is used as a degree of freedom. The degrees of freedom (DF) value relates to the number of independent observations but since consecutive sample points are not independent (due to the Gabor wavelet power conversion) we counted every 0.5 s of data as an independent sample point to not overestimate the degrees of freedom. Note that, even though the data do not necessarily meet all the assumptions for parametric testing (as discussed above), inspection of the data showed that the current analysis was useful for selecting task related electrodes.
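The predictor construction and the search over timing shifts can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, a single fwhm is used per call (in the study the fwhm differs per repetition rate), goodness of fit is summarized by the explained variance, and the significance testing with adjusted degrees of freedom is omitted.

```python
import numpy as np

def fwhm_for_rate(fwhm_1hz_s, rate_hz):
    # fwhm for faster rates: the 1 Hz value divided by the repetition frequency.
    return fwhm_1hz_s / rate_hz

def gaussian_kernel(fwhm_s, fs):
    sigma = fwhm_s / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = int(np.ceil(4 * sigma * fs))
    t = np.arange(-half, half + 1) / fs
    return np.exp(-t**2 / (2 * sigma**2))

def build_predictors(onsets_s, rep_numbers, n_samples, fs, fwhm_s,
                     shift_s=0.0, max_reps=5):
    """One column per repetition number: impulses convolved with a Gaussian."""
    X = np.zeros((n_samples, max_reps))
    for onset, rep in zip(onsets_s, rep_numbers):
        idx = int(round((onset + shift_s) * fs))
        if 0 <= idx < n_samples:
            X[idx, rep - 1] = 1.0
    kernel = gaussian_kernel(fwhm_s, fs)
    for k in range(max_reps):
        X[:, k] = np.convolve(X[:, k], kernel, mode="same")
    return X

def best_shift(y, onsets_s, rep_numbers, fs, fwhm_s):
    """Try shifts from -0.5 s to +0.5 s in 0.1 s steps; keep the best fit."""
    y = np.asarray(y, dtype=float)
    best = None
    for s in np.round(np.arange(-0.5, 0.5 + 1e-9, 0.1), 1):
        X = build_predictors(onsets_s, rep_numbers, y.size, fs, fwhm_s, shift_s=s)
        A = np.column_stack([np.ones(y.size), X])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - (y - A @ beta).var() / y.var()
        if best is None or r2 > best[1]:
            best = (float(s), r2, beta)
    return best  # (shift in seconds, explained variance, regression weights)
```

Columns for a fourth or fifth repetition stay at zero in trials without such a pronunciation, which mirrors the description of those predictors above.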
Repetition Effects
Only the significantly responsive electrodes were used for further analysis. For these electrodes, we determined the HFB power peak amplitude for every vowel production. Since the HFB response peak timing, with respect to voice onset, could be different for each electrode we used the shift of the model that explained the data best as the determinant for timing of the HFB response peak with respect to voice onset. Each peak amplitude was determined by taking the median of the HFB signal in a window of 0.1 s before and 0.1 s after the determined peak timing. We used this value instead of the maximum value, to prevent possible noise peaks in the data to affect the results.
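As an illustration of why the median is more robust than the maximum here, the peak amplitude extraction can be sketched in Python as below. The function name `peak_amplitude` and its arguments are hypothetical; the window of 0.1 s on either side of the model-derived peak time follows the description above.

```python
import numpy as np

def peak_amplitude(hfb, fs, voice_onset_s, shift_s, half_window_s=0.1):
    """Median HFB value in a window around the expected peak time."""
    center = int(round((voice_onset_s + shift_s) * fs))
    half = int(round(half_window_s * fs))
    lo = max(0, center - half)
    hi = min(len(hfb), center + half + 1)
    return float(np.median(hfb[lo:hi]))
```

A single-sample noise spike inside the window changes the maximum but leaves the median essentially unaffected.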
Correction for Performance Differences
To investigate if the duration of vowel production was influenced by the repetition rate, we performed an ANOVA for each subject with pronunciation duration (derived from the audio signal) as dependent variable and production rate as independent variable. Furthermore, we corrected for possible differences in HFB response peak amplitudes that might be caused by differences in pronunciation between repetitions. We derived four behavioral performance measures, namely (1) sound intensity, (2) lip aperture, (3) lip movement and (4) lip velocity, for this correction. Sound intensity was calculated by taking the envelope of the normalized audio signal that was recorded during the task, using the absolute value of the Hilbert transform. The sound intensity was normalized per run to make measures from different sessions comparable. Normalization (of each run) was based on the mean and standard deviation of a silent part of that run. This envelope was then smoothed with a moving window of 0.05 s and downsampled to 600 Hz. Lip aperture was measured by analyzing video footage of the subjects while they performed the task. For each repetition, the mean distance (in pixels) between the lips was calculated for the video frames corresponding to the pronunciation. Lip movements were calculated in a similar way but the video frames during silent parts just before each pronunciation were now used. The difference between the lip aperture during silence and the upcoming pronunciation served as a measure of lip movement. Lip velocity was calculated by taking the derivative of the lip positions during the silent part before each vowel production and subsequently taking the maximum value thereof. The lip position for each analyzed frame was normalized, per run. This was done by subtracting from each lip position sample, the mean number of pixels between the lips (over analyzed frames) and dividing this by the standard deviation (of pixels over time). 
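The sound intensity measure can be approximated with the Python sketch below. It is an illustration under assumptions rather than the original procedure: the analytic signal is computed with an FFT-based Hilbert transform, the function names and the `silent_slice` argument are hypothetical, and the final downsampling to 600 Hz is left out.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (real part x, imaginary part its Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def sound_intensity(audio, fs, silent_slice, smooth_s=0.05):
    audio = np.asarray(audio, dtype=float)
    silent = audio[silent_slice]
    # Normalize per run using the mean and standard deviation of a silent part.
    normalized = (audio - silent.mean()) / silent.std()
    envelope = np.abs(analytic_signal(normalized))
    width = max(1, int(round(smooth_s * fs)))
    return np.convolve(envelope, np.ones(width) / width, mode="same")
```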
To see if any of the measures could explain possible differences in the brain signals, we calculated the correlation value of each of these measures with the HFB response peak amplitudes for all included electrodes. Furthermore, a principal component analysis (PCA) was performed on these measures to dissect covariance among the different measures. The principal components were used as predictors of the HFB response peak amplitudes in a regression analysis, per electrode, the result of which was subtracted from the actual HFB response peak amplitudes to regress out any performance effects on the brain data. Outliers in the HFB response peak amplitudes were disregarded and outliers in the PCA values were replaced by the average value of that component. Outliers were determined by using the ‘isoutlier’ function from Matlab. See Supplementary Figure S1 for an indication of the variance in brain data and behavioral measures and their relation before and after correction.
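The regress-out step can be sketched in Python as follows, for a single electrode. This is a minimal illustration with assumptions: the function name is hypothetical, the measures are standardized before the PCA, all components are retained, only the component-related part of the fit is subtracted (so the mean amplitude is preserved), and the outlier handling described above is omitted.

```python
import numpy as np

def regress_out_performance(amplitudes, measures):
    """Remove variance explained by principal components of behavioral measures.

    amplitudes: (n_repetitions,) HFB peak amplitudes of one electrode.
    measures:   (n_repetitions, n_measures) behavioral performance measures.
    """
    y = np.asarray(amplitudes, dtype=float)
    M = np.asarray(measures, dtype=float)
    Z = (M - M.mean(axis=0)) / M.std(axis=0)
    # PCA via singular value decomposition; scores are the component time courses.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U * S
    A = np.column_stack([np.ones(y.size), scores])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - scores @ beta[1:]
```

With all components retained, the corrected amplitudes are uncorrelated with each of the original measures, which is the property the correction aims for.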
Since we could not measure the tongue position in the patient subjects, we did not correct for possible differences therein. However, after the current study we repeated the task with five healthy volunteers (who signed informed consent, median age: 26 years, range 22–31 years, 1 female) and recorded their tongue position using ultrasound measures. A total of 114 echo pulse scan lines were recorded at 60.11 frames per second at a depth of 90 mm with an EchoBlaster 128 ultrasound machine. The probe was stabilized using an ultrasound headset (Articulate Instruments Ltd 2008). The data were analyzed with Articulate Assistant Advanced software (Articulate Instruments Ltd 2012). We then evaluated whether repeated vowel production caused systematic changes in tongue movements.
After correction for performance, the peak amplitudes of all included electrodes were averaged and grouped by repetition number (1–3, 4 or 5) for each repetition rate separately. Subsequently, for each rate an ANOVA was performed with repetition number as independent variable and HFB response amplitude as dependent variable, to see if there was a significant difference in HFB-amplitude between repetition numbers. The result of this step was used as an indication of whether there was an influence of previous productions of the same vowel on subsequent productions. Since the slowest production rate only contained three repetitions, the ANOVA was performed on the first two and the last repetition only, for all production rates. Hermes et al. (2012b) suggested that a non-linear function in the form of a × (1/x) + bx + c best fitted their results with respect to the shape of the HFB response during finger movements. For visualization of the response profile, we fitted this function with the current data. Furthermore, since those authors found that for finger movements the HFB profile was dependent on movement rate, we compared the HFB response profiles of the three different repetition rates using an ANOVA. The repetition rate was used as independent variable and the HFB response peak amplitude was used as dependent variable (each repetition rate group consisted of the amplitudes of the first, second and last repetitions combined). Note that also in this step only the first, second and last repetitions were used to allow for comparison across rates.
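Because the function a × (1/x) + bx + c is linear in its parameters a, b and c, it can be fitted with ordinary least squares. The Python sketch below illustrates this; the function name is hypothetical and the fitting routine used in the study is not specified in the text.

```python
import numpy as np

def fit_repetition_curve(rep_numbers, amplitudes):
    """Least-squares fit of amplitude = a * (1 / x) + b * x + c."""
    x = np.asarray(rep_numbers, dtype=float)
    y = np.asarray(amplitudes, dtype=float)
    A = np.column_stack([1.0 / x, x, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b, c
```

Here x is the repetition number (1 to 5), so the 1/x term captures the steep drop after the first production and the bx term a slower linear trend.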
HFB Response Profiles
Based on previous research (Hermes et al. 2012b; Salari et al. 2018) and on inspection of the data, five models were defined to describe the HFB response profiles of the included electrodes. Electrodes could show (1) high activity for the first vowel production followed by a ‘non-linear decrease’ (NLD) of activity for the remaining productions, (2) high activity for the ‘first production’ (FP) but none or very little activity for the remaining productions, (3) high activity for the first and last production with a lower response for the productions in between, in the form of a ‘u-shape’ (US), (4) linearly decreasing (LD) activity over productions, or (5) activity could be equally responsive (ER) to all productions. Each electrode was classified as one of these response profiles for each repetition rate separately, by regressing three predictors to the HFB amplitude data of each electrode. Only three predictors were necessary to describe these five profiles as will be explained below. The first predictor models a NLD and a FP profile in a simplified way, with the first peak higher than the other peaks, and the other peaks being more or less equally high. For both the NLD and the FP profile the predictor was [1 0 0 0 0], [1 0 0 0] or [1 0 0] for a five, four and three repetitions trial respectively. If the intercept of the regression was significantly above zero (α = 0.05), the whole predictor would be moved up. In that situation, there would be a response present for all repetitions, which differentiates the NLD from the FP profile. The second predictor characterized the US model, (i.e., [1 0 0 0 1], [1 0 0 1] or [1 0 1]). The third predictor represented the LD model, (i.e., [1 0.75 0.5 0.25 0], [1 0.67 0.33 0] or [1 0.5 0]). Note that the slope of this linear predictor was not fixed as the beta and intercept value of the regression determined the slope. The predictor with the highest correlation to the data was chosen as the best fit. 
Subsequently, we tested if this predictor could significantly explain the amplitude response, based on the beta value from the regression analysis (α = 0.05). If an electrode was significant for the best fitting profile (i.e. NLD, FP, US or LD) it was classified as such. If none of the models were significant (and the electrode therefore did not show any difference between the response amplitudes of the repetitions), an electrode was assigned to the ER profile.
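The first part of this classification, choosing the predictor with the highest correlation to the data, can be sketched in Python as below. The function names and the input layout (trials by repetitions) are assumptions. The sketch deliberately stops before the significance tests: separating NLD from FP via the intercept, and assigning ER when no predictor is significant, are not implemented here.

```python
import numpy as np

def profile_predictors(n_reps):
    """The three predictors for a trial with n_reps repetitions."""
    first = np.zeros(n_reps)
    first[0] = 1.0                          # e.g. [1 0 0 0 0]
    u_shape = first.copy()
    u_shape[-1] = 1.0                       # e.g. [1 0 0 0 1]
    linear = np.linspace(1.0, 0.0, n_reps)  # e.g. [1 0.75 0.5 0.25 0]
    return {"NLD/FP": first, "US": u_shape, "LD": linear}

def best_fitting_predictor(amplitudes):
    """amplitudes: (n_trials, n_reps) peak amplitudes of one electrode."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    n_trials, n_reps = amplitudes.shape
    y = amplitudes.ravel()
    scores = {}
    for name, pred in profile_predictors(n_reps).items():
        scores[name] = np.corrcoef(np.tile(pred, n_trials), y)[0, 1]
    best = max(scores, key=scores.get)
    return best, scores
```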
We determined, per repetition rate, the percentage of electrodes that belonged to each profile, and evaluated effects of production rate on the number of electrodes per profile.
To investigate the presence of an anatomical organization of particular response profiles within the sensorimotor cortex (i.e., whether some profiles are more prominent in specific sensorimotor regions than others), we determined for each electrode if it was classified as the same profile more than once (out of three repetition rates). If so, this profile was considered the most prominent profile for that electrode. We visualized the distribution of these most prominent response profiles on a 3D rendering of the subject’s brain as described in (Hermes et al. 2010; Branco et al. 2018b).
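The rule for the most prominent profile amounts to a majority count over the three repetition rates. A minimal Python sketch, with a hypothetical function name and profile labels passed as strings:

```python
from collections import Counter

def most_prominent_profile(profiles_per_rate):
    """Return the profile assigned more than once across rates, else None."""
    label, count = Counter(profiles_per_rate).most_common(1)[0]
    return label if count > 1 else None
```

For example, an electrode classified as NLD, ER and NLD for the three rates would be labeled NLD, whereas three different classifications yield no prominent profile.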
Results
Task Performance and Behavioral/Acoustic Measures
The task was performed well by all subjects, although subject C showed some difficulties during the first run. For subjects A & B, 7.7% (6/78) of the trials were disregarded due to an incorrect response and for subject C this was 35.9% (28/78). For the trials performed accurately (i.e. with the correct number of repetitions), the intended and performed repetition rates did not differ much (see Table 1). Subject A produced the vowels significantly slower than instructed for the fastest production rate, t(83) = −7.54, p < 0.001, and subject C produced them faster for the two fastest repetition rates, t(43) = 2.63, p = 0.01 and t(50) = 4.21, p < 0.001, respectively.
For subject A & C there was a significant difference between vowel production durations for the three different repetition rates after Bonferroni correction (α = 0.05), F(2,281) = 5.15, p = 0.006 and F(2,186) = 14.13, p < 0.001 respectively, see Table 1. For subject B, the vowel production durations did not differ significantly. Since the difference for subject A is relatively small (only 0.01 s), and there is no significant difference for subject B, these results suggest that there was not a strong overall difference between vowel production duration for the three repetition rates for these two subjects.
The derived behavioral performance measures (sound intensity, lip aperture, lip movement and lip velocity) did not correlate with the brain signal peak amplitudes for most electrodes (see Table 2 for the mean correlation over electrodes) in subjects A & B. In fact, for subject A, none of the electrodes showed a significant correlation to any of the measures. For subject B, only 17.86% (5/28) of the included electrodes showed a significant (α = 0.05, FDR corrected) correlation of HFB signal amplitude with sound intensity (mean r = 0.36, SD = 0.09). For subject C, many electrodes did show a significant correlation with the lip measures; 11.86% (7/59, mean r = 0.26, SD = 0.05) with lip position, 74.58% (44/59, mean r = 0.36, SD = 0.09) with lip movement, and 47.46% (28/59, mean r = 0.30, SD = 0.06) with lip velocity. Note that we correct for these effects in our ECoG analyses, see Supplementary Figure S1. This figure shows that most of the signal variability due to for instance sound intensity (see subject B) or lip movement (see subject C) is reduced by the correction we applied and will not have contributed to the results presented in the paper.
Electrode Selection and Peak Timing Models
The models used to select significant electrodes and to determine the peak timings showed an accurate correspondence to the HFB response signals (see Fig. 2), with an average of 62% (SD = 12), 67% (SD = 11) and 73% (SD = 12) variance explained for the included electrodes, of subjects A–C respectively.
Average HFB Peak Profile During Vowel Repetitions
In general, the first vowel production of a trial was associated with a larger HFB peak amplitude than subsequent pronunciations (Fig. 3). For all subjects, the mean HFB response peak amplitudes over all significant sensorimotor electrodes differed significantly between repetitions for almost all repetition rates (Table 3, note that we used the first two and the last repetition for all rates). For subject C, there was no significant difference during the three repetitions condition.
We investigated whether certain repetition rates were associated with a stronger average decrease in amplitude than other repetition rates. Only subject A showed a significant difference between peak amplitudes over production rates (Fig. 4), F(2,209) = 3.27, p = 0.04.
Electrode HFB Peak Profiles
Each included electrode was classified as belonging to one of five response profiles (Fig. 5) based on the development of the peak amplitude over repetitions, for each repetition rate separately. In general, the NLD was the most frequent response profile for subjects A & B. For subject C, the US and ER responses were most frequent. None of the response profiles showed a clear anatomical clustering (Fig. 6).
We investigated whether there was a general change in the number of electrodes per profile depending on repetition rate (Fig. 7). Although no statistical conclusions can be drawn from the results with the current number of subjects, there was an overall trend for an increasing number of ER electrodes with decreasing repetition rate. For subjects A, B and C, respectively, 15.38% (2/13), 20.83% (5/24) and 52.63% (20/38) of all electrodes that showed a repetition effect in the five repetitions condition converted to ER in the three repetitions condition.
Tongue Movements in Healthy Volunteers
The results from the tongue movement measures indicated that 4 subjects did not show much difference in tongue position over the different repetitions and one subject showed a slightly higher tongue position for the first vowel production compared to subsequent repetitions for the two fastest repetition rates (Fig. 8). Whether or not people returned their tongue to the rest position in-between repetitions was quite different over subjects.
Discussion
The effect of movement repetition on the sensorimotor HFB response during vowel production was investigated using a simplified speech task with controlled speed of repetition. The HFB signal from high-density electrode grids was evaluated in three epilepsy patients undergoing a surgical procedure for epilepsy diagnosis.
We show that sensorimotor activity related to discrete speech movements is influenced by previous speech movements when spaced a second (or less) apart. Averaged across electrodes, the HFB-response of sensorimotor cortex had a similar amplitude between different production rates but did not show equal amplitudes over the course of repetitions (Fig. 3). This was seen for all subjects and all tested production rates (1–1.66 Hz; except for one instance in subject C where the effect was near-significant).
The data suggest that the HFB-amplitude is not linearly related to motor output since amplitudes mainly decline non-linearly for repeated vowel productions. The analysis included a correction for the small variations in sound intensity, lip aperture, lip movement and lip velocity, making it unlikely that this finding can be explained by differences in performance over repetitions. However, movements of the tongue could not be measured (discussed below).
The results for speech movements are in agreement with earlier electrophysiological and fMRI data that report a repetition effect for finger movements (Hermes et al. 2012b; Siero et al. 2013). We extend these by showing that complicated movements such as those involved in speech show a decline in HFB response when repeated at a frequency of 1 Hz or higher. Furthermore, our data suggest a tipping point between 1 and 2 s (the interval between productions beyond which the HFB amplitude decline disappears): the repetition effect was still visible for repetitions 1 s apart but was no longer visible after approximately 2 s, the time between the last pronunciation of a trial and the first pronunciation of the following trial, as indicated by the recovery of a high amplitude for each first pronunciation of a trial.
Across electrodes, the non-linear decrease (NLD) profile was dominant for subjects A and B. Other response profiles were observed in these subjects, but less frequently. For subject C, the US and ER responses were most frequent. For all subjects, the number of electrodes with an equally responsive (ER) profile tended to increase with decreasing vowel production rate. Considering the earlier discussion on the tipping point, it could be speculated that different cortical patches in sensorimotor cortex exhibit different tipping points, which would cause more electrodes to display the repetition effect as vowel production rate increases. Note, however, that the total number of pronunciations, and thus the number of data points, differs between repetition rates, which could make the statistical chance of finding a particular response profile unequal between repetition rates. Therefore, we cannot fully exclude the possibility that the tipping-point effect is caused partly by an unequal number of data points between repetition rates.
The current results did not indicate a clear anatomical organization with respect to the different response profiles, although some clustering seemed to be present (most clearly visible in subject B). Please note that, even though we used HD electrode grids, individual electrodes are still 3 or 4 mm apart. Therefore, the spatial sampling is somewhat sparse compared to, for instance, high-field fMRI recordings. Repeating the experiment with even higher spatial sampling may reveal spatial organization with respect to the repetition effect.
Neural Underpinnings of the Repetition Effect
There are several phenomena that may account for the decrease in HFB power observed with repeated speech movements. It may be speculated that some articulators that we did not correct for moved more for the first pronunciation than for subsequent pronunciations. Analysis of tongue movements during the same task in healthy volunteers revealed quite constant tongue positions over repetitions within subjects, but also variations in tongue movements between subjects, ranging from full contraction and relaxation for each repetition to a fixed tongue position throughout repetitions (Fig. 8). Previous research has suggested that not only articulator position but also features related to articulator movements (such as velocity) are represented in the sensorimotor cortex (Conant et al. 2018). If the tongue does not return to its rest position between repetitions, the first utterance may be associated with more activation than the subsequent productions, as the articulator movement is then largest for the first pronunciation and smaller for the subsequent ones. This may also explain why the last repetition sometimes showed an increase in activity compared to its predecessor(s), as the tongue has to return to its rest position. Furthermore, it can be speculated that between phonemes the musculature used for the production is not fully at rest (in anticipation of the next production), even if the articulator position between phonemes is close to the rest position. In this case, one could see the full sequence of repetitions as one, albeit complex, movement with an onset and an offset. Since various reports have shown a neural response at movement offset (Ball et al. 2008; Hermes et al. 2012b; Salari et al. 2018), neural activity at the end of a sequence may also be attributed to the movement towards full rest. However, this cannot explain all of the response profiles we found.
Hence, our findings cannot be fully explained by differences in tongue movements between repetitions and are therefore also in line with the existence of a non-linearity between motor output and neural activity during repeated speech movements that are closely spaced in time. Since HFB power is thought to be associated with neural firing (Manning et al. 2009; Miller et al. 2009; Ray and Maunsell 2011), a decrease in HFB power as observed here may suggest that fewer neurons are involved in subsequent motor acts (see Hermes et al. 2012b for a similar interpretation), or that the same neurons fire less frequently. Indeed, repetition suppression effects have been found for modalities other than motor execution. Suppression during repeated visual stimulation has, for example, been attributed to a reduction of neural excitability for repeated stimuli (see Grill-Spector et al. 2006 for a discussion of the possible mechanisms behind a reduction of neural activity for repeated stimuli and the possible function this may have).
Furthermore, repetition suppression effects for repeated speech have also been found in areas other than the sensorimotor cortex and may be involved in motor planning. Previous fMRI research in the left posterior inferior frontal gyrus has shown, for instance, a repetition suppression effect that is related to the degree of shared phonological features (such as voicing or manner of articulation) over the course of repeated words (Okada et al. 2018). This suggests that similar phonological features during speech reduce activity in motor planning areas. It would be interesting to see whether such motor planning effects of similar phonological features are related to the repetition effect we observed in the sensorimotor cortex, as it has been suggested that repetition effects in some areas may affect the activity in other areas (Grill-Spector et al. 2006).
Furthermore, even though in the current study we focused on sensorimotor cortex activity, other areas such as the supplementary motor area (SMA), cerebellum, basal ganglia and premotor cortex, which are connected to the motor cortex, have been suggested to play an important role in the timing of speech production (Kotz and Schwartze 2010). It would therefore be interesting to investigate the role of those areas during repeated speech production and to see if they have an influence on the repetition effect.
Neural Underpinnings of the Different Response Types
Although further research is needed to better understand the different response types we found, we may speculate about the possible underlying mechanisms. There are multiple theories on the mechanisms behind repetition suppression that may explain the current results. For instance, increased influx of potassium ions over the course of repetitions may lead to hyperpolarization of the cell membrane, causing a reduction in neural firing. If this effect is asymptotic, this may lead to a non-linear decrease. Another theory suggests that only neurons that are most specific to the task continue firing over repetitions. It could be that in some areas the number of task-specific neurons is higher than in others, which may lead to the different response types. If most of the neurons are task-specific, it would be likely that the responses are equal over repetitions (ER). If the ratio between task-specific neurons and task-unspecific neurons is high, the number of firing neurons may decrease initially and stabilize at some point, leading to a non-linear decrease (NLD or FP). With a lower ratio, the decrease may be linear, as the number of neurons that can stop firing is larger. As discussed earlier, some areas may be related to movements in two directions (Fetz et al. 1980; Soso and Fetz 1980), which may explain the u-shape response type (US), as at the end of the trial the articulators are likely to return to rest position (see, for instance, the tongue position data in Fig. 8). This may result in increased neural firing at the end of the trial. Besides these theories on the neural underpinnings of repetition suppression, there may be an alternative explanation for a higher response for the first vowel production. Some parts of the sensorimotor cortex may be involved in the planning of a motor sequence (Tanji and Evarts 1976) and it may be speculated that this could lead to more neural activity for the first production (i.e. the beginning of a rhythmic sequence) or to neural activity only at the beginning of the sequence.
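The task-specificity account above can be illustrated with a toy population model. The sketch below is purely illustrative and is not the analysis used in this study: it assumes that a hypothetical fraction of neurons (specific_fraction) fires on every repetition while the remaining neurons drop out exponentially at a hypothetical rate (dropout_rate). Neither parameter was estimated from the data, and the exponential form is an assumption chosen only because it saturates, so the sketch covers the ER and NLD cases but not the linear or u-shape profiles.

```python
import math


def toy_response(n_reps, specific_fraction, dropout_rate=1.0):
    """Relative population response per repetition in a toy model.

    A fraction `specific_fraction` of neurons fires on every repetition;
    the remainder drops out exponentially at `dropout_rate` per repetition.
    Both parameters are hypothetical and not estimated from the data.
    """
    return [
        specific_fraction
        + (1.0 - specific_fraction) * math.exp(-dropout_rate * i)
        for i in range(n_reps)
    ]


# All neurons task-specific: equal response on every repetition (ER-like).
er_like = toy_response(5, specific_fraction=1.0)

# Some neurons drop out: large first response that levels off (NLD-like).
nld_like = toy_response(5, specific_fraction=0.6)

# Drop in response from each repetition to the next; in a non-linear
# decrease these drops shrink over the course of the sequence.
steps = [a - b for a, b in zip(nld_like, nld_like[1:])]
```

In this toy model the response approaches specific_fraction as repetitions accumulate, so the size of the initial drop depends on how many neurons are not task-specific.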
Implications for Neural Based Speech Decoding
Our results are highly relevant for the development of sensorimotor-speech BCIs: systems that aim to decode (attempted) speech from sensorimotor brain signals. Classification of speech sounds based on sensorimotor activity has been shown before (Kellis et al. 2010; Brumberg et al. 2011; Mugler et al. 2014; Herff et al. 2015; Ramsey et al. 2017), but accuracy levels and degrees of freedom do not meet the standards for home use by patients. It has been suggested, however, that these systems are likely to benefit from the use of high-density electrode grids (Kellis et al. 2016; Ramsey et al. 2017). Classification of sensorimotor signals may also benefit from taking linguistic structures, such as syntax or likely word combinations, into consideration by incorporating a language corpus into the predictions (Herff et al. 2015, 2017). We postulate that a third factor needs to be taken into account for optimal decoding accuracy: previous speech movements. Since sensorimotor-speech BCIs try to find specific patterns in the brain signals that can consistently be linked to a specific sound, and use this information to determine which sound the user made (or tried to make), any effect of previous (attempted) utterances on the brain signal of a current utterance is important information. Sensorimotor-speech BCIs may therefore be improved when information about previously spoken sounds is incorporated in the decoding pipeline. The current study provides a method for creating models of HFB profiles related to vowel repetitions. Models like these may be used for creating a library of models related to variations in brain activity patterns associated with speech production and may potentially improve speech classification.
It will be crucial to extend the current findings to more real-life application scenarios of sensorimotor-speech BCIs, and to investigate whether the results can be generalized to natural speech circumstances such as repeating the same phoneme within a word or over the course of words. Furthermore, since these BCI systems are intended for paralyzed subjects, it is essential to investigate whether repeated covert or attempted speech is also associated with similar phenomena.
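One simple way in which information about previous utterances might enter a decoding pipeline is sketched below in Python. This is a minimal sketch under stated assumptions, not the method of this study: it assumes that the position of an utterance within a sequence is known, that a per-position gain profile can be estimated from training data, and that dividing by this gain is an adequate correction. The function names estimate_gain_profile and correct_for_history, as well as all amplitude values, are hypothetical.

```python
from statistics import mean


def estimate_gain_profile(training_trials):
    """Mean peak amplitude per repetition position, relative to the first.

    `training_trials` is a list of trials; each trial lists one HFB peak
    amplitude per repetition (hypothetical data layout).
    """
    n_positions = len(training_trials[0])
    position_means = [
        mean([trial[i] for trial in training_trials])
        for i in range(n_positions)
    ]
    return [m / position_means[0] for m in position_means]


def correct_for_history(peak_amplitude, position, gain_profile):
    """Rescale a peak so later repetitions are comparable to the first."""
    return peak_amplitude / gain_profile[position]


# Made-up amplitudes for two training trials of four repetitions each.
training_trials = [
    [1.00, 0.70, 0.60, 0.55],
    [1.20, 0.80, 0.70, 0.65],
]
gain = estimate_gain_profile(training_trials)

# A new peak observed at the second repetition (position index 1).
corrected = correct_for_history(0.75, position=1, gain_profile=gain)
```

A classifier trained on first utterances could then be applied to the rescaled amplitude. Whether such a correction generalizes across phonemes, production rates or attempted speech is exactly the open question raised above.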
Limitations & Future Work
One of the limitations of the current study is the small number of subjects. Yet all subjects show similar results (decreased activity for repeated vowel production) across all investigated production rates (except for one instance in subject C where the effect was near-significant), and our findings correspond to those of previous studies for finger movements. Second, a larger range of production rates could have been more informative, notably to determine the tipping point for HFB response recovery. Third, we did not record articulator positions directly (except for the lips) and could therefore not correct for all variations in motor output. Fourth, in the current study we did not control for a possible effect of auditory stimulation on the cortical responses (by the subjects hearing their own voice during the task). However, since previous studies of other repeated movements (not involving speech or auditory stimulation) have shown results similar to those of the current study (e.g. Hermes et al. 2012b), we argue that the repetition effect is more likely a sensorimotor cortex effect related to movements than an effect of auditory stimulation. Finally, from our study it is not possible to determine whether the repetition effect is specific to the same phoneme, or could generalize to different phonemes following one another. This issue clearly warrants further investigation, as it is relevant for decoding speech where different phonemes are produced in sequence.
Conclusion
We show that neural activity related to discrete repeated speech movements is influenced by previous speech movements spaced a second or less apart. The most prominent response profile for repeated speech movements is a non-linear decrease of neural activity over repetitions. These findings are of importance for the development of brain-computer interfaces for communication that use decoding of (overt or covert) speech.
References
Articulate Instruments Ltd (2008) Ultrasound stabilisation headset users manual: revision 1.4. Articulate Instruments Ltd, Edinburgh
Articulate Instruments Ltd (2012) Articulate assistant advanced user guide: version 2.14. Articulate Instruments Ltd, Edinburgh
American Congress of Rehabilitation Medicine (1995) Recommendations for use of uniform nomenclature pertinent to patients with severe alterations in consciousness. Arch Phys Med Rehabil 76:205–209. https://doi.org/10.1016/S0003-9993(95)80031-X
Ball T, Demandt E, Mutschler I et al (2008) Movement related activity in the high gamma range of the human EEG. NeuroImage 41:302–310. https://doi.org/10.1016/j.neuroimage.2008.02.032
Bleichner MG, Jansma JM, Salari E et al (2015) Classification of mouth movements using 7 T fMRI. J Neural Eng 12:66026. https://doi.org/10.1088/1741-2560/12/6/066026
Boersma P (2002) Praat, a system for doing phonetics by computer. Glot Int 5:341–345
Bouchard KE, Mesgarani N, Johnson K, Chang EF (2013) Functional organization of human sensorimotor cortex for speech articulation. Nature 495:327–332. https://doi.org/10.1038/nature11911
Branco MP, Freudenburg ZV, Aarnoutse EJ et al (2018a) Optimization of sampling rate and smoothing improves classification of high frequency power in electrocorticographic brain signals. Biomed Phys Eng Express 4:45012. https://doi.org/10.1088/2057-1976/aac3ac
Branco MP, Gaglianese A, Glen DR et al (2018b) ALICE: a tool for automatic localization of intra-cranial electrodes for clinical and high-density grids. J Neurosci Methods 301:43–51. https://doi.org/10.1016/j.jneumeth.2017.10.022
Brumberg JS, Wright EJ, Andreasen DS et al (2011) Classification of intended phoneme production from chronic intracortical microelectrode recordings in speech-motor cortex. Front Neurosci. https://doi.org/10.3389/fnins.2011.00065
Bruns A (2004) Fourier-, Hilbert- and wavelet-based signal analysis: are they really different approaches? J Neurosci Methods 137:321–332. https://doi.org/10.1016/j.jneumeth.2004.03.002
Conant DF, Bouchard KE, Leonard MK, Chang EF (2018) Human sensorimotor cortex control of directly-measured vocal tract movements during vowel production. J Neurosci. https://doi.org/10.1523/JNEUROSCI.2382-17.2018
Coudé G, Ferrari PF, Rodà F, Maranesi M, Borelli E, Veroni V, Monti F, Rozzi S, Fogassi L, Bartolomucci A (2011) Neurons controlling voluntary vocalization in the macaque ventral premotor cortex. PLoS ONE 6(11):e26822
Crone NE, Miglioretti DL, Gordon B, Lesser RP (1998) Functional mapping of human sensorimotor cortex with electrocorticographic spectral analysis. II. Event-related synchronization in the gamma band. Brain 121:2301–2315. https://doi.org/10.1093/brain/121.12.2301
Crone NE, Hao L, Hart J et al (2001) Electrocorticographic gamma activity during word production in spoken and sign language. Neurology 57:2045–2053. https://doi.org/10.1212/WNL.57.11.2045
Donoghue JP, Sanes JN, Hatsopoulos NG, Gaál G (1998) Neural discharge and local field potential oscillations in primate motor cortex during voluntary movements. J Neurophysiol 79:159–173
Fetz EE, Finocchio DV, Baker MA, Soso MJ (1980) Sensory and motor responses of precentral cortex cells during comparable passive and active joint movements. J Neurophysiol 43:1070–1089
Georgopoulos AP, Kalaska JF, Caminiti R, Massey JT (1982) On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. J Neurosci 2:1527–1537
Georgopoulos AP, Ashe J, Smyrnis N, Taira M (1992) The motor cortex and the coding of force. Science 256:1692–1695
Grill-Spector K, Henson R, Martin A (2006) Repetition and the brain: neural models of stimulus-specific effects. Trends Cogn Sci 10:14–23. https://doi.org/10.1016/j.tics.2005.11.006
Herff C, Heger D, de Pesters A et al (2015) Brain-to-text: decoding spoken phrases from phone representations in the brain. Neural Technol 9:217. https://doi.org/10.3389/fnins.2015.00217
Herff C, Pesters A de, Heger D et al (2017) Towards continuous speech recognition for BCI. Brain–Comput Interface Res. https://doi.org/10.1007/978-3-319-57132-4_3
Hermes DJ (1990) Vowel-onset detection. J Acoust Soc Am 87:866–873. https://doi.org/10.1121/1.398896
Hermes D, Miller KJ, Noordmans HJ et al (2010) Automated electrocorticographic electrode localization on individually rendered brain surfaces. J Neurosci Methods 185:293–298. https://doi.org/10.1016/j.jneumeth.2009.10.005
Hermes D, Miller KJ, Vansteensel MJ et al (2012a) Neurophysiologic correlates of fMRI in human motor cortex. Hum Brain Mapp 33:1689–1699. https://doi.org/10.1002/hbm.21314
Hermes D, Siero JCW, Aarnoutse EJ et al (2012b) Dissociation between neuronal activity in sensorimotor cortex and hand movement revealed as a function of movement rate. J Neurosci 32:9736–9744. https://doi.org/10.1523/JNEUROSCI.0357-12.2012
Kellis S, Miller K, Thomson K et al (2010) Decoding spoken words using local field potentials recorded from the cortical surface. J Neural Eng 7:56007. https://doi.org/10.1088/1741-2560/7/5/056007
Kellis S, Sorensen L, Darvas F et al (2016) Multi-scale analysis of neural activity in humans: implications for micro-scale electrocorticography. Clin Neurophysiol 127:591–601. https://doi.org/10.1016/j.clinph.2015.06.002
Kennerley SW, Diedrichsen J, Hazeltine E et al (2002) Callosotomy patients exhibit temporal uncoupling during continuous bimanual movements. Nat Neurosci 5:376–381. https://doi.org/10.1038/nn822
Kotz SA, Schwartze M (2010) Cortical speech processing unplugged: a timely subcortico-cortical framework. Trends Cogn Sci 14:392–399. https://doi.org/10.1016/j.tics.2010.06.005
Logothetis NK, Pauls J, Augath M et al (2001) Neurophysiological investigation of the basis of the fMRI signal. Nature 412:150–157. https://doi.org/10.1038/35084005
Manning JR, Jacobs J, Fried I, Kahana MJ (2009) Broadband shifts in LFP power spectra are correlated with single-neuron spiking in humans. J Neurosci 29:13613. https://doi.org/10.1523/JNEUROSCI.2041-09.2009
Miall RC, Ivry R (2004) Moving to a different beat. Nat Neurosci 7:1025–1026. https://doi.org/10.1038/nn1004-1025
Miezin FM, Maccotta L, Ollinger JM et al (2000) Characterizing the hemodynamic response: effects of presentation rate, sampling procedure, and the possibility of ordering brain activity based on relative timing. NeuroImage 11:735–759. https://doi.org/10.1006/nimg.2000.0568
Miller KJ, Leuthardt EC, Schalk G et al (2007) Spectral changes in cortical surface potentials during motor movement. J Neurosci 27:2424–2432. https://doi.org/10.1523/JNEUROSCI.3886-06.2007
Miller KJ, Sorensen LB, Ojemann JG, Nijs M (2009) Power-law scaling in the brain surface electric potential. PLOS Comput Biol 5:e1000609. https://doi.org/10.1371/journal.pcbi.1000609
Moran DW, Schwartz AB (1999) Motor cortical representation of speed and direction during reaching. J Neurophysiol 82:2676–2692
Mugler EM, Patton JL, Flint RD et al (2014) Direct classification of all American English phonemes using signals from functional speech motor cortex. J Neural Eng 11:35015. https://doi.org/10.1088/1741-2560/11/3/035015
Okada K, Matchin W, Hickok G (2018) Phonological feature repetition suppression in the left inferior frontal gyrus. J Cogn Neurosci. https://doi.org/10.1162/jocn_a_01287
Pei X, Leuthardt EC, Gaona CM et al (2011) Spatiotemporal dynamics of electrocorticographic high gamma activity during overt and covert word repetition. NeuroImage 54:2960–2972. https://doi.org/10.1016/j.neuroimage.2010.10.029
Penfield W, Boldrey E (1937) Somatic motor and sensory representation in the cerebral cortex of man as studied by electrical stimulation. Brain J Neurol 60:389–443. https://doi.org/10.1093/brain/60.4.389
Posner JB, Plum F, Saper CB, Schiff N (2007) Plum and Posner’s diagnosis of stupor and coma. Oxford University Press, Oxford
Ramsey NF, Salari E, Aarnoutse EJ et al (2017) Decoding spoken phonemes from sensorimotor cortex with high-density ECoG grids. NeuroImage. https://doi.org/10.1016/j.neuroimage.2017.10.011
Ray S, Maunsell JHR (2011) Different origins of gamma rhythm and high-gamma activity in macaque visual cortex. PLOS Biol 9:e1000610. https://doi.org/10.1371/journal.pbio.1000610
Salari E, Freudenburg ZV, Vansteensel MJ, Ramsey NF (2018) Spatial-temporal dynamics of the sensorimotor cortex: sustained and transient activity. IEEE Trans Neural Syst Rehabil Eng 26:1084–1092. https://doi.org/10.1109/TNSRE.2018.2821058
Schaal S, Sternad D, Osu R, Kawato M (2004) Rhythmic arm movement is not discrete. Nat Neurosci 7:1136–1143. https://doi.org/10.1038/nn1322
Siero JC, Hermes D, Hoogduin H et al (2013) BOLD consistently matches electrophysiology in human sensorimotor cortex at increasing movement rates: a combined 7T fMRI and ECoG study on neurovascular coupling. J Cereb Blood Flow Metab 33:1448–1456. https://doi.org/10.1038/jcbfm.2013.97
Siero JC, Hermes D, Hoogduin H et al (2014) BOLD matches neuronal activity at the mm scale: a combined 7 T fMRI and ECoG study in human sensorimotor cortex. NeuroImage 101:177–184. https://doi.org/10.1016/j.neuroimage.2014.07.002
Smith E, Delargy M (2005) Locked-in syndrome. BMJ 330:406–409. https://doi.org/10.1136/bmj.330.7488.406
Soltysik DA, Peck KK, White KD et al (2004) Comparison of hemodynamic response nonlinearity across primary cortical areas. NeuroImage 22:1117–1127. https://doi.org/10.1016/j.neuroimage.2004.03.024
Soso MJ, Fetz EE (1980) Responses of identified cells in postcentral cortex of awake monkeys during comparable active and passive joint movements. J Neurophysiol 43:1090–1110
Spencer RMC, Zelaznik HN, Diedrichsen J, Ivry RB (2003) Disrupted timing of discontinuous but not continuous movements by cerebellar lesions. Science 300:1437–1439. https://doi.org/10.1126/science.1083661
Tanji J, Evarts EV (1976) Anticipatory activity of motor cortex neurons in relation to direction of an intended movement. J Neurophysiol 39:1062–1068
Towle VL, Yoon H-A, Castelle M et al (2008) ECoG gamma activity during a language task: differentiating expressive and receptive speech areas. Brain 131:2013–2027. https://doi.org/10.1093/brain/awn147
Truccolo W, Friehs GM, Donoghue JP, Hochberg LR (2008) Primary motor cortex tuning to intended movement kinematics in humans with tetraplegia. J Neurosci 28:1163–1178. https://doi.org/10.1523/JNEUROSCI.4415-07.2008
Wang W, Chan SS, Heldman DA, Moran DW (2007) Motor cortical representation of position and velocity during reaching. J Neurophysiol 97:4258–4270. https://doi.org/10.1152/jn.01180.2006
Wolpaw JR, Birbaumer N, McFarland DJ et al (2002) Brain–computer interfaces for communication and control. Clin Neurophysiol 113:767–791. https://doi.org/10.1016/S1388-2457(02)00057-3
Acknowledgements
The authors would like to thank the participants, the staff of the clinical neurophysiology department and the neurosurgeons for their contribution and Etske Ooijevaar for her help with the ultrasound measurements.
Funding
This study was funded by the European Union (ERC-Advanced ‘iConnect’ Grant 320708).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Handling Editor: Micah M. Murray.
Electronic supplementary material
Below is the link to the electronic supplementary material.
10548_2018_673_MOESM1_ESM.tif
Supplementary Figure S1—Correlation, for subject A-C, between four behavioral measures (sound intensity, lip position, lip movement and lip velocity) with the normalized brain signal peak amplitudes, averaged over electrodes, before (blue) and after correction (red) for behavioral measures. On the x-axis, the HFB signal peak amplitude is indicated and on the y-axis the behavioral measure. The correlation value (r) and significance value (p) are indicated above each plot in the corresponding color (TIF 7380 KB)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Salari, E., Freudenburg, Z.V., Vansteensel, M.J. et al. Repeated Vowel Production Affects Features of Neural Activity in Sensorimotor Cortex. Brain Topogr 32, 97–110 (2019). https://doi.org/10.1007/s10548-018-0673-4
DOI: https://doi.org/10.1007/s10548-018-0673-4