The Relative Contribution of High-Gamma Linguistic Processing Stages of Word Production, and Motor Imagery of Articulation in Class Separability of Covert Speech Tasks in EEG Data
Word production begins with high-Gamma automatic linguistic processing functions, followed by speech motor planning and articulation. Phonetic properties are processed in both the linguistic and motor stages of word production. Four phonetically dissimilar phonemic structures, “BA”, “FO”, “LE”, and “RY”, were chosen as covert speech tasks. Ten neurologically healthy volunteers aged 21–33 participated in this experiment. Participants were asked to covertly speak a phonemic structure when they heard an auditory cue. EEG was recorded with 64 electrodes at 2048 samples/s. Initially, one-second trials were used, which contained linguistic and motor imagery activities. The four-class true positive rate was calculated. In the next stage, 312 ms trials were used to exclude covert articulation from analysis. By eliminating the covert articulation stage, the four-class grand average classification accuracy dropped from 96.4% to 94.5%. The most valuable features emerge after auditory cue recognition (~100 ms post-onset), and within the 70–128 Hz frequency range. The most significant identified brain regions were the Prefrontal Cortex (linked to stimulus-driven executive control), Wernicke’s area (linked to phonological code retrieval), the right IFG, and Broca’s area (linked to syllabification). Alpha and Beta band oscillations associated with motor imagery do not contain enough information to fully reflect the complexity of speech movements. Over 90% of the most class-dependent features were in the 30–128 Hz range, even during the covert articulation stage. As a result, compared to linguistic functions, the contribution of motor imagery of articulation to class separability of covert speech tasks from EEG data is negligible.
Keywords: Brain-computer interfaces; EEG; Linguistic processing stages; Motor imagery of articulation; Gabor transform; Davies-Bouldin index
Speech is the most natural and intuitive form of human communication. Language and cognition are closely related processes. A BCI system designed to understand commands covertly spoken in the user’s mind is highly desirable. Most neocortical territories in both hemispheres, as well as many subcortical brain regions, are involved in language. EEG signals can successfully identify 200–600 Hz cortical spikes [2, 3, 4] for medical diagnostic applications. In artefact-free conditions, EEG signals accurately measure induced/evoked high-Gamma brain activity, up to 150 Hz [5, 6, 7, 8]. Because the cognitive neuroanatomy of each individual is unique, the spatial, temporal, and spectral patterns of activity may vary from person to person.
Word production begins with semantic (conceptual preparation), lexical (Lemma retrieval), and phonetic (phonological code retrieval and syllabification) linguistic processes, followed by planning the movements of language muscles (phonetic encoding) for articulation [10, 11].
Linguistic phonetic processing is an automatic brain function, which elicits high-Gamma (70–160 Hz) oscillations [12, 13]. In each individual, phonetic processing activity for a specific word does not change over time [14, 15] and is not affected by priming, cognitive activity, or task frequency [16, 17]. In contrast, semantic and lexical processing are affected by task frequency, priming, and cognitive activity [18, 19, 20], which would also arbitrarily shift the temporal course of all following functions. These problems can be avoided by using a suitable experimental protocol.
If the word class is known by the user before the trials, the conceptual preparation stage will be completed in advance. The lemma selection stage, with multiple competing lemmas, would have temporal inconsistencies. If trials are recorded in blocks, only one lemma is activated and selected. In block recordings, the same auditory time cue, in the form of a “beep” sound, can be used for task onset in all word classes, thus eliminating class-dependent auditory evoked responses from trials. By consolidating the semantic and lexical activities, conceptual preparation and lemma selection are complete before task onset. As a result, trials only contain automatic phonetic linguistic processing stages, and will not be affected by the temporal inconsistency of cognitive activity. Mental effort causes activation of scalp and neck muscles, which can mask high-Gamma cortical components. In this work, no mental effort is required from the user during trials. These conditions can be easily reproduced for the online application of this linguistic BCI, with the same block recordings used for training.
After cue recognition (~100 ms post-onset), the following stages are: lemma activation (~100–175 ms post-onset), phonological code retrieval (~175–250 ms post-onset), and syllabification (~250–300 ms post-onset). Covert articulation (~500–800 ms post-onset) and the corresponding motor imagery activity are separated from the linguistic stages by a ~200 ms interval, during which covert articulation is planned by an internal perceptual process using the working memory and the somatosensory association cortex. Initially, one-second trials are used. By using shorter trials (0–312 ms post-onset), the covert articulation stage can be excluded from analysis to study its contribution to classification accuracy.
The EEG signals were recorded using a 64-channel BioSemi ActiveTwo™ system. One computer generated the graphical user interface and sent trigger signals to the ActiveTwo device at the instant a time cue was presented to the user. The triggers were sent via the parallel port and were visible in the recorded data. A second computer saved the EEG recordings and was connected to the ActiveTwo’s A/D box via USB. Electrode placement followed the international ABC system, which for 64 channels corresponds to the 10/10 system. The ActiveTwo has a pre-amplifier stage on the electrode and can correct for high impedances. However, the offset voltage between the A/D box and the body was kept between 25 mV and 50 mV as recommended by the manufacturer. The data were recorded at a sampling rate of 2048 samples/s, with guaranteed data frequency content of 0–409 Hz according to BioSemi.
The pre-processing was done with EEGLAB, an open source MATLAB™ toolbox. Studies conducted with intra-cranial implants confirm high-Gamma band activity during covert speech tasks [20, 31, 32]. One of the main reasons that numerous studies have failed to achieve high classification accuracy is that covert speech tasks are treated as motor imagery, and information above the Beta band is often ignored or even filtered out. A suitable frequency range (0–128 Hz) for analysing linguistic activity is achieved by down-sampling the data to 256 samples/s, which places the Nyquist frequency at 128 Hz. This frequency range is within the operating capability of the ActiveTwo system. The data are then referenced using the surface Laplacian. To remove 50 Hz noise from UK power lines, an FIR notch filter with a rejection band of 49.2–50.8 Hz was applied. Using the Automatic Artifact Removal (AAR) toolbox in EEGLAB, EOG and EMG artefacts were reduced with the SOBI and CCA algorithms respectively. These methods outperform ICA, which is ineffective beyond 70 Hz [37, 38]. Unfortunately, no algorithm can completely eliminate EMG, which elicits 20–200 Hz oscillations in EEG [28, 39]. The most effective solution is to reduce the possibility of recording EMG by controlling the experiment protocol and the environment. The final stage of pre-processing is extracting epochs from the continuous EEG recordings. Each epoch begins when the beep sound is generated and ends exactly one second (or 312 ms for shortened trials) later.
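The down-sampling, notch filtering, and epoching steps described above can be sketched in Python with SciPy. This is only an illustrative sketch, not the EEGLAB pipeline used in the study: the function name `preprocess`, the tap count, and the use of zero-phase filtering are assumptions, and the surface Laplacian and the SOBI/CCA artefact reduction are not reproduced here.

```python
import numpy as np
from scipy import signal

FS_RAW = 2048  # recording rate stated in the text (samples/s)
FS = 256       # down-sampled rate stated in the text (samples/s)

def preprocess(raw, cue_samples, epoch_ms=1000, numtaps=2049):
    """Down-sample, notch-filter, and epoch continuous EEG.

    raw         : array (n_channels, n_samples) sampled at FS_RAW
    cue_samples : cue onsets as sample indices at FS_RAW
    epoch_ms    : 1000 for the 1-s trials, 312 for the shortened trials
    numtaps     : FIR length (assumed value, not taken from the paper)
    """
    # Polyphase resampling low-pass filters before keeping every 8th sample.
    x = signal.resample_poly(raw, up=1, down=FS_RAW // FS, axis=1)
    # Linear-phase FIR band-stop with the 49.2-50.8 Hz rejection band.
    taps = signal.firwin(numtaps, [49.2, 50.8], pass_zero="bandstop", fs=FS)
    # Zero-phase application (an assumption) keeps cue timing unshifted.
    x = signal.filtfilt(taps, [1.0], x, axis=1)
    n = int(round(epoch_ms * FS / 1000))  # 256 samples, or 80 for 312 ms
    starts = [int(round(c * FS / FS_RAW)) for c in cue_samples]
    # Result has shape (n_trials, n_channels, n).
    return np.stack([x[:, s:s + n] for s in starts])
```

With these settings, a 312 ms epoch contains 80 samples (312.5 ms at 256 samples/s), which matches the trial length used for the shortened trials.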
A 1-s epoch from a single EEG channel (256 samples) is converted into a 64 × 32 feature matrix. For the 312 ms trials (80 samples), one epoch from one channel is converted into a 64 × 10 feature matrix.
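To make these dimensions concrete, the following sketch computes a Gaussian-windowed short-time Fourier magnitude, which is one simple way to approximate a discrete Gabor transform. It is not the authors' implementation: the window length (128 samples), hop (8 samples), Gaussian width `sigma`, zero-padding at the edges, and dropping the DC bin are all assumptions chosen so that 256 samples yield a 64 × 32 matrix and 80 samples yield a 64 × 10 matrix.

```python
import numpy as np

def gabor_features(epoch, n_freq=64, hop=8, sigma=16.0):
    """Gaussian-windowed STFT magnitudes for one channel of one epoch.

    Returns an array of shape (n_freq, len(epoch) // hop).
    At 256 samples/s, row i is centred on 2 * (i + 1) Hz.
    """
    epoch = np.asarray(epoch, dtype=float)
    win_len = 2 * n_freq                        # 128 samples -> 2 Hz bins
    n = np.arange(win_len) - (win_len - 1) / 2
    window = np.exp(-0.5 * (n / sigma) ** 2)    # Gaussian analysis window
    padded = np.pad(epoch, win_len // 2)        # zero-pad both ends
    n_steps = len(epoch) // hop                 # 32 for 1 s, 10 for 312 ms
    frames = np.stack(
        [padded[i * hop:i * hop + win_len] for i in range(n_steps)]
    )
    spectrum = np.fft.rfft(frames * window, axis=1)  # (n_steps, n_freq + 1)
    return np.abs(spectrum[:, 1:]).T                 # drop DC -> (n_freq, n_steps)
```

Under these assumptions each time-step spans 8 samples (31.25 ms), so ten time-steps cover the 312.5 ms shortened trial.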
Only the feature generation stage, using the discrete Gabor transform, is applied to the entire dataset. All other calculations are fold-dependent and are repeated separately for each fold. In this study, for the 1-s trials each DBI matrix has a dimension of 4096 × 32 (64 frequency bands, 64 channels, 32 time-steps). Based on the DBI, features are ranked and sorted in order of importance. The indexes of the 4000 most valuable features are saved, and these features are used for training the classification object. This filtering approach to feature selection reduces the dimensionality of the feature space by 97% (4000 of 131,072 features are kept), with acceptable computational cost. The 312 ms trials use the same analysis pipeline as the 1-s trials. For 64 channels, the dimension of the DBI matrix for 312 ms trials is 4096 × 10 (64 frequency bands, 64 channels, 10 time-steps).
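The ranking step can be illustrated with a per-feature, one-dimensional Davies-Bouldin index. The sketch below is an assumption about how such a filter could be written, not the authors' code: the function names are hypothetical, and the class scatter is taken to be the standard deviation. A lower index indicates better class separation, so the features with the smallest values are kept.

```python
import numpy as np

def davies_bouldin_per_feature(X, y):
    """One-dimensional Davies-Bouldin index for every feature.

    X : (n_trials, n_features) training features of one fold
    y : (n_trials,) class labels
    """
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])    # (C, F)
    scatters = np.stack([X[y == c].std(axis=0) for c in classes])  # (C, F)
    dist = np.abs(means[:, None, :] - means[None, :, :])           # (C, C, F)
    ratio = (scatters[:, None, :] + scatters[None, :, :]) / (dist + 1e-12)
    idx = np.arange(len(classes))
    ratio[idx, idx, :] = -np.inf       # ignore each class paired with itself
    # Worst-case overlap for each class, averaged over classes.
    return ratio.max(axis=1).mean(axis=0)

def select_top_features(X, y, k=4000):
    """Indexes of the k features with the lowest (best) index value."""
    return np.argsort(davies_bouldin_per_feature(X, y))[:k]
```

In a cross-validation setting, `select_top_features` would be called on the training trials of each fold only, and the returned indexes reused to slice the test trials of that fold.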
Pseudo-linear discriminant analysis was applied for classification, as it consistently outperformed all other supervised machine learning methods for EEG-recorded covert speech data. Compared to the training process, the computational cost of testing is negligible.
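Pseudo-linear discriminant analysis is commonly understood as linear discriminant analysis in which the pooled covariance matrix is inverted with a pseudo-inverse, so that it still works when the covariance is singular (for example, when features outnumber trials). The following is a minimal sketch under that assumption; the class name `PseudoLinearDA` and the tolerance `rcond` are hypothetical and do not come from the paper.

```python
import numpy as np

class PseudoLinearDA:
    """Linear discriminant analysis using a pseudo-inverse covariance."""

    def fit(self, X, y, rcond=1e-10):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        # Remove each trial's own class mean, then pool the covariance.
        centered = X - self.means_[np.searchsorted(self.classes_, y)]
        cov = centered.T @ centered / (len(X) - len(self.classes_))
        # Pseudo-inverse tolerates singular covariance (rcond is assumed).
        self.precision_ = np.linalg.pinv(cov, rcond=rcond)
        priors = np.array([(y == c).mean() for c in self.classes_])
        self.log_priors_ = np.log(priors)
        return self

    def predict(self, X):
        # Linear score per class: x'Pm - 0.5 m'Pm + log(prior).
        proj = self.means_ @ self.precision_
        scores = (X @ proj.T
                  - 0.5 * np.sum(proj * self.means_, axis=1)
                  + self.log_priors_)
        return self.classes_[np.argmax(scores, axis=1)]
```

Prediction is a single matrix product per trial, which is consistent with the observation that testing costs little compared to training.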
The Wilcoxon rank-sum test on both columns returns a p value of 0.9269. By using 312 ms trials instead of 1-s trials to exclude covert articulation, the computational cost is reduced to one third, with less than a 2% penalty in classification accuracy. During covert speech, the language motor regions are suppressed, but not completely deactivated. As a result, during the covert articulation stage, there will be minute involuntary muscle movements related to each phonemic structure, which will create class-related, high-Gamma myoelectric artefacts. The 312 ms trials are complete before the covert articulation stage begins (~500 ms post-onset) and are guaranteed to be free from class-related EMG. Possible involuntary early muscle tics (i.e. lip movements ~160 ms after the cue) can cause significant EMG contamination. The CCA algorithm used here only removes such artefacts from the first 400 ms of data (312 ms trials included).
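A comparison of this kind can be run with SciPy's `ranksums`. The helper below is a sketch only: the function name and argument names are hypothetical, and no accuracy values are included because the per-participant figures are not reproduced in this text.

```python
from scipy.stats import ranksums

def compare_accuracies(acc_long, acc_short):
    """Two-sided Wilcoxon rank-sum p value for two sets of accuracies.

    acc_long  : per-participant accuracies for the 1-s trials
    acc_short : per-participant accuracies for the 312 ms trials
    """
    _, p_value = ranksums(acc_long, acc_short)
    return p_value
```

A p value close to 1 indicates that the two sets of accuracies are not distinguishable by rank, which is how the reported 0.9269 would be read.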
In a recent publication by these authors, an experimental protocol and analysis pipeline identical to those in this work were used to record mixed randomised trials in a single run using an Enobio dry electrode system with 20 channels. To achieve a manageable recording duration (6–7 min), only 20 trials were recorded per class, and the idle period between trials was reduced to 1–3 s. A grand average classification accuracy of 85% was achieved. Despite using fewer channels, inferior electrodes, and fewer trials compared to the current work, the system performed extremely well for mixed randomised recordings.
[0–62 ms] Left and right Auditory Cortex: response to auditory cue.
[62–124 ms] Prefrontal Cortex: stimulus-driven executive control, initiating covert speech with auditory cue recognition (100 ms). Left Middle Temporal Gyrus: lemma activation (100–124 ms).
[124–186 ms] Left Superior Temporal Gyrus: phonological code retrieval.
[186–248 ms] Left and right Inferior Frontal Gyrus: syllabification.
[248–312 ms] Left Inferior and Superior Parietal Cortex: goal-driven executive control, by suppressing the Primary Motor Cortex and activating an internal perceptual planning process [60, 61, 62, 63].
The syllabification stage is completed sooner than estimated, and the 312 ms trials contain the very early stages of perceptual planning. However, the covert articulation stage, which occurs after the activation of the Supplementary Motor Area [9, 64], is excluded from shortened trials as intended. In the 312 ms trials, the spatial, temporal, and spectral properties of the 4e5 most valuable features identified from 10 participants (Figs. 9 and 11) correspond to the automatic linguistic processing stages of word production prior to articulation, and are supported by a substantial body of evidence [9, 10, 12, 13, 14, 15, 20, 21, 22, 25, 31, 32, 60]. This, in addition to eliminating the possibility of drifts in the raw EEG recordings, confirms the validity of our findings.
By excluding motor imagery, grand average classification accuracy dropped from 96.4% to 94.5%. Compared to the high-Gamma linguistic processing stages of word production, the contribution of motor imagery of articulation to class separability of covert speech tasks is negligible. However, by using 312 ms trials instead of 1-s trials, the computational cost is significantly reduced. The 312 ms trials used in this work only contain phonetic linguistic processing activity. Phonetic linguistic processing prior to articulation elicits a unique and word-specific pattern of high-Gamma activity [12, 65], which does not change over time [14, 15] and is not affected by frequency or priming. Phonetic codes are set up and consolidated with the acquisition of language during childhood, and remain unchanged throughout a person’s life. Phonetic codes are stored in long-term memory and are processed automatically by the brain, requiring no conscious effort from the user during trials, with immunity from any influence or modification [16, 17, 65, 66]. The experimental protocol and analysis pipeline for 312 ms trials presented in this work can be used as a framework to create an online EEG-based 4-class linguistic BCI in future studies. The raw EEG recordings for all ten participants in this work have been published on “Mendeley Data” (https://doi.org/10.17632/5c2z92vw3g.2) for the benefit of our readers.
Compliance with ethical standards
Conflicts of interest
- 1. Kraft, E., Gulyas, B., and Poppel, E., Neural correlates of thinking, 1 ed. Berlin, Springer-Verlag, pp. 65–139, 2009.
- 2. Hsu, D., Hsu, M., Grabenstatter, H. L., Worrell, G. A., and Sutula, T. P., Characterization of high frequency oscillations and EEG frequency spectra using the damped-oscillator oscillator detector (DOOD). arXiv, vol. 1309, no. 1086, 2013.
- 3. Pulvermoller, F., Birbaumer, N., Lutzenberger, W., and Mohr, B., High-frequency brain activity: Its possible role in attention, perception and language processing. Prog. Neurobiol. 52:427–445, 1997.
- 4. Baker, S. N., Curio, G., and Lemon, R. N., EEG oscillations at 600 Hz are macroscopic markers for cortical spike bursts. J. Physiol. 550(2):529–534, 2003.
- 16. Martin, R. C., Lesch, M. F., and Bartha, M. C., Independence of Input and Output Phonology in Word Processing and Short-Term Memory. J. Mem. Lang. 41:3–29, 1999.
- 18. Kaan, E., Event related potentials and language processing: a brief overview. Lang Ling Compass 1(6):571–579, 2007.
- 21. Numminen, J., and Curio, G., Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neurosci. Lett. 272:29–32, 1999.
- 22. Chakrabarti, S., Sandberg, H. M., Brumberg, J. S., and Krusienski, D. J., Progress in Speech Decoding from the Electrocorticogram. Biomed. Eng. Lett. 5:10–21, 2015.
- 25. Cummings, A., Seddoh, A., and Jallo, B., Phonological code retrieval during picture naming: Influence of consonant class. Brain Res. 1635:71–85, 2016.
- 26. Fry, D. B., The Physics of Speech (Cambridge Textbooks in Linguistics). Cambridge University Press Online Publication, 2012.
- 27. Jahangiri, A., and Sepulveda, F., The contribution of different frequency bands in class separability of covert speech tasks for BCIs. In: Engineering in Medicine and Biology Society (EMBC), 2017 39th Annual International Conference of the IEEE, pp. 2093–2096, IEEE, 2017.
- 29. I. BioSemi, ActiveTwo-Multichannel, DC amplifier, 24-bit resolution, biopotential measurement system with active electrodes, 2001.
- 32. Greenlee, J. D. et al., Human Auditory Cortical Activation during Self-Vocalization. PLoS One 6(3):1–15, 2011.
- 33. Chi, X., Hagedorn, J. B., Schoonover, D., and D'Zmura, M., EEG-Based Discrimination of Imagined Speech Phonemes. International Journal of Bioelectromagnetism 13(4):201–206, 2011.
- 34. Gómez-Herrero, G., Automatic artifact removal (AAR) toolbox v1.3 for MATLAB. 2007.
- 35. Gómez-Herrero, G., et al., Automatic Removal of Ocular Artifacts in the EEG without an EOG Reference Channel. In: Signal Processing Symposium NORSIG, pp. 130–133, 2006.
- 36. Xun, C., Chen, H., and Hu, P., Removal of Muscle Artifacts from Single-Channel EEG Based on Ensemble Empirical Mode Decomposition and Multiset Canonical Correlation Analysis. J. Appl. Math. 2014:1–10, 2014.
- 38. McMenamin, B. W. et al., Validation of ICA-Based Myogenic Artifact Correction for Scalp and Source-Localized EEG. Neuroimage 49(3):2416–2432, 2010.
- 40. Shie, Q., and Dapang, C., Optimal biorthogonal analysis window function for discrete Gabor transform. IEEE Trans. Signal Process. 42(3):694–697, 1994.
- 41. Qian, S., and Chen, D., Discrete Gabor transform. IEEE Trans. Signal Process. 41(7):2429–2438, 1993.
- 42. Quiroga, R. Q., Blanco, S., Rosso, O. A., Garcia, H., and Rabinowicz, A., Searching for hidden information with Gabor Transform in generalized tonic-clonic seizures. Electroencephalogr. Clin. Neurophysiol. 103(4):434–439, 1997.
- 43. Blanco, S., D'Attellis, C. E., Isaacson, S. I., Rosso, O. A., and Sirne, R. O., Time-frequency analysis of electroencephalogram series. II. Gabor and wavelet transforms. Phys. Rev. E 54(6):6661–6672, 1996.
- 44. Bekhti, Y., Strohmeier, D., Jas, M., Badeau, R., and Gramfort, A., M/EEG source localization with multi-scale time-frequency dictionaries. In: 6th International Workshop on Pattern Recognition in Neuroimaging (PRNI), pp. 31–35, 2016.
- 47. Varghese, S. M., and Sushmitha, M. N., Efficient Feature Subset Selection Techniques for High Dimensional Data. IJIRCCE 2(3):3509–3515, 2014.
- 48. Sutha, K., and Tamilselvi, J. J., A Review of Feature Selection Algorithms for Data Mining Techniques. IJCSE 7(6):63–67, 2015.
- 49. Kumar, V., and Minz, S., Feature Selection: A literature Review. Smart Computing Review 4(3):211–229, 2014.
- 51. Rojas-Thomas, J. C., New version of Davies-Bouldin index for clustering validation based on cylindrical distance. In: V Chilean Workshop on Pattern Recognition, pp. 81–86, 2013.
- 52. Maulik, U., and Bandyopadhyay, S., Performance evaluation of some clustering algorithms and validity indices. IEEE Trans. Pattern Anal. Mach. Intell. 24(12):1650–1654, 2002.
- 53. Webb, A. R., Statistical pattern recognition. Hoboken: John Wiley and Sons Ltd., 2002.
- 57. Cortical Functions Reference. Trans Cranial Technologies Ltd. Wanchai, Hong Kong, 2012.
- 59. Jahangiri, A., Chau, J. M., Achanccaray, D. R., and Sepulveda, F., Covert speech vs. motor imagery: a comparative study of class separability in identical environments. In: Engineering in Medicine and Biology Society (EMBC), 2018 40th Annual International Conference of the IEEE, IEEE, 2018.
- 61. Watkins, K., and Paus, T., Modulation of Motor Excitability during Speech Perception: The Role of Broca’s Area. J. Cogn. Neurosci. 16(6):978–987, 2004.
- 62. Tian, X., and Poeppel, D., Mental imagery of speech: linking motor and perceptual systems through internal simulation and estimation. Frontiers in Human Neuroscience. 2, 2012.
- 65. Schiller, N. O., Bles, M., and Jansma, B. M., Tracking the time course of phonological encoding in speech production: an event-related brain potential study. Cogn. Brain Res. 17:819–831, 2003.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.