1 Introduction

Figure 1 presents some conditions that can impair cognitive and/or motor function. The bottom half of this figure includes locked-in syndrome (LIS) and complete locked-in syndrome (CLIS). In these syndromes, patients have little or no remaining motor control. They may therefore be unable to use assistive technologies or augmentative and alternative communication systems designed for persons with some remaining movement. For these patients, a brain-computer interface (BCI), which provides communication without movement, may be the only possible means of communication. A BCI is a real-time system that measures activity from the brain, automatically analyzes the data, and provides output that influences user interaction. Classic BCI review articles have focused on BCIs as communication systems for persons with LIS and CLIS, such as patients with late-stage amyotrophic lateral sclerosis (ALS) [1, 2]. Over the past several years, BCI research has also begun to address new approaches that can help broader groups of patients [3,4,5,6]. One prominent new approach uses BCIs to assess conscious awareness and provide communication for persons diagnosed with a disorder of consciousness (DOC). Figure 1 includes three categories of DOC. In coma, patients do not appear to have any cognitive or motor function. In the unresponsive wakefulness state (UWS), patients may show arousal but do not seem to have any awareness. Patients in the minimally conscious state (MCS) also lack reliable voluntary motor control and have substantial cognitive impairment, although their awareness and cognitive function may fluctuate.

Fig. 1.

The left panel shows the types of DOC as well as other conditions, categorized by remaining cognitive and motor functions. The right panel shows that typical psychological and testing batteries are not designed for DOC patients or some other persons with disabilities. DOC assessment scales such as the CRS-R and GCS are designed for DOC patients, but provide only behavioral measures. The mindBEAGLE system has an EEG-based assessment battery and communication tools for patients with DOC, as well as locked-in syndrome (LIS) and complete locked-in syndrome (CLIS).

Several factors indicate growing interest in BCIs for DOC patients. Numerous peer-reviewed articles on this topic have been published by several groups (e.g. [7,8,9,10,11,12,13,14]), including recent review articles [15,16,17]. In 2016 alone, several major international conferences held workshops, special sessions, symposia, or related events focused on this research direction. Examples include the Sixth International Brain-Computer Interface Meeting (Pacific Grove, CA), 18th International Conference on Human-Computer Interaction (Toronto, ON), 46th Annual Society for Neuroscience meeting (San Diego, CA), 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (Orlando, FL), and the 9th International Joint Conference on Biomedical Engineering Systems and Technologies (Rome, IT). These and other conferences have also featured talks, posters, papers, and other work presenting new research with BCIs for DOC. The Annual Brain-Computer Interface Research Award has received a growing number of submitted projects, and more projects nominated for awards, that relate to BCIs for DOC [6]. Most importantly, ongoing advances in relevant technologies and validation efforts with patients provide growing hope that BCI technology can help patients and their families.

These articles, conference activities, and award submissions have presented different BCI platforms from different groups, using a variety of analysis methods, hardware, and software. One platform that several groups have used across studies is the mindBEAGLE platform [17]. This system provides BCIs based on motor imagery (MI) and different evoked potential (EP) paradigms, and has been used with over 100 persons with DOC to date. Given the growing usage of mindBEAGLE and related systems with a vulnerable patient group, and the potentially life-changing impact of a system error, there is an increasingly pressing need for basic validation studies with healthy people to confirm that the BCI system works as expected. Such studies may provide limited new scientific knowledge, but are crucial precursors to wider adoption.

The main goal of the present study was to validate different components of the mindBEAGLE system with healthy users. We evaluated assessment tools that used motor imagery and three types of EP paradigms that relied on auditory or vibrotactile stimuli. We tested the hypothesis that healthy persons would exhibit indicators of conscious awareness, and be able to communicate, using the mindBEAGLE system. Obviously, results indicating that any healthy person did not exhibit conscious awareness would raise serious concerns about the system. We evaluated the assessment tools with regular system operation and “sham” testing. We also evaluated the BCI-based communication components with the MI approach and one of the vibrotactile EP paradigms.

2 Methods

2.1 Equipment

All of the hardware and software used for data recording, stimulus presentation, and data analysis were implemented through the mindBEAGLE platform. The mindBEAGLE platform includes a laptop, a g.USBamp signal amplifier, one g.STIMbox, one EEG cap, three vibrotactile stimulators, two earbuds, and all cables required to connect system components to each other. The g.USBamp provided 24-bit ADC resolution, and the cap contained 16 g.LADYbird active electrodes positioned at sites FC3, Fz, FC4, C1, C2, C3, Cz, C4, C5, C6, CP1, CP3, CPz, CP2, CP4 and Pz. Figure 2 shows the mindBEAGLE system used in this study.

Fig. 2.

This image shows mindBEAGLE system components. The left side shows the amplifier, electrode cap, and earbuds. The middle part of the image shows a laptop with mindBEAGLE software running, and the right side shows one of the vibrotactile stimulators. The image on the laptop shows some of the real-time feedback during the AEP paradigm. This includes an electrode signal quality check, raw EEG from different channels, and EPs from target and nontarget stimuli (with differences between them shaded in green). The green progress bar on the bottom of the monitor shows the time remaining before a break. Operators can also pause or stop the system using the icons in the top right. (Color figure online)

2.2 Participants

The participants were three healthy persons (2 male, 1 female; ages 38–43, SD = 2.6 years). All participants reported that they had never been diagnosed with any DOC, neurological damage, or psychiatric conditions. The participants signed a consent form, and the procedure was approved by the local ethics committee. All participants' native language was German; hence, the mindBEAGLE system was set to provide all instructions in German.

2.3 Experimental Procedure

The participants were seated during the study. Participants were positioned so they could not see the mindBEAGLE laptop monitor, since the monitor activity might have distracted participants and disrupted the sham condition. Each recording session began with mounting the electrode cap, earbuds, and vibrotactile stimulators. One stimulator was placed on each hand, and a third was placed on the middle of the back. The experimenter explained the procedure for each run and played examples of each stimulus to the subject. For example, the experimenter played the words “left” and “right” in the motor imagery paradigm, and showed the subject what each of the three vibrotactile stimuli felt like.

The mindBEAGLE system has software modules that manage four types of paradigms: auditory evoked potential (AEP); vibrotactile stimulation with two tactors (VT2); vibrotactile stimulation with three tactors (VT3); and motor imagery (MI). The first three paradigms rely on EPs, while the MI paradigm relies on event-related (de)synchronization (ERD/S) at roughly 10–12 Hz [3]. All four paradigms can be used for assessment, while the VT3 and MI paradigms can also be used for communication. All instructions, stimuli, and feedback were presented to the participants via the earbuds; the VT2 and VT3 paradigms also utilized the vibrotactile stimulators. The software presented information to the system operator through the laptop monitor as well (see Fig. 2).

Each participant performed two regular runs and one sham run in each of the AEP, VT2, VT3, and MI assessment paradigms. Next, each participant performed two regular runs with each of the VT3 and MI communication paradigms. Thus, there were twelve assessment runs followed by four communication runs. Aside from the constraint that all assessment runs occurred before any communication run, the order of the runs was determined pseudo-randomly. Each communication run entailed five or ten yes/no questions. Subjects took a brief break between runs.

2.3.1 Assessment Runs

The AEP paradigm utilized two stimuli: high- and low-pitch tones (1000 and 500 Hz) presented at a ratio of 1:7. Each tone lasted 100 ms, and the interval between tone onsets was 900 ms. Participants were instructed to silently count the less frequent high tones and ignore the low tones, creating a classic "oddball" paradigm [18]. Each run had four trial groups, and each trial group had 30 trials that each contained one high tone and seven low tones in pseudorandom order. Thus, each run presented 960 tones in total. There was no pause between trial groups.
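
To make the trial structure concrete, the following is a minimal sketch (in Python, not mindBEAGLE code) of how such a pseudorandom oddball sequence could be generated. The function name and NumPy usage are assumptions; the counts and 1:7 ratio follow the description above.

import numpy as np

def make_oddball_run(n_groups=4, n_trials=30, n_stimuli=8, seed=None):
    """Flat AEP stimulus sequence: 1 = target (1000 Hz), 0 = non-target (500 Hz)."""
    rng = np.random.default_rng(seed)
    trials = []
    for _ in range(n_groups * n_trials):
        trial = np.zeros(n_stimuli, dtype=int)
        trial[rng.integers(n_stimuli)] = 1  # exactly one target per trial, at a random position
        trials.append(trial)
    return np.concatenate(trials)

sequence = make_oddball_run(seed=0)
print(sequence.size, sequence.sum())  # 960 tones per run, 120 of them targets (1:7)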

The two stimuli in the VT2 paradigm were delivered by vibrotactile stimulators on the left and right wrists. As in the AEP paradigm, each run contained four trial groups with 30 trials each. At the beginning of each trial group, the participant was instructed via the earbuds to silently count each pulse to the target wrist and ignore pulses to the non-target wrist. Each trial presented one vibrotactile pulse to the target wrist and seven pulses to the non-target wrist. Each pulse lasted 100 ms, with a 300 ms interval between pulse onsets. The software automatically designated the target wrist (left or right) at a 1:1 ratio and provided instructions and stimuli accordingly.

The VT3 paradigm was identical to VT2, except as follows. First, a third vibrotactile stimulator was placed on the back. Second, each trial presented stimuli to three locations in pseudorandom order: one pulse to the left wrist, one to the right wrist, and six to the back. As in VT2, the instructions prior to each trial group told the participants to silently count pulses to either the left or right wrist (1:1 ratio). The back was never designated as a target, and was thus intended as a "distractor" stimulus within the oddball paradigm [18].

Since each trial presented one target stimulus and seven other stimuli (non-target or distractor), chance accuracy was 12.5%. The mindBEAGLE software treated all eight stimuli within each trial equally for classification; that is, non-target stimuli were not grouped together, and the classifier was blind to the identity of the target stimulus.
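
As a minimal sketch of this selection rule: assuming the classifier produces one target-class score per stimulus within each trial (the array layout and names here are hypothetical), the target estimate is simply the stimulus class with the highest average score.

import numpy as np

def pick_target(scores):
    """scores: array of shape (n_trials, 8) holding one LDA target-class
    score per stimulus per trial. All eight stimulus classes compete
    equally; non-targets are not grouped together."""
    return int(np.argmax(scores.mean(axis=0)))  # index of the most target-like stimulus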

In the MI paradigm, each participant heard the word "left" or "right" at the beginning of each trial, followed by a tone 2 s later. These cues instructed the participant to imagine moving the left or right hand (1:1 ratio, determined randomly) for four seconds, beginning when the tone sounded. Another tone then cued the participant to relax, and a random interval of 0.5–2 s provided a brief break before the next trial. Each MI run had 60 trials (30 per hand) and lasted about 9 min.

2.3.2 Communication Runs

The VT3 and MI communication runs matched the corresponding assessment runs, with several differences. Prior to each trial group, the system provided a pause during which the experimenter asked the participant a YES/NO question. In this study, we only asked questions in copy-spelling mode, meaning that the answers were known beforehand.

In the VT3 paradigm, subjects were told to count pulses to the left wrist to answer YES or to the right wrist to answer NO. Each VT3 communication run consisted of one trial group with 15 trials (120 total stimuli) and lasted about 38 s. At the end of each run, a circle near the bottom of the monitor moved to YES or NO if the classifier chose the left or right wrist, or remained in the center if the distractor was selected, reflecting an indeterminate response.
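
A hypothetical sketch of this decision mapping (the class indices are our own assumption: 0 = left wrist, 1 = right wrist, 2 = back distractor):

def vt3_answer(selected_class):
    """Map the classifier's chosen stimulus class to a displayed answer;
    the back distractor yields an indeterminate response."""
    return {0: "YES", 1: "NO"}.get(selected_class, "INDETERMINATE")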

In the MI paradigm, each participant was asked to imagine left hand movement to answer YES and right hand movement to answer NO. Each MI run consisted of one trial lasting about eight seconds, and thus could potentially provide faster communication. At the end of the run, the circle moved to YES or NO; indeterminate responses were not possible. The time estimates for VT3 and MI communication runs do not include the time required to ask a question, convey the answer, or pause before the next run if desired.

2.3.3 Regular vs. Sham Runs

During the regular runs, the mindBEAGLE system performed normally. During the sham runs for the assessment paradigms, the subjects still wore the cap and followed a similar procedure, but did not receive the critical stimuli from the mindBEAGLE system. The vibrotactile stimulators were unplugged during the VT2 sham runs; thus, while subjects still heard instructions, they never received the tactile stimuli required to elicit EPs. During the AEP, VT3, and MI sham runs, the system was muted. Thus, the participants did not hear the auditory stimuli that elicit EPs (AEP) or the system's instructions cueing them to attend left or right and conveying trial timing (VT3 and MI).

2.4 Signal Processing and Classification

The mindBEAGLE software installed on the laptop in Fig. 2 managed all data recording and processing in real time to allow real-time feedback. Data were sampled at 256 Hz and bandpass filtered from 0.1–30 Hz. In the AEP, VT2, and VT3 paradigms, data from eight sites (Fz, C3, Cz, C4, CP1, CPz, CP2, and Pz) were used. First, epochs were created from 100 ms before to 600 ms after each auditory or vibrotactile stimulus began. Automated tools then performed baseline correction based on the 100 ms preceding stimulus onset and rejected all epochs in which EEG amplitude exceeded ±100 µV; rejected epochs were excluded from both the EP and classifier calculations. Next, a linear discriminant analysis (LDA) classifier attempted to identify which of the eight stimuli within each trial was the target stimulus. The classifier's selection was then compared with the actual target to calculate a classification accuracy ranging from 0% to 100%. This process was repeated as more trials were presented within each trial group, so classifier accuracy could be plotted against the number of events required to attain that level of performance. The software also presented the EPs on the laptop, updated as new epochs were created, and performed a significance test that shaded areas with significant differences between targets and non-targets in green (p < 0.05).
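
The stated parameters (256 Hz sampling, 0.1–30 Hz bandpass, −100 to 600 ms epochs, ±100 µV rejection) suffice for a minimal sketch of this pipeline; the filter order and function structure below are assumptions, not mindBEAGLE internals.

import numpy as np
from scipy.signal import butter, filtfilt

FS = 256  # Hz, sampling rate from the text

def preprocess_epochs(eeg, onsets):
    """eeg: continuous data (channels x samples) in microvolts; onsets:
    stimulus onsets in samples. Returns baseline-corrected, artifact-free
    epochs of shape (n_epochs, n_channels, n_samples)."""
    b, a = butter(4, [0.1, 30.0], btype="bandpass", fs=FS)  # 4th order assumed
    filtered = filtfilt(b, a, eeg, axis=1)
    pre, post = int(0.1 * FS), int(0.6 * FS)  # -100 ms to +600 ms window
    epochs = []
    for t in onsets:
        if t < pre or t + post > filtered.shape[1]:
            continue  # skip epochs that run past the recording edges
        ep = filtered[:, t - pre:t + post].copy()
        ep -= ep[:, :pre].mean(axis=1, keepdims=True)  # baseline correction
        if np.abs(ep).max() <= 100.0:                  # reject epochs beyond +/-100 uV
            epochs.append(ep)
    return np.stack(epochs)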

In the MI paradigm, a common spatial patterns (CSP) classifier was trained on data from 3–5 s after the cue to begin an imagined movement (left or right). This training created weights for each electrode reflecting the relative contributions of different electrode sites to correct classification. The system then estimated the variance of the CSP-filtered signals within a 1.5 s window and trained an LDA classifier to calibrate the system for the participant. As in the EP paradigms, the LDA classifier output a classification accuracy that could range from 0% to 100%. The mindBEAGLE manual recommends a threshold of 64% for the MI assessment paradigm, calculated with a binomial test (alpha < 0.05). We used a threshold of 60% for the EP assessment paradigms based on our experience with patients.
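
The exact parameters behind the manual's 64% recommendation are not stated; the sketch below shows how such a binomial threshold can be derived, assuming the 60 MI trials per run described above.

from scipy.stats import binom

def accuracy_threshold(n_trials, chance=0.5, alpha=0.05):
    """Smallest accuracy that a chance-level classifier would reach with
    probability below alpha (one-sided binomial test)."""
    k = binom.ppf(1 - alpha, n_trials, chance) + 1  # first count above the (1 - alpha) quantile
    return k / n_trials

print(accuracy_threshold(60))  # ~0.62 under these assumptions, near the manual's 64%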

The four paradigms were trained as follows. Within each paradigm, data from the first assessment run were used to train the classifier used for real-time analysis during the second assessment run. The resulting classifier settings were then used for real-time analysis in the communication runs for the VT3 and MI paradigms. In addition, when mindBEAGLE presents classifier results at the end of each run and in saved files, those results are based on a classifier updated with data from that same run. For example, the results displayed on the monitor at the end of the first VT2 assessment run reflect a classifier trained on the first VT2 assessment run; otherwise, accuracy would have to be based on generic data templates, which would be less accurate. All accuracy plots employed a cross-validation strategy to counteract overfitting.
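
mindBEAGLE's exact cross-validation scheme is not specified here; the following generic scikit-learn sketch illustrates the principle of never scoring a classifier on the epochs it was trained on.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def cv_accuracy(features, labels, n_folds=10):
    """k-fold cross-validated accuracy: each fold is scored by an LDA
    model trained only on the remaining folds, counteracting overfitting."""
    lda = LinearDiscriminantAnalysis()
    return cross_val_score(lda, features, labels, cv=n_folds).mean()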

3 Results

Table 1 summarizes results from the regular and sham runs, including classification accuracies during the assessment and communication runs. Figure 3 shows results from all three participants.

Table 1. Results from the three participants (S1–S3), including BCI classification accuracies from assessment and communication runs, and the number of questions answered correctly out of the number asked (e.g., 5 of 5).
Fig. 3.

These images show results from assessment runs from all three participants. Each participant's group of images contains three rows. Top: EPs from site Cz, and BCI accuracy, for the AEP, VT2, and VT3 paradigms; a representative image was selected from the two regular assessment runs for each paradigm. In the EP plots, the vertical red line shows the onset of the stimulus (tone or vibrotactile pulse); EPs elicited by target stimuli are shown in green, and EPs from non-target stimuli are shown in blue. Areas shaded light green reflect time periods with significant target vs. non-target differences (Kruskal-Wallis test). The BCI accuracy plot to the right of each EP plot shows classification accuracy on the y-axis against the number of trials averaged together on the x-axis; the median accuracy is shown at the top right of each accuracy plot. Chance performance in all three EP paradigms is 12.5%. Middle: The same format as the top images, but showing results from the sham runs for all three participants. Bottom: Changes in BCI classification accuracy during MI trial execution. The cue indicating which hand movement to imagine occurred 2 s into each trial, shown by the vertical red line; the horizontal red line reflects the chance accuracy of 50% for the MI paradigm. Each image shows results averaged across left hand imagination trials, right hand imagination trials, and both combined. The left image shows results from regular runs, and the right image shows sham results. (Color figure online)

3.1 Regular Runs

Across the four paradigms, assessment accuracy ranged from 60% to 100%. These results met or exceeded the thresholds we recommend: 60% for the EP paradigms and 64% for the motor imagery paradigm. Assessment accuracy was not always 100% and varied between runs. Fluctuations in accuracy have been widely reported in the BCI literature [3], and here the fluctuations did not push any regular assessment run below threshold. Nonetheless, these results underscore the importance of multiple recording sessions to properly assess patients, as discussed below. For the communication runs, results generally showed that the BCI could answer YES/NO questions. S2's performance fluctuated from 20% to 100%, while the other two participants each achieved 80% to 100% accuracy.

The EPs from the regular runs in Fig. 3 show common EP components such as the N1 (most distinct in AEP), P2, and P3 [19]. The N1 does not differ substantially between target and non-target trials, as expected for such an early EP component. The green shaded areas show that the P2 and P3 exhibit significant target vs. non-target differences. BCI classification accuracy reaches 100% after about five events and remains at 100% as more events are averaged together. The MI accuracy plots in Fig. 3 indicate that S1's average performance (green line) increased from about 50% (chance accuracy) early in the trial to over 80% about two seconds after cue onset, as is typical of MI BCI paradigms [3, 20].

3.2 Sham Runs

For the three EP paradigms, sham accuracy ranged from 0% to 25%, well below the EP threshold of 60% and close to the chance accuracy of 12.5%. Assessment results from the MI sham runs ranged from 50% to 59%, also below the 64% threshold and near the chance accuracy of 50%. Hence, no sham assessment in any of the four paradigms crossed its threshold.

The results shown in Fig. 3 further indicate that the system did not detect activity reflecting stimulus and task following during any of the sham runs. The EPs do not reflect any activity associated with selective attention to stimuli: the target and non-target lines look similar to each other (without noteworthy significant differences shaded green), and classification accuracy is very low regardless of how many trials are averaged together. The MI classification accuracy remains near chance level throughout the trial, with a low median accuracy. All of these MI results, like the EP results, are consistent with expectations for regular and sham operation.

4 Discussion

Results from all four paradigms (AEP, VT2, VT3, and MI) were appropriate for healthy persons. That is, all four paradigms indicated that the three healthy participants exhibited conscious awareness, and could communicate using the BCI tools, during regular runs but not during sham assessment runs. These results indicate that the system works as expected. However, one participant did not perform well in the first VT3 communication run, which may reflect training effects (learning the relatively difficult VT3 paradigm), distraction, or other causes. Patients may exhibit greater fluctuations. Thus, when working with patients whose conscious awareness may fluctuate, repeated assessments are strongly recommended. In many cases, we also recommend attempting BCI communication as soon as possible after an assessment indicates it is possible. If patients present only a limited window of awareness during which communication is possible, then opportunities for them to communicate may be rare.

This ties into a related question: what does the assessment assess? The most direct answer is that the assessment protocol assesses the user's ability to generate distinct EEG signals, by performing simple tasks that do not require movement, which a specific signal processing platform can discriminate. The results may be used to infer the potential for communication through a BCI. More broadly, proof that a patient can perform the required mental activities could influence the views of family, friends, and physicians regarding the patient's state and treatment.

BCIs for this latter type of assessment may develop into clinical decision support tools. This prospect raises many issues relating to new standards and norms. Prior work has suggested standardized scales for clinical DOC diagnosis that incorporate EEG. Recording and utilizing EEG activity with auditory BCI methods during CRS-R administration could also provide supplemental information [14]. Other standards could address the number and types of assessments needed, training requirements and recommended methods, ethical guidelines, and certifications for equipment and staff expertise. The hierarchical approach to assessing conscious function that has been proposed [21, 22] could lead to standardized scales and testing methods that conduct specific tests in sequence to provide a much richer picture of remaining cognitive function.

The hierarchical approach entails passive vs. active paradigms. In passive paradigms, the user is not instructed to perform any task; passive paradigms can thus assess relatively basic brain function. For example, the mismatch negativity (MMN) may be elicited by an oddball stimulus even without any instructions. The P300 and other signals can also be elicited through passive paradigms. Indeed, the AEP and VT2 paradigms presented here could be adapted into passive paradigms by removing the instruction to count the oddball stimulus. Such modified paradigms could supplement existing and new paradigms to evaluate different levels of cognitive function. Other simple modifications could involve removing distractor and even non-target stimuli to assess both active and passive P300s and related EPs [18, 23]. Auditory tests could present more complex information, such as different words, which could incorporate established DOC assessment paradigms such as the subject's own name (SON) paradigm [7] or N400 priming [9].

We note some limitations of this study. We did not explore a communication sham condition. The AEP and VT2 sham runs did not present any stimuli, because presenting stimuli could have elicited passive P300s; thus, EEG activity or noise that stimuli might have caused could not occur in the sham condition. Also, both the experimenter and the subject knew which runs were regular or sham. Blinding the experimenter, and especially the subject, would have required changing the paradigm in some way, which we did not want to do. Furthermore, the differences between regular and sham performance are quite pronounced, both in BCI performance and in EPs, so these minor confounds are probably irrelevant. Finally, future work could study performance across many recording sessions; additional sessions with healthy users might reveal training effects and provide more information about system consistency.

In future work, the YES/NO communication tools could become faster, such as by training persons with the MI approach or reducing the number of events before the classifier reaches a decision. New devices such as belts, necklaces, bracelets, or other wearable devices with a variety of stimulators could provide broader communication options, as could more varied auditory stimuli. These new communication approaches must account for many unique design issues for interacting with target users, including non-visual stimuli [24,25,26,27]. Future systems might further leverage the “hybrid BCI” approach in mindBEAGLE with a broader variety of BCI communication tools [28]. The inclusion of MI and other approaches in mindBEAGLE provides some flexibility for end users. MI BCIs may require more training and may not work for some users, but could provide faster communication, at least with the settings used here.

Many other future directions merit further research. In addition to advanced communication, patients might be provided with sensory, cognitive, and/or motor rehabilitation tools. New methods could aim to predict recovery or periods of conscious awareness. Improved hardware might provide more EEG channels, higher signal quality and smaller and more comfortable electrodes (perhaps allowing long recording periods that could detect conscious awareness and inform staff). Improved software could incorporate new protocols for assessment and communication, perhaps within a hierarchy or scale, as well as better signal processing that might use ECG, EOG, EMG, and/or other signals. This approach could provide a hybrid BCI for communication for persons with LIS and CLIS who cannot use visual stimuli [3]. Perhaps most importantly, additional research with target patients in real-world settings is essential for validating systems and approaches and providing new data.

In summary, the present results indicate that the mindBEAGLE assessment tools work as expected, providing correct results that indicate conscious awareness during regular operation and indicating otherwise during sham operation. The communication components of mindBEAGLE were also successful, although, like the assessment components, the MI approach should be supplemented with other tools. Overall, these results could support and encourage wider system use.