1 Introduction

Electroencephalography (EEG) is a technique for measuring voltage fluctuations on the scalp that arise from large pools of electrical currents generated by regional neural activity. Although EEG is largely viewed as a medical technique used for diagnosis of brain dysfunction and injury [1], it has also been used as a research tool for several decades, especially in cognitive and perceptual neuroscience studies aimed at identifying the neural basis of behavior. In more recent years, a wide range of studies have shown the utility of EEG for identifying or classifying different cognitive states (such as drowsiness and fatigue [2], or anxiety [3]) as well as for detecting and identifying novel events [4, 5]. Based on these successes, there is increasing promise that EEG can be a useful component of human-computer interactive systems [6], many of which could be embedded within everyday scenarios.

However, most current implementations of such algorithms remain constrained to laboratory settings or fairly limited, short-duration use cases, and few have transitioned into real-world environments. While there is increasing interest in performing so-called real-world neuroimaging [7] (that is, monitoring brain activity within its full, natural context of the real world), current designs for EEG systems are not amenable to usage in a truly fieldable format, and a number of technical hurdles must be addressed. For example, because the brain-source voltage fluctuations represented by EEG are extremely small (microvolts) against a potential background that is heavily contaminated by larger fluctuations (millivolts to volts) created by movement artifacts [8, 9], environmental electrical noise, and long-term capacitive drift effects [10], the overall signal-to-noise ratio (SNR) is very poor. As a result, EEG data acquisition (DAQ) systems must be extremely sensitive.

The conventional approach for addressing this problem is to use very high-resolution components, such as low-noise amplifiers and 24-bit sigma-delta analog-to-digital converters (ADCs), on the premise that acquiring a maximal-resolution signal guarantees the information content of the data. Targeting ideal, high-resolution signals is particularly pertinent in research settings where the user may not know exactly what features of the data will be the most critical for a successful outcome, or in medical cases where very small differences may have critical health relevance. Unfortunately, this demand for high resolution makes systems expensive and power-hungry. This is generally a trivial concern in typical laboratory or patient-care environments, where the benefits of a high-quality DAQ system outweigh the disadvantages of its size, cost, and power consumption. However, it creates substantial limitations on the ability to build and use truly fieldable, long-application-time systems designed for use in everyday settings [7, 11] or in large-scale “crowdsourced” implementations where cost would play a major factor [12].

In many targeted research and translational neuroscience settings, as well as brain-computer interaction technologies, the user typically already has a specific use identified for the acquired data stream, and knows what the relevant signal properties are ahead of time. In these cases, having perfectly “ideal”, high-resolution DAQ systems may not be necessary as long as the most critical features for the application can be extracted, and considerable savings in both power and expense can be garnered. For example, the power consumption of an ADC increases exponentially with bit resolution, with each additional bit requiring nearly 10× the power draw [13, 14]. Since the analog front end is a major portion of the total power consumption of a typical EEG DAQ system, this equates to a substantial difference between, for example, 24, 16, or even 10-bit resolution. Meanwhile, ultra low-noise amplifiers are costly and challenging to design in a low-power format, and in many cases the added sensitivity results in more susceptibility to artifacts.

This suggests that, at least for targeted applications, there may be room for considerable power and cost savings by sacrificing the resolution of certain components of the DAQ system. To date, however, this issue has been largely ignored, with no systematic analyses of the pragmatic minimum technical specifications necessary to detect or assess particular mental states. While the practical minima will vary between applications, at the very least there is a need to establish paradigms or test methods for answering this question.

Here, we highlight results from two target mental-state detection applications, with the goal of providing initial insight into the necessity for such analyses and into the relationship between DAQ signal fidelity and performance for specific applications. The applications are: (1) detection of alpha spindle oscillations representative of “drowsiness” during a driving task, and (2) detection of P300 visual evoked responses during a rapid serial visual presentation (RSVP) paradigm. In each case, we examine the effect of simulated degradation of DAQ performance (decreased vertical resolution and increased RMS noise) on the performance of classifiers already shown to have reasonable success at differentiating the target states [15–18].

2 Methods

We empirically assessed the practical impact of EEG signal record fidelity by evaluating the performance of several classifiers at various stages of signal degradation. Here we chose to classify two different neural events of interest: alpha spindle oscillations [15, 16] which have been associated with fatigue and drowsiness in prolonged experiments, and P300 visual evoked responses elicited during a rapid serial visual presentation (RSVP) experiment [19, 20], a neural feature associated with the detection and recognition of novel visual stimuli. For each neural event class (alpha oscillations and P300 responses) two separate classifiers were used. Details about the data and classifiers used in each paradigm are described below.

2.1 Datasets

Driving Task – Drowsiness Detection.

The datasets used here have been analyzed previously [15]; a brief description is presented here. Two healthy, right-handed males were instructed to drive in a simulated driving environment in a sound-attenuated room. The subjects were presented with a straight four-lane highway with minimal scenery (highway, roadside, and horizon) except for an occasional speed limit sign. Subjects were asked to maintain the posted speed limit (25 mph or 45 mph) while keeping their vehicle in the center of the lane. A perturbation force would occasionally cause the vehicle to veer left or right in a manner similar to the effects experienced when a gust of wind crosses a real vehicle [21]. Subjects drove for approximately 70 min after a 10–15 min practice period. The EEG was recorded using a 64-channel Biosemi ActiveTwo system and referenced offline to the average of the two mastoids.

RSVP Task – Target Object Recognition.

The data used here have been analyzed previously [17, 22]. Short video clips were used in a rapid serial visual presentation (RSVP) paradigm. Video clips fell into two classes: scenes containing people or vehicles on a background, and scenes containing the background alone. Observers were instructed to make a manual button press with their dominant hand when they detected a person or vehicle (targets), and to abstain from responding when a background scene (distractor) was presented. Video clips consisted of five consecutive images of 100 ms in duration each; each video clip was presented for 500 ms without pause between clips, such that the first frame was presented immediately after the last frame of the prior video. If a target appeared in the video clip, it was present on each 100 ms image. The distractor to target ratio was 90/10. RSVP sequences were presented in two-minute blocks, after which participants were given a break, and participants completed a total of 25 blocks.

2.2 Classifiers

Drowsiness Detection.

Two classifiers were used for detecting alpha spindle oscillations in EEG. The first method, proposed by Simon and colleagues [16], is based on the Full-Width at Half-Max (FWHM), a measure of peak amplitude and frequency distribution in the alpha band in sliding EEG windows. The method starts by determining whether the largest peak of the power spectrum from 4–50 Hz in the EEG window lies in the alpha band (8–13 Hz). If the peak lies in the alpha band, the width of the peak at half maximum is calculated. If this width is less than two times the bandwidth of a Hamming window, the window is identified as containing an alpha oscillation. The second method is based on sequentially discounted autoregressive modeling (SDAR) of a narrowband filtered EEG signal [15]. The goal of the SDAR approach is to detect statistically irregular time segments in a time-adaptive manner. The statistics of specificity, precision and recall are used to evaluate the performance of both algorithms under varied levels of signal fidelity (see below).
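The FWHM decision rule described above can be sketched as follows. This is a minimal illustration and not the implementation of [16]: the function name `is_alpha_spindle`, the zero-padding factor, and the choice of 1.30 DFT bins as the "bandwidth of a Hamming window" (its approximate half-power main-lobe width) are all assumptions made here.

```python
import numpy as np

# Assumption: "bandwidth of a Hamming window" is taken as its approximate
# half-power (-3 dB) main-lobe width, about 1.30 DFT bins.
HAMMING_BW_BINS = 1.30


def is_alpha_spindle(window, fs, pad_factor=8):
    """Return True if the dominant 4-50 Hz peak is a narrow alpha-band peak."""
    window = np.asarray(window, dtype=float)
    n = window.size
    tapered = (window - window.mean()) * np.hamming(n)

    # Zero-pad so the peak width can be measured on a finer frequency grid.
    n_fft = pad_factor * n
    power = np.abs(np.fft.rfft(tapered, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Step 1: the largest 4-50 Hz peak must lie in the alpha band (8-13 Hz).
    search = np.flatnonzero((freqs >= 4.0) & (freqs <= 50.0))
    peak = search[np.argmax(power[search])]
    if not (8.0 <= freqs[peak] <= 13.0):
        return False

    # Step 2: walk outward from the peak while power stays above half maximum.
    half_max = power[peak] / 2.0
    lo = peak
    while lo > 0 and power[lo - 1] >= half_max:
        lo -= 1
    hi = peak
    while hi < power.size - 1 and power[hi + 1] >= half_max:
        hi += 1
    fwhm_hz = (hi - lo) * (freqs[1] - freqs[0])

    # Step 3: accept only peaks narrower than twice the window bandwidth.
    threshold_hz = 2.0 * HAMMING_BW_BINS * fs / n
    return bool(fwhm_hz < threshold_hz)
```

In a sliding-window analysis, the function would be applied to each EEG window in turn; a sustained narrowband 10 Hz oscillation passes the test, whereas a brief broadband burst centered at 10 Hz does not.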

P300 Object Recognition.

Two classifiers were used for detecting target object recognition (e.g., P300 visual evoked potentials) during the RSVP task. The first method, Hierarchical Discriminant Component Analysis (HDCA) [19, 23], is a two stage binary classification method which is based on an ensemble of logistic regression classifiers. In the first stage, the EEG data window, relative to image onset, is divided into K non-overlapping segments. In each segment, a logistic regression classifier is trained to segregate between two different sets of stimuli. This logistic regression is trained for each segment independently. The outputs of these regressions are then used as parameters in another logistic regression to make the final overall classification. The first stage classification collapses information across channels in a small time window, while the second classification collapses information across time. We used K = 10 windows for discrimination as this value has been used previously in similar studies [20].
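The two-stage structure of HDCA can be sketched as below. This is an illustrative sketch only, not the implementation used in [19, 23]: the helper names (`fit_logreg`, `segment_means`, `fit_hdca`, `predict_hdca`), the plain gradient-descent logistic regression with its hyperparameters, and the averaging of samples within each segment are assumptions introduced here. Epochs are assumed to be arranged as trials × channels × samples.

```python
import numpy as np


def fit_logreg(features, labels, l2=1e-2, lr=0.1, n_iter=500):
    """Gradient-descent logistic regression (illustrative); returns (weights, bias)."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        z = np.clip(features @ w + b, -30.0, 30.0)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        err = p - labels
        w -= lr * (features.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def segment_means(epochs, k):
    """Average within k non-overlapping time segments -> (trials, channels, k)."""
    chunks = np.array_split(epochs, k, axis=2)
    return np.stack([c.mean(axis=2) for c in chunks], axis=2)


def stage1_scores(feats, stage1):
    """Project each segment's channel vector onto its own discriminant."""
    return np.column_stack(
        [feats[:, :, j] @ w + b for j, (w, b) in enumerate(stage1)]
    )


def fit_hdca(epochs, labels, k=10):
    """Stage 1: one classifier per segment (across channels).
    Stage 2: one classifier across the k segment scores (across time)."""
    labels = np.asarray(labels, dtype=float)
    feats = segment_means(np.asarray(epochs, dtype=float), k)
    stage1 = [fit_logreg(feats[:, :, j], labels) for j in range(k)]
    stage2 = fit_logreg(stage1_scores(feats, stage1), labels)
    return stage1, stage2


def predict_hdca(model, epochs):
    """Return one decision score per trial (positive favors the target class)."""
    stage1, stage2 = model
    feats = segment_means(np.asarray(epochs, dtype=float), len(stage1))
    w2, b2 = stage2
    return stage1_scores(feats, stage1) @ w2 + b2
```

The scores returned by `predict_hdca` are continuous, so they can be thresholded for a binary decision or passed directly to an AUC computation.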

The second classification method employed here combined the xDAWN spatial filtering technique with a Bayesian linear discriminant analysis (BLDA) classifier. Collectively this technique will be referred to as XDBLDA; a full description can be found in [24–26]. xDAWN spatial filtering results in a set of spatial filters that are rank ordered such that the highest-ranked filters maximize the signal-to-signal-plus-noise ratio in the EEG signals. Our implementation uses the top eight spatial filters for classifier input, and the input vector is obtained by concatenation of the eight spatially filtered EEG signals. The BLDA classifier is then used to discriminate targets from non-targets. For both classifiers, the area under the receiver operating characteristic curve (AUC) was used to evaluate overall classifier performance under each level of signal fidelity. In order to assess baseline control (chance) AUC values for each classifier, performance was re-assessed with data event tags randomly intermixed between target and non-target classes for each dataset, with classifiers trained using cross-validation.
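For reference, the AUC can be computed directly from classifier scores as the fraction of target/non-target pairs that are ranked correctly. The sketch below is illustrative; the function names are hypothetical. Note also that `chance_auc` is a simplified stand-in for the control analysis described above: it permutes the labels against fixed scores, whereas the analysis in the text retrains each classifier on shuffled event tags with cross-validation.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def chance_auc(scores, labels, n_shuffles=200, seed=0):
    """Mean and SD of AUC after randomly permuting the labels.

    Simplification: scores are held fixed and no retraining is performed.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    values = np.array(
        [auc(scores, rng.permutation(labels)) for _ in range(n_shuffles)]
    )
    return values.mean(), values.std()
```

The pairwise formulation is quadratic in the number of trials, which is acceptable for datasets of the size considered here; a rank-based formulation would be preferable for very large trial counts.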

2.3 Simulated Degradation of Signal Fidelity

In order to simulate the effects of data acquisition using ADCs of different bit depths, the previously acquired data from each subject were iteratively re-quantized using a rounding quantization scheme for a series of 6 to 24 bit depths covering a range of ± 4 mV (8 mV total). Values outside this range were clipped. The net effect is a decrease in the effective vertical resolution (an increased step size between possible data values), which increases the “blockiness” of continuous waveforms and removes the ability to discriminate small-amplitude fluctuations (see the top row of Fig. 1 for examples). This step was performed at the earliest stage, prior to any further processing and application of each classification algorithm. We created a total of 13 datasets per subject, with vertical resolution ranging from 62.5 nV to 256 μV on a log2 scale. To simulate the effects of random amplifier noise on data acquisition, a series of noise files with uniform random distribution were iteratively created and added to the original data. Each noise file was zero-centered and uniformly distributed, with a specified root-mean-square (RMS) amplitude computed across the entire waveform. The amplitude values tested ranged from 2 to 62 μV RMS. The bottom row of Fig. 1 shows an example of how this affects the data. Note that these steps were applied to the raw data prior to any signal processing required for each classification method.
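The two degradation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analyses: the function and constant names are hypothetical, signals are assumed to be expressed in microvolts, and the step sizes are generated by doubling from 62.5 nV to 256 μV to match the 13 resolutions described above. The noise scaling uses the fact that uniform noise on [−a, a] has an RMS value of a/√3.

```python
import numpy as np

FULL_SCALE_UV = 4000.0  # +/- 4 mV range from the text, expressed in microvolts

# 13 step sizes from 62.5 nV to 256 uV, doubling each time (log2 scale)
STEP_SIZES_UV = 0.0625 * 2.0 ** np.arange(13)


def requantize(signal_uv, step_uv, full_scale_uv=FULL_SCALE_UV):
    """Round samples to the nearest multiple of step_uv, clipping out-of-range values."""
    codes = np.round(np.asarray(signal_uv, dtype=float) / step_uv)
    max_code = np.floor(full_scale_uv / step_uv)  # largest code inside the range
    return np.clip(codes, -max_code, max_code) * step_uv


def add_uniform_noise(signal_uv, rms_uv, rng):
    """Add zero-centered uniform noise whose RMS amplitude is rms_uv."""
    signal_uv = np.asarray(signal_uv, dtype=float)
    # Uniform noise on [-a, a] has RMS a / sqrt(3), so a = rms * sqrt(3).
    half_width = rms_uv * np.sqrt(3.0)
    return signal_uv + rng.uniform(-half_width, half_width, size=signal_uv.shape)
```

A degraded copy of a recording would then be produced with, for example, `requantize(raw_uv, STEP_SIZES_UV[7])` for an 8 μV step, or `add_uniform_noise(raw_uv, 10.0, np.random.default_rng(0))` for 10 μV RMS of added noise, before any classifier-specific preprocessing.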

Fig. 1.

Examples of how the recorded signal is affected by decreasing vertical resolution (top row) and increased RMS random noise (bottom row). As the vertical resolution decreases (top row, left to right), signal features of low amplitude are effectively removed, while the overall signal becomes similar to that of a step function. Also, as the RMS noise increases (bottom row, left to right), small changes in the overall signal become masked and more difficult to observe.

3 Results

3.1 Decreased Vertical Resolution

First, we examined the importance of the vertical resolution of the acquired signal, simulated by re-quantizing pre-recorded data to represent a range of different ADC bit depths. Figure 2 shows results for two experimental paradigms: alpha-oscillation detection during a driving task (Panels 2A–2C) and P300/target object detection during an RSVP task (Panel 2D). For alpha oscillation detection, we tested the performance of two existing classifiers, the SDAR method of Lawhern and colleagues [15] (blue lines in Fig. 2) and the FWHM-based method of Simon and colleagues [16] (red lines), for each of two subjects (separate, similarly colored lines in Fig. 2). The panels depict recall (Panel 2A), precision (2B) and specificity (2C) at each simulated ADC resolution.

Fig. 2.

Classifier performance per vertical resolution for alpha oscillation detection evaluated as (a) Recall, (b) Precision, and (c) Specificity; and group mean AUC for P300 detection (d). Blue lines: SDAR performance (2 subjects); Red: FWHM (2 subjects); Black: XDBLDA (mean ± SD); Green: HDCA (mean ± SD) (Color figure online).

Similar to previous reports, overall performance for the AR-based method (blue lines) is stronger than for the spectral method [15], especially for subject 2 (lowest line). More notably, however, this overall trend is fairly consistent through vertical resolutions as large as 16 µV. Beyond this point, all three statistics become less stable, especially recall and specificity for the spectral classifier, suggesting substantially higher false positive rates.

Meanwhile, a similar trend is seen when examining P300 detection performance (Fig. 2D) with XDBLDA (black lines) and HDCA (green lines). Here, group mean values for area under the curve (AUC) are shown for a set of 15 subjects. As has been reported, overall performance with XDBLDA is very accurate, exceeding that of HDCA; however, both classifiers follow similar patterns, with consistent performance through 8 µV of vertical resolution. Notably, even above this point, performance plateaus above chance, which in our randomized control simulation averaged 0.70 ± 0.025 AUC for XDBLDA and 0.51 ± 0.026 AUC for HDCA.

3.2 Increased Random Noise

Figure 3 shows results of adding simulated uniform random noise to each dataset and classifier similar to the above.

In this case, the trend is for a more gradual decline, with performance not becoming notably unstable for alpha-oscillation detection (Fig. 3A–C) until around 11–13 µV RMS noise is added. As before, the AR classification method appears more robust overall to the signal degradation. A similar pattern is seen with P300 detection in an RSVP task (Fig. 3D): both XDBLDA and HDCA show steady declines in performance. In this case, even though additional noise cases (up to 60 µV) are shown, group mean AUC values remain significantly above estimated chance levels.

Fig. 3.

Classifier performance per increased RMS noise added to the signal for alpha oscillation detection evaluated as (a) Recall, (b) Precision, and (c) Specificity; and group mean AUC for P300 detection (d). Blue lines: SDAR performance (2 subjects); Red: FWHM (2 subjects); Black: XDBLDA (mean ± SD); Green: HDCA (mean ± SD) (Color figure online).

4 Discussion

Above, we have shown a clear impact of acquired EEG signal fidelity on classifier performance by parametrically decreasing vertical resolution, as might occur with progressively lower ADC bit depths, and by adding RMS background noise to the EEG signal, analogous to the use of lower-fidelity analog components. While it is not surprising that both of these negatively impact classification, it is notable that performance is meaningfully impacted only with a substantial degradation of the signal. Specifically, classifier performance degraded significantly only when the effective vertical resolution was 8 µV or coarser, or when the added background RMS noise exceeded about 10 µV. Although initial baseline performance differed, this trend appears to be independent of the type of classifier used. These results suggest that tailored systems can operate with substantially lower signal fidelity than is typically considered a requirement, and that it is important to carefully consider the truly necessary minimum performance specifications for the application at hand. While the traditional approach to system design for experimental settings (where the emphasis is on maximizing signal quality) has served those purposes well, the approaches in use for those applications currently suffer a number of challenges regarding overall usability [27] and are not tenable for truly mobile applications where power, size, and cost are considerable factors. In contrast, technological design for targeted real-world applications must be prepared to address these challenges by accepting potential tradeoffs against idealized signal quality.

Our vision of “real-world neuroimaging” (RWN) involves conducting neuroimaging within realistic, non-contrived situations, where neural responses reflect what is expected in real-life situations [6, 7, 11]. While some RWN scientific endeavors can be managed using currently available DAQ approaches without the need for high mobility or long-term recording, in order to fully realize our goals of measuring and monitoring brain activity in truly unconstrained circumstances, we must re-think the way DAQ occurs with EEG. In particular, our vision is to develop and utilize systems which are completely “user transparent”, that is, systems that require no direct interaction or intervention from the user. One facet of this is operation solely on locally harvested power, so as to have no requirement for recharging or battery replacement [28]. In order to meet such stringent constraints, we advocate a careful re-examination of the logic by which we collect neurophysiological data, and a focus on the specific needs of the system given its intended use. Some initial prototype integrated circuit designs suggest such ultra-low power operation is feasible [29–31]; however, an empirical evaluation of performance trade-offs, such as that carried out here, is necessary in order to validate the utility of these approaches.

While promising, much follow-up work remains. Regarding the examples highlighted here, the performance of HDCA and XDBLDA for P300 discrimination is surprisingly robust to the manipulations of vertical resolution and RMS noise. Even above 8 µV resolution, the performance plateau for both classifiers remains above statistical chance. It is likely that the reason for the plateau is that once resolution exceeds about 25 µV the data become functionally binary in nature: no meaningful brain-related change in amplitude will be observed. Thus a functional floor effect would be expected. We hypothesize that the remaining accuracy that occurs despite the overall lack of information in the amplitude domain stems from the power of pooling across multiple channels (in this case, 64). Future efforts will focus on exploring this additional feature domain, and on pooling multiple signal features into a single multivariate analysis, so that we can ascertain the relative pragmatic impact of each feature, as well as the interactions between them from combined analyses. With that approach, we can develop models which focus on and refine only the most critical system design components which affect overall utility of the system. Such a refined approach is critical due to the complexity of fieldable system design, where numerous factors affect overall utility for research applications [27, 32, 33]. The cases highlighted here are only some examples from a wide variety of potential uses for EEG in fieldable applications. Undoubtedly the performance of any classifier is very tightly coupled to the circumstances under which it is used and the data to which it is applied. Therefore the data presented here alone cannot be used to set overall standards for EEG data acquisition, even for targeted applications. Rather, this report is intended to highlight the importance of this issue by using some specific example cases.

The degree to which other applications are dependent on signal fidelity, and the specific signal features which may be lost due to decreasing fidelity, is likely to vary widely across domains. For example, successful discrimination of extremely small-amplitude but highly localized signals, such as an auditory brainstem response, would likely be substantially more dependent on both fine vertical resolution and low noise levels, but with minimal dependence on spatial pooling. Our suggestion is for future work to focus on large-scale parametric analyses involving several datasets and classifiers covering a wide range of domains, with the goal of establishing the most critical DAQ features tied to a particular class of applications.

In summary, we have shown that successful discrimination of multiple classes of neural state is entirely feasible using signals acquired with relatively low fidelity, without a direct consequence to accuracy. These results suggest that, for targeted research applications, acquiring data at the typical high-fidelity resolution is not necessary. This has direct implications for programs utilizing EEG in real-world domains where data acquisition methods are critical to success. For example, the ability to use lower-resolution components, such as lower bit-depth ADCs and amplifiers, dramatically increases opportunities for the design and use of ultra-low power systems, or of low-cost systems that can be easily distributed and managed on a large scale.