1 Introduction

Brain-computer interfaces (BCIs) provide a communication channel through which a user can control an external device using brain activity alone. They can serve as a rehabilitation tool for patients with severe neuromuscular disabilities [7] and support a range of other applications, including neural prostheses, Virtual Reality (VR), and internet access. Among the various neuroimaging techniques, electroencephalography (EEG) is one of the non-invasive methods most widely exploited in BCI experiments, and within EEG-based BCIs, event-related desynchronization (ERD), visually evoked potentials (VEP), slow cortical potentials (SCP), and P300 evoked potentials are the most widely used signals.

In accordance with the topographic patterns of brain rhythm modulations, feature extraction with the Common Spatial Patterns (CSP) algorithm [17] provides subject-specific, discriminative spatial filters. However, CSP has limitations: it is sensitive to the frequency band of the underlying neural activity, so in practice the band is selected manually or set to a broad band, and it tends to overfit when a large number of channels is used. A poorly chosen channel configuration thus causes both the spatial filter and the classifier to overfit. A simultaneous optimization of the spatial and spectral filters is therefore highly desirable in BCI studies.

In recent years, motor imagery (MI) based BCI has proven to be an independent system with high classification accuracy. Most MI-based BCIs use brain oscillations in the mu (8–12 Hz) and beta (13–26 Hz) rhythms, which exhibit localized event-related desynchronization (ERD) [16] corresponding to the respective MI states (such as right-hand or right-foot movement). In addition, the readiness potential (RP) [18], a slow negative event-related potential that appears before a movement is initiated, can be used as a BCI input to predict future movements. The RP is divided into an early RP, a slow negative potential beginning about 1.5 s before the action, immediately followed by a late RP occurring about 500 ms before the movement. In MI-based BCI, combining feature vectors derived from ERD and RP [5] has been shown to boost classification performance significantly.

Several sophisticated CSP-based algorithms have been reported in the BCI literature; a brief review is given here. Various methods have been proposed to avoid overfitting and to select optimal frequency bands for CSP. Regularized CSP (RCSP) [13] adds regularization information to the CSP learning procedure to counter overfitting. The Common Spatio-Spectral Pattern (CSSP) [11] extends CSP with a time-delayed sample; owing to its limited flexibility, the Common Sparse Spectral-Spatial Pattern (CSSSP) [6] was later presented, which replaces the single time-delay parameter with a full FIR filter. Since these methods are computationally expensive, the Spectrally-weighted Common Spatial Pattern (SPEC-CSP) [19] was designed, which alternately optimizes the temporal filter in the frequency domain and the spatial filter within an iterative process. To improve on SPEC-CSP, Iterative Spatio-Spectral Pattern Learning (ISSPL) [22] was proposed, which does not rely on statistical assumptions and optimizes all temporal filters under a common optimization framework.

Despite numerous studies and advanced algorithms, extracting optimal spatial-spectral filters remains a challenge in BCI research, particularly when BCIs are to serve as rehabilitation tools for disabled subjects. The spatial and spectral parameters associated with a BCI paradigm are usually fixed to defaults in EEG analysis without further examination, which degrades practical performance because of individual variability across subjects. To address this issue, a CSSBP [12] with combined feature vectors is designed for BCI paradigms: combining features that reflect different physiological phenomena, such as the readiness potential (RP) and event-related desynchronization (ERD), can make a BCI more robust against artifacts from non-Central Nervous System (CNS) activity such as eye blinks (EOG) and muscle movements (EMG) [5]. The EEG signal is first divided into several sub-bands using a band-pass filter; the channels and frequency bands are then modeled as preconditions, and a stochastic gradient boosting heuristic is used to train base learners under these preconditions. The effectiveness and robustness of the designed algorithm with feature combination are evaluated on the widely used benchmark dataset BCI competition IV (IIa). The remainder of the paper is organized as follows: Sect. 2 details the design of the proposed boosting algorithm, Sect. 3 presents performance comparison results, and Sect. 4 concludes.

2 Proposed Algorithm

This section details the combination model of CSSBP (common spatial spectral boosting pattern) with feature combination, covering both the problem formulation and the learning algorithm. The model consists of five stages: data preprocessing, which includes multiple spectral filtering (decomposing the signal into several sub-bands with a band-pass filter) and spatial filtering; feature extraction using common spatial patterns (CSP); feature combination; training of the weak classifiers; and pattern recognition with the resulting combination model. The architecture of the designed algorithm is shown in Fig. 1. The EEG data is first spatially filtered and band-pass filtered under multiple spatial-spectral preconditions.

Fig. 1. Block diagram of the proposed boosting pattern.

Afterwards, the CSP algorithm is applied to extract features from the EEG training dataset and these feature vectors are combined; the weak classifiers \( \{ f_{m} \}_{m = 1}^{M} \) are then trained and combined into a weighted combination model. Lastly, a new test sample \( \hat{x} \) is classified using this combination model.
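For concreteness, a minimal sketch of the CSP step is given below (in Python with NumPy/SciPy; all function and variable names are illustrative assumptions, not part of the original implementation). CSP solves a generalized eigenvalue problem between the two class-covariance matrices, and the features are the log-variances of the most discriminative spatial components.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=2):
    """Compute CSP spatial filters from two classes of EEG trials.

    trials_a, trials_b: arrays of shape (n_trials, n_channels, n_samples).
    Returns W of shape (2*n_pairs, n_channels), the filters that maximize
    the variance ratio between the two classes.
    """
    def mean_cov(trials):
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)        # average normalized covariance

    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem: ca w = lambda (ca + cb) w
    vals, vecs = eigh(ca, ca + cb)
    order = np.argsort(vals)[::-1]          # sort eigenvalues descending
    vecs = vecs[:, order]
    # Keep filters from both ends of the spectrum (most discriminative)
    return np.hstack([vecs[:, :n_pairs], vecs[:, -n_pairs:]]).T

def csp_features(trial, W):
    """Normalized log-variance features of one spatially filtered trial."""
    z = W @ trial                            # (2*n_pairs, n_samples)
    v = z.var(axis=1)
    return np.log(v / v.sum())
```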

2.1 Problem Design

In BCI studies, the two main design choices are the channel configuration and the frequency band, which are usually predefined as defaults for EEG analysis. Predefining these settings without deliberation leads to poor performance in real scenarios because EEG patterns vary across subjects. An efficient and robust configuration is therefore desirable for practical applications.

To model this problem, denote the training dataset as \( E_{train} = \{(x_{i}, y_{i})\}_{i = 1}^{N} \), where \( x_{i} \) is the ith sample and \( y_{i} \) its corresponding label. The aim is to find a subset ω ⊂ ν of the set ν of all admissible preconditions that generates a combination model F, built from sub-models trained under the conditions \( \vartheta_{m} \in \omega \), minimizing the misclassification rate on the training dataset \( E_{train} \):

$$ \omega = \arg\min_{\omega} \frac{1}{N}\left|\left\{\, i : F\left( x_{i}; \omega \right) \ne y_{i} \,\right\}_{i = 1}^{N}\right| $$
(1)

In the remainder of this section, two analogous sub-problems are modeled in detail, and an adaptive boosting algorithm is then designed to solve them.

Spatial Channel and Frequency Band Selection.

For channel selection, the aim is to select an optimal channel set S (S ⊂ U), where U is the universal set of all possible channel subsets of the channel set C, each subset \( U_{m} \in U \) satisfying \( |U_{m}| \le |C| \) (here |.| denotes the size of a set), such that an optimal combination classifier F is produced on the training data by combining base classifiers learned under different channel-set preconditions. Therefore, we get,

$$ F\left( {E_{train}; S} \right) = \sum\nolimits_{{S_{m} \in S}} {\alpha_{m} f_{m } \left( {E_{train} ;S_{m} } \right)} $$
(2)

where F is the optimal combination model, \( f_{m} \) is the mth sub-model learned under channel-set precondition \( S_{m} \), \( E_{train} \) is the training dataset, and \( \alpha_{m} \) is the combination coefficient. The original EEG \( E_{i} \) is multiplied by the obtained spatial filter, yielding a projection of \( E_{i} \) onto the channel set \( S_{m} \); this constitutes the channel selection. In the simulation work, 21 channels were used, giving the universal channel set C = (CP6, CP4, CP2, C6, C4, C2, FC6, FC4, FC2, CPZ, CZ, FCZ, CP1, CP3, CP5, C1, C3, C5, FC1, FC3, FC5), where each element denotes an electrode channel.
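The paper does not spell out how the channel subsets \( U_{m} \) are enumerated beyond the sliding-window strategy mentioned in the conclusion; a plausible sketch is given below, where the window width and step are illustrative assumptions.

```python
def channel_windows(channels, width=7, step=2):
    """Enumerate candidate channel subsets U_m by sliding a window of
    `width` electrodes over the ordered channel list in steps of `step`."""
    return [tuple(channels[i:i + width])
            for i in range(0, len(channels) - width + 1, step)]

C = ["CP6", "CP4", "CP2", "C6", "C4", "C2", "FC6", "FC4", "FC2",
     "CPZ", "CZ", "FCZ", "CP1", "CP3", "CP5", "C1", "C3", "C5",
     "FC1", "FC3", "FC5"]
for subset in channel_windows(C):
    print(subset)
```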

For frequency band selection, the spectrum G is simplified as a closed interval whose elements are integer points (in Hz). G is split into various sub-bands as given in [12, 14], forming D, the universal set of all possible sub-bands. The objective of frequency band selection is then to obtain an optimal band set B (B ⊂ D) such that an optimal combination classifier is produced on the training data.

$$ F\left( {E_{train} ;B} \right) = \sum\nolimits_{{B_{m} \in B}} {\alpha_{m} f_{m } \left( {E_{train} ;B_{m} } \right)} $$
(3)

where \( f_{m} \) is the mth weak classifier learned on sub-band \( B_{m} \). In the simulation study, a fifth-order zero-phase forward/reverse FIR filter was used to filter the raw EEG signal \( E_{i} \) into the sub-bands \( B_{m} \).
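A minimal sketch of this spectral preconditioning with SciPy is shown below. Note the paper specifies a fifth-order zero-phase forward/reverse FIR filter; this sketch uses a longer FIR designed with `firwin` so the pass-band is usable, and the tap count, band edges, and sampling rate are assumptions.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

def subband_filter(eeg, band, fs=250.0, numtaps=101):
    """Zero-phase (forward/reverse) FIR band-pass filtering of one trial.

    eeg: array (n_channels, n_samples); band: (low, high) edges in Hz.
    fs defaults to the 250 Hz sampling rate of BCI competition IV (IIa).
    """
    taps = firwin(numtaps, band, fs=fs, pass_zero=False)
    return filtfilt(taps, [1.0], eeg, axis=-1)   # forward/reverse pass

def apply_precondition(eeg, channel_idx, band, fs=250.0):
    """Filter one trial under a spatial-spectral precondition (S_m, B_m):
    project onto the channel subset, then band-pass into the sub-band."""
    return subband_filter(eeg[channel_idx, :], band, fs=fs)
```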

2.2 Model Learning Algorithm

Here, the channel-selection and frequency-selection models are combined into a two-tuple \( \vartheta_{m} = (S_{m}, B_{m}) \) denoting a spatial-spectral precondition, and ν denotes the universal set of all such spatial-spectral preconditions. The combination function can then be computed as

$$ F\left( {E_{train} ;\upvartheta} \right) = \sum\nolimits_{{\upvartheta_{\text{m}} \in\upvartheta}} {\alpha_{m} f_{m } \left( {E_{train} ;\upvartheta_{\text{m}} } \right)} $$
(4)

Hence, for each spatial-spectral precondition \( \vartheta_{m} \in \vartheta \), the training dataset \( E_{train} \) is filtered under \( \vartheta_{m} \). CSP features are extracted from the filtered training data, and features of distinct physiological nature are combined using the PROB method [1]. Denote the N features by random variables \( X_{i}, i = 1, \ldots, N \), with class label \( Y \in \{\pm 1\} \). For each feature i, an optimal classifier \( f_{i} \) is defined on the single-feature space \( D_{i} \), minimizing the misclassification rate, and \( g_{i,y} \) denotes the class-conditional density of \( X_{i} \) given \( Y = y \) for y = +1 or −1. Let f be the optimal classifier on the combined feature space \( D = (D_{1}, D_{2}, \ldots, D_{N}) \), and let X be the combined random variable \( X = (X_{1}, X_{2}, \ldots, X_{N}) \) with class-conditional densities \( g_{y} \). Under the assumption of equal class priors, for \( x = (x_{1}, x_{2}, \ldots, x_{N}) \in D \),

$$ f_{i}\left( x_{i}; \gamma(\vartheta_{i}) \right) = 1 \;\leftrightarrow\; \hat{f}_{i}\left( x_{i}; \gamma(\vartheta_{i}) \right) := \log\left( \frac{g_{i,1}\left( x_{i} \right)}{g_{i,-1}\left( x_{i} \right)} \right) > 0 $$
(5)

where γ is the model parameter determined by \( \vartheta_{i} \) and \( E_{train} \). Incorporating independence between the features into the above equation yields the optimal decision function

$$ f\left( x; \gamma(\vartheta) \right) = 1 \;\leftrightarrow\; \hat{f}\left( x; \gamma(\vartheta) \right) = \sum\nolimits_{i = 1}^{N} \hat{f}_{i}\left( x_{i}; \gamma(\vartheta_{i}) \right) > 0 $$
(6)

Here the assumption is that, for each class, the features are Gaussian with equal covariance, i.e., \( X_{i} \mid Y = y \sim N\left( \mu_{i,y}, \Sigma_{i} \right) \). With \( w_{i} := \Sigma_{i}^{-1}(\mu_{i,1} - \mu_{i,-1}) \), the classifier becomes

$$ f\left( x; \gamma(\vartheta) \right) = 1 \;\leftrightarrow\; \hat{f}\left( x; \gamma(\vartheta) \right) = \sum\nolimits_{i = 1}^{N} \left[ w_{i}^{T} x_{i} - \tfrac{1}{2}\left( \mu_{i,1} + \mu_{i,-1} \right)^{T} w_{i} \right] > 0 $$
(7)
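The decision rule in Eqs. (5)–(7) amounts to summing per-feature LDA discriminants. A minimal NumPy sketch under the stated Gaussian equal-covariance assumption is given below; class names and the per-block organization (e.g., one block of ERD features and one of RP features per trial) are our assumptions.

```python
import numpy as np

class ProbCombiner:
    """PROB-style combination (Eqs. (5)-(7)): per-feature-block Gaussian
    classifiers whose log-likelihood ratios are summed, assuming
    independence between blocks (e.g. ERD and RP feature vectors)."""

    def fit(self, blocks, y):
        # blocks: list of arrays, each (n_trials, d_i); y in {+1, -1}
        self.params = []
        for X in blocks:
            mu_p = X[y == 1].mean(axis=0)
            mu_n = X[y == -1].mean(axis=0)
            # pooled within-class covariance (equal-covariance assumption)
            Xc = np.vstack([X[y == 1] - mu_p, X[y == -1] - mu_n])
            cov = Xc.T @ Xc / (len(X) - 2) + 1e-6 * np.eye(X.shape[1])
            w = np.linalg.solve(cov, mu_p - mu_n)        # w_i in Eq. (7)
            b = -0.5 * (mu_p + mu_n) @ w                 # threshold term
            self.params.append((w, b))
        return self

    def decision(self, blocks):
        # Sum of per-block discriminants, Eqs. (6)-(7)
        return sum(X @ w + b for X, (w, b) in zip(blocks, self.params))

    def predict(self, blocks):
        return np.where(self.decision(blocks) > 0, 1, -1)
```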

The obtained weak classifier can then be written as \( f_{m}\left( E_{train}; \vartheta_{m} \right) \) and is trained using the boosting algorithm. The classification error defined earlier can thus be formulated as

$$ \{ \alpha, \vartheta \}_{0}^{M} = \arg\min_{\{ \alpha, \vartheta \}_{0}^{M}} \sum\nolimits_{i = 1}^{N} L\left( y_{i}, \sum\nolimits_{m = 0}^{M} \alpha_{m} f_{m}\left( x_{i}; \gamma(\vartheta_{m}) \right) \right) $$
(8)

A greedy approach [8] is used to solve (8), detailed below:

$$ F\left( E_{train}, \gamma, \{ \alpha, \vartheta \}_{0}^{M} \right) = \sum\nolimits_{m = 0}^{M - 1} \alpha_{m} f_{m}\left( E_{train}; \gamma\left( \vartheta_{m} \right) \right) + \alpha_{M} f_{M}\left( E_{train}; \gamma\left( \vartheta_{M} \right) \right) $$
(9)

Transforming Eq. (9) into a simple recursion formula, we get

$$ F_{m} \left( {E_{train} } \right) = F_{m - 1} \left( {E_{train} } \right) + \alpha_{m} f_{m } \left( {E_{train} ;\upgamma\left( {\upvartheta_{\text{m}} } \right)} \right) $$
(10)

Suppose \( F_{m - 1}\left( E_{train} \right) \) is known; then \( f_{m} \) and \( \alpha_{m} \) can be determined by

$$ F_{m}\left( E_{train} \right) = F_{m - 1}\left( E_{train} \right) + \arg\min_{f} \sum\nolimits_{i = 1}^{N} L\left( y_{i}, F_{m - 1}\left( x_{i} \right) + \alpha_{m} f_{m}\left( x_{i}; \gamma(\vartheta_{m}) \right) \right) $$
(11)

The problem in (11) is solved using steepest gradient descent [9], with the pseudo-residuals given by

$$ r_{\pi(i)m} = - \nabla_{F} L\left( y_{\pi(i)}, F(x_{\pi(i)}) \right) = - \left[ \frac{\partial L\left( y_{\pi(i)}, F(x_{\pi(i)}) \right)}{\partial F(x_{\pi(i)})} \right]_{F(x_{\pi(i)}) = F_{m - 1}(x_{\pi(i)})} $$
(12)

Here, \( \{\pi(i)\}_{i = 1}^{\hat{N}} \) denotes the first \( \hat{N} \) elements of a random permutation of \( \{i\}_{i = 1}^{N} \). A new set \( \{(x_{\pi(i)}, r_{\pi(i)m})\}_{i = 1}^{\hat{N}} \), which defines a stochastic approximation of the best steepest-descent step direction, is then produced and used to learn \( \gamma(\vartheta_{m}) \):

$$ \gamma(\vartheta_{m}) = \arg\min_{\gamma, \rho} \sum\nolimits_{i = 1}^{\hat{N}} \left[ r_{\pi(i)m} - \rho f\left( x_{\pi(i)}; \gamma_{m}(\vartheta_{m}) \right) \right]^{2} $$
(13)

The combination coefficient \( \alpha_{m} \) is then obtained with \( \gamma_{m}(\vartheta_{m}) \) as

$$ \alpha_{m} = \arg min_{\alpha } \sum\nolimits_{i = 1}^{N} {L\left( {y_{i} ,\left[ {F_{m - 1} \left( {x_{i} } \right) + \alpha f_{m } \left( {x_{i} ;\upgamma\left( {\upvartheta_{\text{m}} } \right)} \right)} \right]} \right)} $$
(14)

Here, each weak classifier \( f_{m} \) is trained on a random subset \( \{\pi(i)\}_{i = 1}^{\hat{N}} \) drawn without replacement from the full training dataset. This random subset, rather than the full sample, is used to fit the base learner as in Eq. (13), and the model update for the current iteration is computed using Eq. (14). During the iterations, a self-adjusting training data pool P is maintained in the background, as detailed in Algorithm 1: the number of copies of each incorrectly classified sample is computed from the local classification error, and these copies are added to the training data pool.

2.3 Algorithm 1: Architecture of Proposed Boosting Algorithm

Input: The EEG training dataset \( \{x_{i}, y_{i}\}_{i = 1}^{N} \), the squared-error loss function L(y, x), the number of weak learners M, and the set ν of all preconditions.

Output: The optimal combination classifier F, the weak learners \( \{f_{m}\}_{m = 1}^{M} \), their weights \( \{\alpha_{m}\}_{m = 1}^{M} \), and the preconditions \( \{\vartheta_{m}\}_{m = 1}^{M} \) under which the weak learners were trained.

  (1) Input \( (x_{i}, y_{i})_{i = 1}^{N} \) and \( \vartheta \) into a CSP-based classifier, extract features, and combine the feature vectors to generate the family of weak learners.

  (2) Initialize the training data pool \( P_{0} = E_{train} = \{x_{i}, y_{i}\}_{i = 1}^{N} \) and \( F_{0}(E_{train}) = \arg\min_{\alpha} \sum\nolimits_{i = 1}^{N} L(y_{i}, \alpha) \).

  (3) for m = 1 to M:

  (4) Generate a random permutation \( \{\pi(i)\}_{i = 1}^{|P_{m - 1}|} = \text{randperm}(i)_{i = 1}^{|P_{m - 1}|} \).

  (5) Select the first \( \hat{N} \) elements \( \{\pi(i)\}_{i = 1}^{\hat{N}} \) as \( (x_{i}, y_{i})_{i = 1}^{\hat{N}} \) from \( P_{m - 1} \).

  (6) Use these \( \hat{N} \) elements to optimize the new learner \( f_{m}\left( E_{train}; \gamma(\vartheta_{m}) \right) \) as defined in Eq. (13).

  (7) Optimize \( \alpha_{m} \) as defined in Eq. (14).

  (8) Update \( P_{m} \) using the following steps:

    A. Use the current local optimal classifier \( F_{m} \) to split the original training set \( E_{train} = (x_{i}, y_{i})_{i = 1}^{N} \) into two parts, \( T_{True} = \{x_{i}, y_{i}\}_{i: y_{i} = F_{m}(x_{i})} \) and \( T_{False} = \{x_{i}, y_{i}\}_{i: y_{i} \ne F_{m}(x_{i})} \), and re-adjust the training data pool:

    B. for each \( (x_{i}, y_{i}) \in T_{False} \) do

    C. Select out all matching \( (x_{i}, y_{i}) \in P_{m - 1} \) as \( \{x_{n(k)}, y_{n(k)}\}_{k = 1}^{K} \).

    D. Copy \( \{x_{n(k)}, y_{n(k)}\}_{k = 1}^{K} \) d times (d ≥ 1), so that in total (d + 1)K duplicated samples are obtained.

    E. Return these (d + 1)K samples into \( P_{m - 1} \) to obtain the adjusted pool \( P_{m} \), and set

$$ F_{m}\left( E_{train} \right) = F_{m - 1}\left( E_{train} \right) + \alpha_{m} f_{m}\left( E_{train}; \gamma(\vartheta_{m}) \right) $$

    F. end for

  (9) end for

  (10) For each \( f_{m}\left( E_{train}; \gamma(\vartheta_{m}) \right) \), use the mapping \( F \leftrightarrow \vartheta \) to obtain its corresponding precondition \( \vartheta_{m} \).

  (11) Return F, \( \{f_{m}\}_{m = 1}^{M} \), \( \{\alpha_{m}\}_{m = 1}^{M} \), and \( \{\vartheta_{m}\}_{m = 1}^{M} \).
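Read as code, Algorithm 1 is a stochastic gradient boosting loop with a self-adjusting sample pool. A condensed Python sketch under squared-error loss is given below; the `fit_weak` callback, the exhaustive precondition search, and the line-search for \( \alpha_{m} \) are simplifying assumptions, not the authors' exact implementation.

```python
import numpy as np

def cssbp_boost(X, y, preconditions, fit_weak, M=180, frac=0.9, eps=0.05):
    """Condensed sketch of Algorithm 1 under squared-error loss.

    X: (N, ...) array of trials; y: NumPy array of labels in {+1, -1};
    preconditions: list of (S_m, B_m) tuples;
    fit_weak(X_sub, r_sub, cond) -> weak learner f with f(X) -> scores.
    """
    N = len(y)
    n_hat = int(frac * N)
    pool = list(range(N))                     # training data pool P_0
    F = np.full(N, y.mean())                  # F_0 = argmin_a sum_i L(y_i, a)
    learners, alphas, conds = [], [], []

    for m in range(M):
        perm = np.random.permutation(pool)    # random permutation of P_{m-1}
        sub = perm[:n_hat]                    # first N_hat elements
        r = y[sub] - F[sub]                   # pseudo-residuals, Eq. (12)

        # Search preconditions for the best least-squares fit, Eq. (13)
        best = None
        for cond in preconditions:
            f = fit_weak(X[sub], r, cond)
            err = np.sum((r - f(X[sub])) ** 2)
            if best is None or err < best[0]:
                best = (err, f, cond)
        _, f, cond = best

        # Line search for alpha_m under squared error, Eq. (14)
        fx = f(X)
        alpha = float((y - F) @ fx / (fx @ fx + 1e-12))
        F = F + alpha * fx                    # recursion, Eq. (10)

        # Pool adjustment: duplicate misclassified samples d times, Eq. (15)
        wrong = [i for i in range(N) if np.sign(F[i]) != y[i]]
        e = max(len(wrong) / N, 1e-12)        # local classification error
        d = max(1, int((1 - e) / (e + eps)))
        pool = pool + wrong * d               # (d+1) copies end up in P_m

        learners.append(f); alphas.append(alpha); conds.append(cond)

    return F, learners, alphas, conds
```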

The number of iterations M is determined with an early-stopping strategy [23] to avoid overfitting. Taking \( \hat{N} = N \) introduces no randomness; the smaller the fraction \( \hat{N}/N \), the more overall randomness is incorporated into the process. In this work, \( \hat{N}/N = 0.9 \) gave comparably satisfactory performance. When adjusting P, the number of copies d of the incorrectly classified samples is computed from the local classification error \( e = \frac{|T_{False}|}{N} \) as

$$ d = \max\left( 1, \left\lfloor \frac{1 - e}{e + \epsilon} \right\rfloor \right) $$
(15)

Here, the parameter \( \epsilon \) is called the accommodation coefficient. The error e is always less than 0.5 and decreases over the iterations, so that samples incorrectly classified by already strong learners receive large weights.
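To illustrate the behaviour of Eq. (15) (with the ϵ = 0.05 used in Sect. 3), the number of copies grows as the local error shrinks:

```python
import math

for e in (0.4, 0.2, 0.1, 0.05):
    d = max(1, math.floor((1 - e) / (e + 0.05)))  # Eq. (15)
    print(f"e={e:.2f} -> d={d}")
# e=0.40 -> d=1, e=0.20 -> d=3, e=0.10 -> d=6, e=0.05 -> d=9
```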

3 Results

The robustness of the designed algorithm was assessed on the BCI competition IV (IIa) dataset [2]. FastICA was employed to remove artifacts caused by eye and muscle movements [15]. To compare performance and efficiency, Regularized CSP (RCSP) [13] was used as the competing feature-extraction method; its model parameter λ was chosen on the training set using a hold-out validation procedure. For the four-class motor imagery classification task, the one-versus-rest (OVR) strategy [21] was employed for CSP. The PROB method [1], which incorporates independence between the ERD and LRP features, was utilized for feature combination. Since additional features do not necessarily improve training accuracy, feature selection was performed using the Fisher score (the variant \( J = \frac{\|\mu_{+} - \mu_{-}\|^{2}}{\sigma_{+} + \sigma_{-}} \)) [10], which measures the discriminative power of each individual feature in the feature vector; the features with the largest Fisher scores were selected as the most discriminative. Linear Discriminant Analysis (LDA) [4], which minimizes the expected misclassification risk, was used for classification.
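A sketch of this selection step is given below; it computes the score feature-wise, and reading \( \sigma_{\pm} \) as the class variances is our assumption about the exact variant.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score J = (mu+ - mu-)^2 / (sigma+ + sigma-)."""
    mu_p, mu_n = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
    s_p, s_n = X[y == 1].var(axis=0), X[y == -1].var(axis=0)
    return (mu_p - mu_n) ** 2 / (s_p + s_n + 1e-12)

def select_top_k(X, y, k):
    """Indices of the k most discriminative features by Fisher score."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```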

Using [20], the optimal channels for the four MI movements (left hand, right hand, foot, and tongue) were CP4, Cz, FC2, and C1, respectively. The 2-D topoplot maps of peak amplitudes of the boosting-based CSSP filtered EEG in each electrode for subject S1 are shown in Fig. 2.

Fig. 2. 2-D topoplot maps of peak amplitude of the boosting-based CSSP filtered EEG in each channel for subject S1 in the BCI competition IV (IIa) dataset.

To compute the spatial weight for each channel, the quantitative vector \( L = \sum\nolimits_{S_{i} \in S} \alpha_{i} S_{i} \) [17] was used, where \( S_{i} \) are the channel sets and \( \alpha_{i} \) their weights. The spectral weights were computed as given in [12] and projected onto the frequency bands; the temporal information was also obtained and visualized. The training data are preprocessed under each spatial-spectral precondition \( \vartheta_{m} \in \vartheta \), yielding a new dataset on which CSP spatial filtering is performed to obtain the spatial patterns. The first two CSP components are projected back onto the signal space, yielding the CSP-filtered signal \( E_{m} \). The peak amplitude \( P_{mC_{i}} \) is computed from \( E_{m} \) for each channel \( C_{i} \in C \) and then averaged over all preconditions \( \vartheta_{m} \in \vartheta \) as \( P_{C_{i}} = \frac{1}{|\vartheta|} \sum\nolimits_{\vartheta_{m} \in \vartheta} \alpha_{m} P_{mC_{i}} \), where \( \alpha_{m} \) is the weight of the mth condition; the result is visualized as a 2-D topoplot map. The topoplots show that left-hand and right-hand movements produced activation over the right and left hemispheres respectively, foot movement activated the central cortical area, and tongue movement showed activation in the motor cortex region.
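The channel-wise averaging just described reduces to a weighted mean over preconditions; a one-function sketch follows, where the array shapes are assumptions.

```python
import numpy as np

def weighted_peak_amplitude(peaks, alphas):
    """P_Ci = (1/|theta|) * sum_m alpha_m * P_mCi, computed per channel.

    peaks: (M, n_channels) array of per-precondition peak amplitudes P_mCi;
    alphas: (M,) array of the corresponding condition weights alpha_m.
    """
    peaks, alphas = np.asarray(peaks), np.asarray(alphas)
    return (alphas[:, None] * peaks).sum(axis=0) / len(alphas)
```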

The classification results on the test dataset for the proposed method and the competing Regularized CSP (RCSP) are as follows. For all subjects, the maximum number of boosting iterations M was set to 180, determined by the early-stopping strategy to avoid overfitting, and ϵ was set to 0.05. The Cohen's kappa values [3] for all 9 subjects in the BCI IV (IIa) dataset are shown in Fig. 3. CSSBP outperformed the RCSP algorithm and achieved the highest average Cohen's kappa value. The kappa values also show that combining feature vectors in the RCSP algorithm significantly improved kappa for all subjects except S4 and S6.

Fig. 3. Cohen's kappa values for all 9 subjects in the BCI IV (IIa) dataset, where A is RCSP, B is RCSP with combined feature vectors, C is boosting-based CSSP (CSSBP), and D is boosting-based CSSP (CSSBP) with combined feature vectors.

The proposed method improved the kappa values further over these algorithms; moreover, CSSBP with combined feature vectors outperformed CSSBP with single features. Statistical analysis was performed using IBM SPSS ver. 23 with a Mann-Whitney U test, which showed a significant difference between the designed method and the comparison methods; in all cases the designed method was superior at a significance level of p < 0.05, as shown in Fig. 4.

Fig. 4. Boxplots of RCSP and the boosting approach, where A is RCSP, B is RCSP with combined feature vectors, C is CSSBP, and D is CSSBP with combined feature vectors, for the BCI IV (IIa) dataset (p < 0.05).

4 Conclusion

In this work, a boosting-based common spatial-spectral pattern (CSSBP) algorithm with feature combination has been designed for multichannel EEG classification. The channel and frequency configurations are divided into multiple spatial-spectral preconditions using a sliding-window strategy, and weak learners are trained under these preconditions with a boosting approach. The goal is to select the channel groups and frequency bands that contribute most to the neural activity of interest. The results show that CSSBP clearly outperformed the comparison method. In addition, combining the widely used ERD and readiness potential (RP) feature vectors significantly improved classification performance over single-feature CSSBP and increased robustness.

The PROB method, which incorporates independence between the ERD and LRP features, enhanced performance and can also be used to better explore the neurophysiological mechanisms underlying brain activity. Combining features of different brain tasks in a feedback environment, where the subject is adapting to the feedback scenario, may make the learning process complex and time-consuming; this should be investigated further in future online BCI experiments.