1 Introduction

A brain-computer interface (BCI) is a communication system established between the human brain and computers or other external devices without relying on the normal pathways of peripheral nerves and muscles [1]. A BCI system acquires electroencephalogram (EEG) signals from the brain, extracts features, classifies them, and translates the result into machine-readable control commands. The main goal of BCI systems is to strengthen the abilities of people affected by motor disabilities. Medical applications of BCI mainly include sensory recovery, cognitive recovery, rehabilitation treatment, and brain-controlled wheelchairs [2]. In non-medical areas, BCI can be applied to new types of entertainment games, car driving, robot replacements, lie detectors [3], etc. BCI also has a wide range of applications in aviation and the military industry.

MI-BCI is the BCI application based on motor imagery EEG (MI-EEG), and it is one of the main directions of brain-computer interface research. Many successful MI-BCIs rely on subjects learning to control specific EEG rhythms, which manifest as EEG potentials oscillating at a particular frequency. The EEG rhythms related to motor imagery (MI) tasks are the mu (8–13 Hz) rhythm and the beta (13–30 Hz) rhythm. The energy in the mu band observed over the motor cortex decreases when an MI task is performed [4]. This decrease is called event-related desynchronization (ERD). An MI task also causes an energy increase in the beta band, which is called event-related synchronization (ERS) [5]. For different MI tasks, the motor cortex produces discriminative ERD/ERS patterns. Features are extracted by analysing ERD/ERS, and a classification algorithm is then applied to construct an MI-BCI. The two main techniques for MI-EEG analysis are therefore feature extraction and classification. Several feature extraction techniques have been studied, such as power spectral density (PSD), common spatial pattern (CSP) [6,7,8,9], the autoregressive (AR) model, the adaptive autoregressive (AAR) model, independent component analysis (ICA) and the wavelet transform [10, 11]. Classifiers such as the support vector machine (SVM) [12], k-nearest neighbors (KNN) [13, 14], random forest (RF) [15] and linear discriminant analysis (LDA) [16] have been explored for classification of MI-EEG signals.

In recent years, the revolutionary advances of deep learning in audio and visual signal recognition have gained significant attention. Some recent deep learning based EEG classification approaches have improved recognition accuracy. In a study by An et al., a deep belief network (DBN) model was applied to two-class MI classification and was shown to be more successful than the SVM method [17]. Yousef et al. applied convolutional neural networks (CNN) and stacked autoencoders (SAE) to classify motor imagery EEG signals [18, 19]. Schirrmeister et al. proposed a convolutional neural network (deep ConvNets) for end-to-end EEG analysis. Their study shows how to design and train ConvNets to decode task-related information from raw EEG without handcrafted features, and highlights the potential of deep ConvNets combined with advanced visualization techniques for EEG-based brain mapping [20].

In this paper, we propose a framework based on CSP and the backpropagation algorithm for MI-EEG analysis. To evaluate the proposed framework, we trained and tested it on BCI competition II dataset III and BCI competition IV dataset 2a. The remainder of this paper is organized as follows. Section 2 describes the proposed framework. Section 3 describes the experimental studies and the results on the evaluation data of BCI competition II dataset III and BCI competition IV dataset 2a. Finally, Sect. 4 concludes the paper.

2 Methods

The structure of the proposed framework is shown in Fig. 1. It consists of 4 stages. The first stage is a band-pass filter applied to the raw EEG data. The second stage performs spatial filtering using the CSP algorithm. The third stage is a temporal projection of the spatially filtered signal. The last stage is a single-layer neural network that serves as the classification layer. The following sections explain the stages in detail.

Fig. 1. Diagram of the proposed framework

2.1 Band-Pass Filtering

As described in Sect. 1, ERD/ERS occur when a person performs MI tasks. To extract the EEG components in the mu and beta bands, the raw EEG data are first passed through a band-pass filter covering 8–30 Hz.
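The 8–30 Hz filtering step can be illustrated with a minimal sketch. The text does not specify the filter design, so the example below assumes an ideal frequency-domain mask (a Butterworth design would be a common alternative); the function name `bandpass_fft`, the sampling rate and the toy signal are all hypothetical.

```python
import numpy as np

def bandpass_fft(x, fs, low=8.0, high=30.0):
    """Zero all FFT bins outside [low, high] Hz along the last axis.

    x  : array of shape (..., n_samples), e.g. (n_channels, n_samples)
    fs : sampling rate in Hz (assumed; not stated in the text)
    """
    n = x.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = (freqs >= low) & (freqs <= high)
    spectrum = np.fft.rfft(x, axis=-1)
    return np.fft.irfft(spectrum * mask, n=n, axis=-1)

# Toy signal (hypothetical): a 10 Hz mu-band component plus 50 Hz interference.
fs = 250.0
t = np.arange(500) / fs                       # 2 s of data
mu = np.sin(2 * np.pi * 10 * t)
raw = mu + 0.5 * np.sin(2 * np.pi * 50 * t)
filtered = bandpass_fft(raw[np.newaxis, :], fs)  # shape (1, 500)
```

In this toy case the 10 Hz component passes unchanged and the 50 Hz component is removed. An ideal mask can introduce ringing on real recordings, which is one reason a smoother filter may be preferred in practice.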

2.2 CSP Algorithm

The CSP algorithm is highly successful in calculating spatial filters for detecting ERD/ERS. The main idea is to use a linear transform to project the multi-channel EEG data into a low-dimensional spatial subspace with a projection matrix, each row of which consists of weights for the channels [21]. The projection is chosen so that the variance of the projected signal is maximal for one class and minimal for the other, which makes the two classes easier to discriminate. The CSP algorithm performs spatial filtering using

$$ Z_{i} = W_{csp}^{T} E_{i} $$
(1)

where \( E_{i} \) is an \( n \times t \) matrix representing the raw EEG measurement data of the \( i \)th trial, \( n \) is the number of channels, and \( t \) is the number of measurement samples per channel. \( W_{csp} \) denotes the CSP projection matrix, \( T \) denotes the transpose operator, and \( Z_{i} \) denotes the spatially filtered signal. The CSP matrix can be computed by solving the eigenvalue decomposition problem

$$ S_{1} W_{csp} = (S_{1} + S_{2} )W_{csp} D $$
(2)

where \( S_{1} \) and \( S_{2} \) are estimates of the covariance matrices of the band-pass filtered EEG measurements of the respective motor imagery actions, and \( D \) is the diagonal matrix that contains the generalized eigenvalues of \( S_{1} \) with respect to \( S_{1} + S_{2} \).

However, only a small number \( m \) of the spatially filtered signals are generally used as features. We therefore apply a reduced projection to obtain the spatially filtered signal, given by

$$ Z_{i} = \overline{{W_{csp} }}^{T} E_{i} $$
(3)

where \( \overline{{W_{csp} }} \) consists of the first \( m \) and the last \( m \) columns of \( W_{csp} \), so that the spatially filtered signal \( Z_{i} \) is a \( 2m \times t \) matrix.
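Equations (1)–(3) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the covariance estimate (trace-normalised and averaged over trials), the whitening-based solver, the function names and the synthetic data are all assumptions introduced here.

```python
import numpy as np

def mean_covariance(trials):
    """Average trace-normalised spatial covariance of (n_trials, n_channels, n_samples) data."""
    covs = []
    for E in trials:
        C = E @ E.T
        covs.append(C / np.trace(C))
    return np.mean(covs, axis=0)

def csp(S1, S2):
    """Solve S1 W = (S1 + S2) W D; columns of W are sorted by descending eigenvalue."""
    lam, U = np.linalg.eigh(S1 + S2)
    P = np.diag(lam ** -0.5) @ U.T        # whitening: P (S1 + S2) P^T = I
    d, B = np.linalg.eigh(P @ S1 @ P.T)
    order = np.argsort(d)[::-1]
    return (P.T @ B)[:, order], d[order]

def select_filters(W, m):
    """Eq. (3): keep the first m and the last m columns of W."""
    return np.hstack([W[:, :m], W[:, -m:]])

# Synthetic two-class data (hypothetical): 20 trials, 6 channels, 200 samples each.
rng = np.random.default_rng(0)
trials_1 = rng.standard_normal((20, 6, 200))
trials_2 = rng.standard_normal((20, 6, 200))
trials_1[:, 0, :] *= 3.0                  # class 1 has extra variance on channel 0
trials_2[:, 5, :] *= 3.0                  # class 2 has extra variance on channel 5

S1, S2 = mean_covariance(trials_1), mean_covariance(trials_2)
W, d = csp(S1, S2)
W_bar = select_filters(W, m=2)
Z = W_bar.T @ trials_1[0]                 # spatially filtered trial, shape (2m, t)
```

Eigenvalues close to 1 mark filters whose output variance is dominated by the first class, and eigenvalues close to 0 mark filters dominated by the second class, which is why the first and last columns are the ones retained.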

2.3 Joint Optimization Using Backpropagation

Mathematically, the third and fourth stages can be described as follows. Given the spatially filtered signal \( Z \), the temporal projection matrix \( V \), the classifier weights \( W_{c} \) and the bias \( b \), we have

$$ S = W_{c}^{T} \log (Z^{2} V) + b $$
(4)

where the square and the logarithm are applied element-wise, and \( S \) is the vector of class scores that is passed to an activation function. The output of the framework is given by

$$ y = f\left( S \right) $$
(5)

where \( y \) is the vector of class probabilities and \( f\left( \cdot \right) \) is the activation function, here the softmax function. The softmax function (softmax regression) is a generalization of logistic regression to the case of multiple classes. The softmax output is given by

$$ y_{k} = \frac{{e^{{S_{k} }} }}{{\sum\nolimits_{j} {e^{{S_{j} }} } }} $$
(6)

where \( S_{k} \) is the score of class \( k \) and the sum runs over all classes \( j \). The cost function is the cross-entropy cost function, which is

$$ E = - \log (p_{{y_{k} }} ) $$
(7)

where \( p_{{y_{k} }} \) is the softmax probability that the framework assigns to the correct class of the training trial. The free parameters of the third and fourth stages are the temporal projection matrix \( V \), the classifier weights \( W_{c} \) and the bias \( b \). These parameters are learned with the backpropagation algorithm. The labeled training set is fed to the network and the error \( E \) (the cost function) is computed. The model parameters are then updated by gradient descent, which reduces the error as follows

$$ V = V - \eta \frac{\partial E}{\partial V} $$
(8)
$$ W_{c} = W_{c} - \eta \frac{\partial E}{{\partial W_{c} }} $$
(9)
$$ b = b - \eta \frac{\partial E}{\partial b} $$
(10)

where \( \eta \) denotes the learning rate of the algorithm. \( V \) is initialized to a matrix of all ones, and \( W_{c} \) is randomly initialized from a Gaussian distribution. Finally, the trained framework is used to classify the new samples in the test set.
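Equations (4)–(10) can be traced in a short NumPy sketch for a single trial. Several details are assumptions made for illustration: \( V \) is treated as a vector of length \( t \), the bias is initialised to zero, the gradients are derived by hand from Eqs. (4)–(7), and the function names and random data are hypothetical.

```python
import numpy as np

def forward(Z, V, Wc, b):
    """Eqs. (4)-(6): log-energy features, class scores, softmax probabilities."""
    u = (Z ** 2) @ V                  # (2m,) weighted energy per spatial filter
    f = np.log(u)                     # (2m,) log features
    S = Wc.T @ f + b                  # (K,) class scores
    e = np.exp(S - S.max())           # shifted for numerical stability
    return u, f, e / e.sum()

def loss_and_grads(Z, label, V, Wc, b):
    """Eq. (7) and its gradients with respect to V, Wc and b."""
    u, f, y = forward(Z, V, Wc, b)
    E = -np.log(y[label])
    dS = y.copy()
    dS[label] -= 1.0                  # dE/dS = y - one_hot(label)
    dWc = np.outer(f, dS)             # (2m, K)
    db = dS
    du = (Wc @ dS) / u                # chain rule through the logarithm
    dV = (Z ** 2).T @ du              # (t,)
    return E, dV, dWc, db

def sgd_step(Z, label, V, Wc, b, eta=0.03):
    """Eqs. (8)-(10): one gradient-descent update on a single trial."""
    E, dV, dWc, db = loss_and_grads(Z, label, V, Wc, b)
    return E, V - eta * dV, Wc - eta * dWc, b - eta * db

rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 50))          # hypothetical trial: 2m = 4 filters, t = 50
label = 1                                 # hypothetical true class
V = np.ones(50)                           # all-ones initialisation
Wc = 0.1 * rng.standard_normal((4, 2))    # Gaussian initialisation, K = 2 classes
b = np.zeros(2)                           # assumed zero bias initialisation

E0, dV, dWc, db = loss_and_grads(Z, label, V, Wc, b)
_, V1, Wc1, b1 = sgd_step(Z, label, V, Wc, b)
E1 = loss_and_grads(Z, label, V1, Wc1, b1)[0]   # error after one update
```

With \( V \) set to all ones, the feature \( \log (Z^{2} V) \) reduces to the log of the summed squared samples of each spatially filtered channel, so training starts from log-energy features and then reweights individual time samples. Note that the logarithm requires \( Z^{2} V \) to stay positive, which this sketch does not enforce.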

3 Experiments with BCI Competition Datasets

In this section, we apply the proposed framework to the BCI competition datasets, and the results of the proposed approach on these datasets are presented.

3.1 BCI Competition II, Dataset III

The first dataset is dataset III from BCI competition II. The dataset includes MI task experiments for right hand and left hand movements. EEG signals were recorded at the C3, Cz and C4 channels. During acquisition of the EEG signals, at t = 2 s an acoustic stimulus indicated the beginning of the trial and a cross ‘+’ was displayed for 1 s. Then, at t = 3 s, an arrow (left or right) was displayed and the subject was asked to perform the corresponding MI task. There were 280 trials in the dataset, 140 trials for training and another 140 trials for testing.

For each EEG trial, we extracted the time interval from 0.5 s to 3.5 s after the cue was displayed. To evaluate our method on the dataset, we used the network shown in Fig. 1 and described in Sect. 2, which consists of a band-pass filter, CSP spatial projection, temporal projection and a single-layer neural network. The framework was trained on the 140 trials in the training set and tested on the 140 trials in the test set. Stochastic gradient descent (SGD) was used to update the parameters and minimize the error \( E \). In each training epoch, the mini-batch was a randomly selected 1/2 of the training data.
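The windowing and mini-batch selection described above can be sketched as follows. The sampling rate, the trial length and the choice of sampling without replacement are assumptions made for this example, and the helper names are hypothetical.

```python
import numpy as np

def extract_window(trial, fs, cue_s, start_s=0.5, stop_s=3.5):
    """Cut [cue + start_s, cue + stop_s) seconds out of a (channels, samples) trial."""
    first = int(round((cue_s + start_s) * fs))
    last = int(round((cue_s + stop_s) * fs))
    return trial[:, first:last]

def sample_minibatch(n_trials, fraction, rng):
    """Indices of a random mini-batch holding `fraction` of the training trials."""
    size = max(1, int(n_trials * fraction))
    return rng.choice(n_trials, size=size, replace=False)

rng = np.random.default_rng(0)
fs, cue_s = 128.0, 3.0                          # assumed sampling rate; cue at t = 3 s
trial = rng.standard_normal((3, int(9 * fs)))   # hypothetical 9 s trial, 3 channels
window = extract_window(trial, fs, cue_s)       # 3 s of data after the offset
batch = sample_minibatch(140, 0.5, rng)         # 70 of the 140 training trials
```

Drawing a fresh random half of the training set in every epoch gives each update a different view of the data while keeping the gradient estimate less noisy than single-trial updates.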

The results on BCI competition II dataset III are shown in Table 1. The best results were obtained with the learning rate \( \eta \) fixed at 0.03, for which our method reached an accuracy of 90.0%. The winning algorithm of the competition achieved 89.3%. We compared our results with two deep learning approaches, CNN and CNN-SAE [18, 19], which achieved 89.3% and 90.0%, respectively. We also compared our results with CSP-LR, a conventional method without deep learning that uses CSP for feature extraction and logistic regression for classification. The CSP-LR method reached an accuracy of 88.9%. The kappa values of these methods are also listed in Table 1. The kappa value measures classification performance after removing the effect of the accuracy of random classification. Kappa is calculated as

$$ kappa = \frac{acc - 1/N}{1 - 1/N} $$
(11)

where \( N \) denotes the number of classes; for this dataset \( N = 2 \). As shown in Table 1, the accuracy of the proposed method equals that of CNN-SAE and is higher than that of the competition winner, the CNN method and CSP-LR.

Table 1. The accuracy (%) and kappa results of BCI competition II dataset III
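As a small worked example of Eq. (11), the sketch below converts an accuracy into a kappa value. The function name `kappa` is chosen here for illustration; the input 0.900 is the accuracy reported above for the proposed method.

```python
def kappa(acc, n_classes):
    """Eq. (11): accuracy rescaled so that chance level maps to 0 and perfect to 1."""
    chance = 1.0 / n_classes
    return (acc - chance) / (1.0 - chance)

print(round(kappa(0.900, 2), 3))  # two-class case, 90.0% accuracy -> 0.8
```

For two classes, chance level is 50%, so an accuracy of 90.0% corresponds to a kappa of 0.8, while an accuracy at chance level gives a kappa of 0.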

3.2 BCI Competition IV, Dataset 2a

BCI competition IV dataset 2a comprises EEG measurements from 9 subjects for 4 classes of motor imagery, namely left hand, right hand, feet, and tongue. Two sessions, one for training and the other for evaluation, were recorded from each subject. Each session comprised 288 trials of data recorded with 22 EEG channels and 3 monopolar electrooculogram (EOG) channels. Each trial starts with a short acoustic stimulus and a fixation cross. Then, at t = 3 s, an arrow indicates the MI task. The arrow is displayed for 1.25 s, after which the subjects have 4 s to imagine the task.

Unlike BCI competition II dataset III, dataset 2a contains 4 classes. For the spatial projection we therefore use one-versus-rest CSP (OVR-CSP) [22] to obtain the spatially filtered signals. The architecture of the framework described in Sect. 2 is modified as shown in Fig. 2. The number of temporal projection matrices to be fine-tuned increases to 4. The 4 temporal projection matrices are initialized to matrices of all ones and are updated jointly using the backpropagation algorithm.

Fig. 2. Diagram of the proposed framework based on OVR-CSP
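The one-versus-rest idea can be sketched as one two-class CSP problem per class, with the remaining classes pooled into the "rest" covariance. How the rest covariance is formed (a plain sum here), the function names and the random covariances are assumptions for illustration and may differ from the formulation in [22].

```python
import numpy as np

def csp_pair(S1, S2, m):
    """First m and last m generalized eigenvectors of S1 w = d (S1 + S2) w."""
    lam, U = np.linalg.eigh(S1 + S2)
    P = np.diag(lam ** -0.5) @ U.T            # whitening of S1 + S2
    d, B = np.linalg.eigh(P @ S1 @ P.T)
    W = (P.T @ B)[:, np.argsort(d)[::-1]]     # columns by descending eigenvalue
    return np.hstack([W[:, :m], W[:, -m:]])

def ovr_csp(class_covs, m=2):
    """One filter bank per class: that class versus the sum of all others."""
    total = sum(class_covs)
    return [csp_pair(S_k, total - S_k, m) for S_k in class_covs]

# Hypothetical data: 4 classes, 8 channels, random symmetric positive-definite covariances.
rng = np.random.default_rng(0)
class_covs = []
for _ in range(4):
    A = rng.standard_normal((8, 100))
    class_covs.append(A @ A.T / 100)

banks = ovr_csp(class_covs, m=2)
E = rng.standard_normal((8, 300))             # one hypothetical band-pass filtered trial
Z_list = [W.T @ E for W in banks]             # 4 spatially filtered signals, each (2m, t)
```

Each of the 4 filtered signals would then receive its own temporal projection, which matches the increase to 4 temporal projection matrices described above.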

For each EEG trial, we extracted the time interval from 1 s to 5 s after the cue was displayed. The framework was trained on the training data and tested on the test data. SGD was used to update the parameters. The mini-batch was a randomly selected 1/4 of the training data.

The accuracy results of the proposed method and CSP-LR are shown in Table 2. The kappa values of the proposed method and CSP-LR are compared with those of filter bank CSP (FBCSP), the winning algorithm of the competition [9], in Table 3. Using deep learning, the proposed method obtained higher accuracies and better kappa values than the CSP-LR method for all subjects. For subjects 1, 2, 3, 8 and 9, our approach has better kappa values than FBCSP, whereas for subjects 4, 5, 6 and 7 it has worse kappa values. The average kappa value of our approach is 0.583, which is higher than that of FBCSP (0.569).

Table 2. The accuracy (%) results for the proposed method and CSP-LR
Table 3. The kappa results for the proposed method, FBCSP and CSP-LR

4 Conclusion

In this study, we proposed a deep learning approach for MI-EEG analysis. We designed a framework that combines the backpropagation algorithm with CSP. A band-pass filter processes the raw EEG data, and the CSP algorithm performs spatial filtering. A temporal projection then yields the features, which are fed to a single-layer neural network for classification. The free parameters of the framework are fine-tuned with the backpropagation algorithm to obtain the best classification accuracy.

We applied the proposed framework to two BCI competition datasets: dataset III from BCI competition II and dataset 2a from BCI competition IV. On dataset III, our method reached an accuracy of 90.0%, which equals that of the CNN-SAE method and is higher than that of the competition winner and the CNN method. On dataset 2a, our method obtained an average kappa value of 0.583, which is better than that of FBCSP. Furthermore, on both datasets our method outperformed the CSP-LR method, which does not use deep learning.

Although deep learning methods have advanced greatly in computer vision, natural language processing and speech processing, their application in EEG-based BCI is still rare. Our results suggest that deep learning methods have great potential as a tool for EEG analysis and EEG-based BCI. We expect the number of BCI studies using deep learning methods to increase rapidly.