1 Introduction

The concept of the “shared paradigm” consists in sharing control between the wheelchair and its driver. Naturally, this approach requires personalization: each pathology has its specific requirements (for example, locked-in patients can rely only on intellectual and visual faculties, while tetraplegic ones can provide some limited muscular activity). Consequently, several approaches have been proposed to ensure navigation safety.

Vander Poorten et al. [8] proposed a communication channel between the subject and the wheelchair controller; the concept helps the wheelchair user avoid mode confusion. Ren et al. [6] suggest a map-matching algorithm that matches GPS positions, or readings from other sensors, onto a sidewalk network; this map matching in turn assists decision making under uncertainty. Peinado et al. [7] present a collaborative wheelchair control: the system estimates how much help the person needs in each situation and provides the right amount of assistance based on his or her skills. Moreover, the system combines the human and machine control commands by weighting them according to their local efficiency, so the better the person drives, the more control he or she is awarded.

During navigation, since the type of error that will be committed cannot be predicted, such an error could lead to fatal accidents. These systems lack an anticipative behavior that could modify the wheelchair parameters, for example decreasing the velocity in order to prevent a misbehavior before it occurs. During our interviews with doctors, experts, occupational therapists and psychologists, they stated that many human factors have a direct effect on navigation performance. Among them, mental workload [2, 3] and emotions [4] are the most influential. However, these two parameters are not easy to cope with, as they require much more investigation. In the next section, we focus on the influence of emotions on ElectroEncephaloGraphy (EEG) patterns.

2 Emotion Integration

The detection of emotions is very important as it could be integrated for two purposes. First, it could serve as a basis for detecting EEG command patterns (for example, when performing a motor imagery task, the EEG pattern manifestation depends on the user’s mental state). Second, in order to enhance wheelchair navigation, its velocity should be adapted to the user’s emotions (for example, it should decrease if the user is frustrated). However, it is still not evident that emotions can bring the expected results, especially since EEG reliability in this context has not yet been proved. Moreover, measuring emotions tends to be very challenging. In this paper, different algorithms used for extraction, selection and classification are compared (Fig. 1).

Fig. 1. Methodology workflow for emotion detection

In order to collect data to populate the emotional database, 40 healthy subjects, aged from 22 to 55 years old, took part in the experiment. They were asked to complete and sign a consent form with personal information. Afterwards, they chose five audio/video excerpts likely to impact their emotions. Next, the chosen sequences were clipped to 63 s and projected one by one. At the end of each session, the subject gave his or her rating according to the SAM scale [5].

2.1 Extraction Methods

Welch Method. The Welch approach aims at estimating the spectral density at different frequencies. It is based on the concept of periodogram spectrum estimates [9], defined as the result of converting the signal from the time domain to the frequency domain. Consider \(x_{i}(n),i=1,2,{\ldots },K\), K uncorrelated measurements of a random process x(n), over an interval of \(0 \le n < L\). Suppose that successive sequences are offset by \(D (\le L)\) samples and that each of them is of length L; the i-th sequence is given by:

$$\begin{aligned} x_{i}(n)=x(n+iD),n=0,1,\ldots ,L-1 \end{aligned}$$
(1)

Thus, the overlap between \(x_{i} (n)\) and \(x_{i+1} (n) \) is L-D points, and if K sequences cover the N data points of the signal, then

$$\begin{aligned} N= L + D(K-1) \end{aligned}$$
(2)

This means that if the sequences overlap by 50% \((D=\frac{L}{2})\), then we can form \(K=\frac{2N}{L}-1\) sections of length L. The Welch method is expressed by:

$$\begin{aligned} \hat{S}_{w}(w) = \frac{1}{KLU} \sum _{i=0}^{K-1}{\vert \sum _{n=0}^{L-1}{w(n)x(n+iD)exp(-jwn)}\vert ^2} \end{aligned}$$
(3)

where \(U=\frac{1}{N}\sum _{n=0}^{N-1}{\vert w(n)\vert ^2}\) and N is the length of the window w(n). This method reduces the noise in power spectra. However, the resolution R depends on the window type and length:

$$\begin{aligned} R = \frac{1}{LT_{s}} \end{aligned}$$
(4)

where \(T_{s}\) is the sampling period. The lower L is, the smoother the Welch periodogram becomes.

The raw signal was filtered between 1 and 64 Hz (to cover all band wavelengths). The bands are classified depending on their frequency limits. In the current study, we focus on: \(\delta \) (up to 4 Hz), \(\theta \) (4 Hz–8 Hz), \(\alpha \) (8 Hz–13 Hz), \(\beta \) (13 Hz–30 Hz) and \(\gamma \) (30 Hz–64 Hz). The power spectral density (PSD) was computed on successive intervals of 1 s per trial per user. The Welch periodogram was computed using a 512-point FFT and various Hamming window lengths: 128, 64, 32 and 16 points with 50% overlap. Finally, for each band, two parameters were computed: the mean power (\(P_{m}\)) and the root mean square (\(R_{ms}\)):

$$\begin{aligned} P_{m} = \sum _{k=i}^{j}{S(k)} \end{aligned}$$
(5)
$$\begin{aligned} R_{ms} = \sqrt{\sum _{k=i}^{j}{S(k)}} \end{aligned}$$
(6)

where S(k) are the sampled values of the periodogram and i and j are the indexes of the lower and higher sampled frequencies of each band.
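As an illustration, the band-feature computation described above could be sketched as follows with SciPy. This is a minimal sketch, not the authors' implementation: the sampling rate FS, the function and variable names, and the use of scipy.signal.welch are assumptions.

```python
import numpy as np
from scipy.signal import welch

# FS and BANDS are assumptions: the 1-64 Hz filtering suggests a sampling
# rate of at least 128 Hz; the band limits follow the text above.
FS = 128
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 64)}

def band_features(x, win_len=128, nfft=512, fs=FS):
    """Welch periodogram (Hamming window, 50% overlap) and per-band P_m, R_ms."""
    freqs, psd = welch(x, fs=fs, window="hamming",
                       nperseg=win_len, noverlap=win_len // 2, nfft=nfft)
    feats = {}
    for name, (lo, hi) in BANDS.items():
        band = psd[(freqs >= lo) & (freqs < hi)]
        p_m = band.sum()          # Eq. (5): sum of the sampled PSD values
        r_ms = np.sqrt(p_m)       # Eq. (6): square root of that sum
        feats[name] = (p_m, r_ms)
    return feats

# One feature set per 1-s interval, e.g. for a 63-s trial:
# trial = np.random.randn(63 * FS)   # placeholder EEG signal
# features = [band_features(trial[i * FS:(i + 1) * FS]) for i in range(63)]
```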

Discrete Wavelets Transform. Discrete Wavelets Transform (DWT) is defined by the function:

$$\begin{aligned} \psi _{a,b}(t)=2^{\frac{a}{2}}\psi (2^{a}(t-b)) \end{aligned}$$
(7)

where a denotes the scales and b the shifts. To approximate any function, \(\psi (t)\) is dilated by the coefficient \(2^k\) while the resulting function is shifted on its interval proportionally to \(2^{-k}\). To obtain a compressed version of the wavelet function, a high-frequency component must be applied, while a low-frequency one is needed for the dilated version. Correlating the original signal with wavelet functions of different sizes yields signal details at many scales. These correlations are arranged in a hierarchical framework: the so-called multiresolution decomposition algorithm proceeds by separating the signal into details at different scales and a coarser representation named the approximation.
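A minimal sketch of such a multiresolution decomposition is given below using PyWavelets (an assumed library choice); the mother wavelet 'db4', the 5 decomposition levels and the energy feature are illustrative, not taken from the paper.

```python
import pywt

def dwt_features(x, wavelet="db4", level=5):
    # wavedec returns [approximation_L, detail_L, detail_{L-1}, ..., detail_1]
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # One simple feature per sub-band: the energy of its coefficients.
    return [float((c ** 2).sum()) for c in coeffs]
```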

2.2 Feature Selection Methods

As mentioned before, the extracted features need to be selected to avoid the curse of dimensionality. For this purpose, two selection techniques are introduced.

Principal Component Analysis (PCA). Principal Component Analysis (PCA) transforms correlated variables into subspaces called principal components. Common applications include data compression, blind source separation and signal de-noising. PCA uses a vector space transformation to reduce the dimensionality of large data sets. Using a mathematical projection, the original high-dimensional data set can be interpreted with fewer variables (the principal components). This is important as it reduces the processing time during classification and lets the user interpret outliers, patterns and trends in the data. The aim of this section is to explain how PCA is applied to the feature vector of each second. The first step is to rescale the data to obtain a new vector Z:

$$\begin{aligned} z_{j}^{i}=\frac{x_{j}^{i}-\bar{x}^{j}}{s^{j}} \end{aligned}$$
(8)

where \(\bar{x}^{j}=\frac{1}{n}\sum _{i=1}^{n}x_{i}^{j}\) is the mean of the \(j^{th}\) variable and \(s^{j}\) is the corresponding standard deviation. The correlation matrix R contains the correlation coefficients between each pair of variables. It is symmetric, with ones on its diagonal, and is defined as follows:

$$\begin{aligned} R= D_{\frac{1}{s}} V D_{\frac{1}{s}} \end{aligned}$$
(9)

where V is the variance matrix of X and \(D_{\frac{1}{s}} = \mathrm{diag}(\frac{1}{s^{1}},\frac{1}{s^{2}},\ldots ,\frac{1}{s^{p}})\). The next step is to compute the eigenvalues of the correlation matrix. The eigenvalues carry the projection inertia of the original space onto the subspace formed by their associated eigenvectors. The eigenvectors constitute the loadings of the principal components, that is, the strength of the relation between each variable and each principal component. The eigenvalues, the associated principal components and the projection inertia (in percent) of each of them are then examined.

The selection of the principal component axes was made empirically using the Kaiser criterion, which consists in retaining only the eigenvalues greater than the mean eigenvalue.
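The steps above (standardization, correlation matrix, eigendecomposition, Kaiser criterion) could be sketched as follows; the function name and the use of NumPy's eigendecomposition are assumptions, not the authors' code.

```python
import numpy as np

def pca_kaiser(X):
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardization, Eq. (8)
    R = np.corrcoef(Z, rowvar=False)           # correlation matrix, Eq. (9)
    eigvals, eigvecs = np.linalg.eigh(R)       # R is symmetric
    order = np.argsort(eigvals)[::-1]          # sort by decreasing inertia
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > eigvals.mean()            # Kaiser criterion
    return Z @ eigvecs[:, keep], eigvals       # projected features, spectrum
```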

Genetic Algorithm (GA). The Genetic Algorithm (GA) is an adaptive search technique [1]. The power of such a method is its ability to narrow an initially unknown search space down to more convenient subspaces. The main issue for a GA is the choice of a suitable representation and evaluation function, which makes it well suited to feature selection. In this case, each feature is treated as a binary gene and each individual is a binary string representing a subset of the given feature set. For a feature vector X of length s, feature inclusion or elimination is handled as follows: if \(X_{i} =0\), the feature is eliminated; otherwise, 1 indicates its inclusion. As our purpose is to estimate the optimal number of features to keep, the proposed fitness function is based on the correlation matrix C, expressed as follows:

$$\begin{aligned} C = \left( \begin{array}{ccc} 1 &{} \ldots &{} c_{1,n} \\ \vdots &{} c_{h,l} &{} \vdots \\ c_{n,1} &{} \ldots &{} 1 \end{array} \right) \end{aligned}$$
(10)

where \(c_{h,l}\) represents the correlation coefficient between feature h and feature l. This value varies between −1 and 1: if \(c_{h,l}\) tends to 1 or −1, the features are highly correlated; if it tends to 0, there is no correlation. Starting from the initial population, which is constituted by the correlations of the pairs of features, the proposed fitness function F is defined as:

$$\begin{aligned} F = min_{h,l} |c_{h,l}| \end{aligned}$$
(11)

At each iteration, the chromosomes whose selected features are the least correlated are kept for the next generations.
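A possible implementation of this selection loop is sketched below. It is a simplified GA under stated assumptions: the population size, mutation rate, one-point crossover and the use of the mean absolute pairwise correlation as the score to minimize are illustrative choices rather than the exact fitness of Eq. (11).

```python
import numpy as np

rng = np.random.default_rng(0)

def correlation_score(chromosome, C):
    """Mean absolute pairwise correlation among the selected features
    (illustrative criterion to minimize; the paper uses Eq. (11))."""
    idx = np.flatnonzero(chromosome)
    if idx.size < 2:
        return 1.0                                    # penalize empty/singleton subsets
    sub = np.abs(C[np.ix_(idx, idx)])
    return (sub.sum() - idx.size) / (idx.size * (idx.size - 1))

def ga_select(C, pop_size=30, n_gen=50, p_mut=0.05):
    n = C.shape[0]
    pop = rng.integers(0, 2, size=(pop_size, n))      # random binary chromosomes
    for _ in range(n_gen):
        scores = np.array([correlation_score(ind, C) for ind in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]   # least correlated survive
        cut = rng.integers(1, n)                              # one-point crossover
        children = np.array([np.concatenate([a[:cut], b[cut:]])
                             for a, b in zip(parents, np.roll(parents, 1, axis=0))])
        children ^= (rng.random(children.shape) < p_mut)      # bit-flip mutation
        pop = np.vstack([parents, children])
    best = min(pop, key=lambda ind: correlation_score(ind, C))
    return np.flatnonzero(best)                               # indices of kept features
```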

2.3 Classification

Each classification technique outputs one of four classes, each of which corresponds to one emotion (stressed, excited, nervous, and relaxed). For this purpose, three classification techniques are deployed: Linear Discriminant Analysis (LDA), the Multi Layer Perceptron (MLP) and the Support Vector Machine (SVM).

Linear Discriminant Analysis ( \({\varvec{LDA}}\) ). The LDA combines the variables linearly:

$$\begin{aligned} y_{rn}= u_{0}+u_{1} X_{1rn}+ u_{2} X_{2rn}+\ldots +u_{p} X_{prn} \end{aligned}$$
(12)

where \(y_{rn}\) is the discriminant function for case n in group r, \(X_{irn}\) is the discriminant variable \(X_{i}\) for case n in group r, and \(u_{i}\) are the required coefficients. This implies that the number of discriminant functions is determined by the number of considered groups.
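A minimal usage sketch with scikit-learn (an assumed library; X_train, y_train and X_test are placeholders for the selected feature vectors and the four emotion labels):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)       # learns the coefficients u_i of Eq. (12)
y_pred = lda.predict(X_test)    # one of the four emotion classes per sample
```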

Multi Layer Perceptron ( \({\varvec{MLP}}\) ). The MLP is divided into three layers: an input layer whose length equals the number of selected features of the input vector, a hidden layer with 20 neurons and an output layer with 4 neurons. A sigmoid transfer function is adopted, and a test-set validation technique is used: the database is separated into 3 sets, 70% for training, 15% for testing and 15% for validation (to avoid overfitting).
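A sketch of this configuration with scikit-learn's MLPClassifier is shown below, assuming X and y hold the selected features and emotion labels; the split helpers and max_iter value are illustrative, as the paper does not specify its training hyperparameters.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hold out 15% for validation, then 15% of the whole database for testing.
X_tmp, X_val, y_tmp, y_val = train_test_split(X, y, test_size=0.15, stratify=y)
X_train, X_test, y_train, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.15 / 0.85, stratify=y_tmp)

mlp = MLPClassifier(hidden_layer_sizes=(20,),   # one hidden layer of 20 neurons
                    activation="logistic",      # sigmoid transfer function
                    max_iter=1000)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```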

Support Vector Machine ( \(\varvec{SVM}\) ). SVM maps input vectors into higher dimensional space to ease classification. Then it finds a linear separation with the maximal margin in the new space. It requires the solution of the following problem:

$$\begin{aligned} \min _{w,b,\epsilon }\frac{1}{2}w^T w+ C\sum _{i=1}^{l}{\epsilon _{i}} \; subject\,to\; y_{i}(w^T \varPhi (x_{i})+b) \ge 1-\epsilon _{i},\; \epsilon _{i} \ge 0 \end{aligned}$$
(13)

where C is the penalty parameter of the error terms \(\epsilon _{i}\). A Gaussian radial basis function is used as the kernel, expressed as follows:

$$\begin{aligned} K(x_{i},x_{j}) = e^{-\gamma |x_{i}-x_{j}|^2} \end{aligned}$$
(14)

C and \(\gamma \) are fixed using a cross-validation technique.
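For example, with scikit-learn (an assumed library choice), C and \(\gamma \) can be fixed by a cross-validated grid search; the grid values below are illustrative and X_train, y_train, X_test, y_test are placeholders.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

grid = GridSearchCV(SVC(kernel="rbf"),           # Gaussian RBF kernel, Eq. (14)
                    {"C": [0.1, 1, 10, 100],
                     "gamma": [1e-3, 1e-2, 1e-1, 1]},
                    cv=5)                         # 5-fold cross-validation
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```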

2.4 Results

The results summarized in Figs. 2, 3, 4 and 5 present the different crossings between the extraction techniques (Welch and wavelets), the selection techniques (genetic algorithm and PCA) and the classification techniques (LDA, MLP and SVM). The comparison also includes the performance of \(R_{ms}\) versus \(P_{m}\). The results show that the highest classification rate, 93%, is obtained by the combination of MLP with \(R_{ms}\), wavelets as extraction technique and the genetic algorithm as selection process. This suggests that wavelets outperform the Welch method thanks to their duality in the time and frequency domains. The genetic algorithm is better than PCA; the latter suffers from subjectivity, especially when choosing the adequate number of eigenvalues to retain.

Fig. 2. PCA/Welch periodogram classification with different window lengths

Fig. 3. GA/Welch periodogram classification with different window lengths

The 7% of misclassified data could be explained as follows: the rating given by the subject is assumed to reflect his or her emotion during the whole viewing session, which is not guaranteed. In fact, the emotion could switch from one state to another at some point, could last only a limited amount of time, and could occur at the beginning or at the end of the session.

Fig. 4. PCA/Wavelets classification performance

Fig. 5. GA/Wavelets classification performance

3 Conclusion and Perspectives

In this project, the influence of emotions on cerebral activity was investigated. For this purpose, different techniques were deployed, combining extraction, selection and classification phases. The overall performance showed that the combination of wavelets, the genetic algorithm and MLP provides the highest classification rate. These approaches were applied only in a simulated environment, not in the real world. This is the main limitation, as the challenges imposed by real wheelchair navigation differ from those faced in a simulated one. For example, in a simulated environment, all object coordinates are already known since they are communicated by the software, which is not the case in the real world. Another problem is that vibrations, the real wheelchair velocity and synchronization issues bias the results much more. Moreover, the emotion study was not precise enough to account for many other emotion cases; this was done intentionally, because the chosen emotions were representative of each quadrant, in order not to complicate the study. In reality, however, many emotions could be situated in the same region of a quadrant, which can create some confusion during classification.