1 Background and Literature Review

1.1 Nature of Emotion and Connection to Physiological Measures

Emotion is a complex phenomenon that is often difficult for humans, let alone machines, to recognize and respond to appropriately [9, 10]. Valence and arousal [11] are two dimensions of emotion that are widely used and validated in neurophysiological research. Arousal is conceptualized as “a unitary force that intensifies motivated behavior” [20]. As superordinate dimensions of emotion, arousal maps the intensity of an emotion, while valence maps its direction, that is, whether the emotion is pleasant or unpleasant [17]. Russell’s circumplex model of affect is arguably the most popular model depicting the arousal and valence dimensions and their relation to emotional state [39]. Most people’s neutral (baseline) state is reported to lie near the point (0, 0) in Russell’s model [39] (Fig. 1).

Fig. 1. Russell’s circumplex model of emotion

This study treats valence and arousal as each having a “Low” and a “High” state, without considering the finer-grained details of Russell’s model. We therefore attempt to classify high versus low valence and high versus low arousal.

Emotion is most commonly measured with self-report surveys, such as the Self-Assessment Manikin (SAM) or the Positive and Negative Affect Schedule (PANAS) [12]. The SAM is a non-verbal pictorial assessment technique that directly measures the pleasure, arousal, and dominance associated with a person’s affective reaction to a wide variety of stimuli. In this work we focus on each of these factors as it relates to the physiological readings.

1.2 Physiological Manifestations of Emotion

Electrodermal Activity on Emotion.

Arousal can be measured as a change in skin conductance resulting from increased activity of the eccrine sweat glands on the palms or soles [17]. Eccrine gland activity is thought to be related to both the central and the peripheral nervous system. A meta-analysis found that brain regions thought to be involved in generating emotional responses are also involved in eccrine gland activity [21]. In addition, eccrine glands are innervated by the sympathetic branch of the autonomic nervous system, which is part of the peripheral nervous system. Higher skin conductance is therefore taken as an index of higher arousal.

The body and the brain are thought to interact and influence each other [9]. Under this embodied view of emotion, brain activity is influenced by afferent and efferent signals from the body [13, 14]. From this perspective, arousal is related to sympathetic activation in the autonomic branch of the peripheral nervous system, which can be observed in skin conductance and heart rate [1, 15, 16]. In this experiment, skin conductance was measured as electrodermal activity (EDA) and heart rate was measured by electrocardiogram (ECG).

The connection between heart rate and cognition has been studied in the literature [17], and a multitude of studies have attempted to relate the arousal aspect of emotion to EDA [18, 19]. The use of ECG data to detect arousal has been limited by the confounding of arousal and attention in heart rate [1, 2, 16]. Using machine learning techniques, this study investigates whether an index of arousal can be isolated from ECG by combining it with EDA.

Heart Rate on Emotion.

Heart rate is also known to be related to the emotional response to stimuli [1]. However, heart rate is difficult to use as an index of arousal because of a confound: sympathetic nervous system activation is associated with arousal and a faster heartbeat, while parasympathetic activation, which is associated with cognitive effort, usually occurs in response to stimuli as well [17]. Although the majority of studies [1, 3, 5, 9, 17] have focused on the relationship between heart rate and arousal or attention, a more recent study found that the height of the R wave was lower when the stimuli had more arousal potential [22]. This finding suggests that analyzing other components of the PQRST waveform may reveal direct relationships between ECG features and arousal.

1.3 Machine Learning Efforts on ECG, EDA Data

ECG and EDA data are interesting physiological responses: although each is a single-channel time series, a variety of features can be derived from them, such as skin conductance response (SCR), skin conductance level (SCL), heart rate, and heart rate variability. They also correlate with emotional and neuronal activity, as mentioned above. Methods such as sliding windows, time shifting/warping [22], and symbolic representations [23] are used in the analysis of time series data. Selvaraj et al. [30] used a Bayesian classifier, a regression tree, k-nearest neighbor, and fuzzy k-nearest neighbor on ECG data to classify emotional state. As discussed earlier, EDA is one of the most sensitive markers of emotional arousal; it reflects the amount of sweat secreted by the sweat glands in response to emotional stimulation. To assess both the quality of an emotion and its intensity, it is worth combining cognitive data with skin conductance measures, with the aim of obtaining higher accuracy than either measure alone. EEG data have been combined with ECG data in the literature [31], as well as with other physiological sensors such as eye tracking [32] and electromyography (EMG) [33]. However, there is limited literature on combining fNIRS with galvanic skin response (GSR) data [34]. This work incorporates cognitive data from fNIRS and ECG data along with EDA, with the aim of improving emotion classification accuracy.

1.4 Functional Near Infrared Spectroscopy

The fNIRS device uses light sources in the 690–830 nm wavelength range that are pulsed into the brain (Fig. 2).

Fig. 2. Light is pulsed into the cortex, and detectors measure the light reflected back out of the cortex.

Deoxygenated hemoglobin (Hb) and oxygenated hemoglobin (HbO) are the main absorbers of near-infrared light in tissue during the hemodynamic and metabolic changes associated with neural activity in the brain [40]. These changes can be detected by measuring the diffusely reflected light that has probed the brain cortex [40, 41].

fNIRS has been used to classify various cognitive states while computer users complete tasks under normal working conditions [35]. It has also recently been used successfully to classify the emotional state of the user [36] and, as noted above, one purpose of the current study is to complement the utility of fNIRS measurements in the classification of emotion.

2 Experiment

2.1 Stimuli Selection

Music Video Segments.

A subset of music video segments from the Database for Emotional Analysis using Physiological Signals (DEAP) mentioned previously [6] was selected as stimuli to elicit participant emotions. The DEAP research team selected the videos using a semi-automated method based on user-tagged videos, and subjectively rejected videos whose tag did not reflect the emotion induced (e.g., “happy songs about sad topics”) [6]. The original DEAP stimulus materials consisted of 40 music videos that had been found to induce consistent self-report scores in the outer boundaries of the four quadrants of the circumplex model.

Multi-Attribute Task Battery (MATB).

Research in psychology and neuroscience has shown that emotions are connected to high-level reasoning and are tightly linked to decision-making processes [38]. Researchers have also found connections between workload and emotion [37]. Different levels of workload are therefore expected to elicit different emotional responses. The Multi-Attribute Task Battery (MATB) provides a benchmark set of tasks for use in a wide range of laboratory studies of operator workload. The battery also provides a high degree of experimenter control and the freedom to use diverse test subjects [28]. In this experiment, the MATB sessions were customized into three workload levels (Low, Medium, High) by varying the frequency of the tasks presented to the user.

Tetris.

The Tetris version [29] was customized to provide different difficulty levels, similar to the MATB above. The difficulty was changed by adjusting the time between steps in Tetris:

  • 200 ms between Tetris steps (hard)

  • 300 ms between Tetris steps (medium)

  • 400 ms between Tetris steps (easy)

2.2 Equipment Setup

Electrocardiogram (ECG) and electrodermal activity (EDA) were measured using a Biopac MP150 system. The sample rate was set to 1000 kHz using AcqKnowledge 4.2 software. For ECG, Mason-Likar lead placement was employed [24] (Fig. 3).

Fig. 3. Application of Mason-Likar lead placement for ECG (marked as x) [25]

fNIRS was set up to take concurrent measurements of cognitive data. The equipment used was a Hitachi ETG-4000 with a 3 × 11 probe configuration (Fig. 4).

Fig. 4. Equipment setup on participant

2.3 Protocol

Nine college-age subjects took part in this experiment: six men and three women.

The subjects were given instructions on how to perform the different activities and then ran through a demo of each task to become familiar with it. After that, the ECG and EDA electrodes were placed on their body and the fNIRS probe was placed on their head. The subject was then presented with the tasks in a pseudo-randomized block design (Fig. 5).

Fig. 5. Block design of the experiment

Before each stimulus, the subject was presented with a REST screen for 30 s to return the subject to a baseline level before the stimulus began. Each task was one minute long.
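The timing described above can be sketched as a simple schedule generator. This is only an illustration: the task names are hypothetical placeholders, and a plain seeded shuffle is assumed as the pseudo-randomization, which the text does not specify. Only the 30 s rest and 60 s task durations come from the protocol.

```python
import random

REST_S = 30  # REST screen before each stimulus (from the protocol)
TASK_S = 60  # each task lasts one minute (from the protocol)


def build_schedule(tasks, seed=0):
    """Return (start_s, end_s, label) blocks: REST followed by a task, in shuffled order.

    Assumption: a seeded shuffle stands in for the pseudo-randomized block design.
    """
    order = list(tasks)
    random.Random(seed).shuffle(order)
    schedule, t = [], 0
    for task in order:
        schedule.append((t, t + REST_S, "REST"))
        t += REST_S
        schedule.append((t, t + TASK_S, task))
        t += TASK_S
    return schedule


# Hypothetical task labels, for illustration only.
example = build_schedule(["video_1", "matb_low", "tetris_hard"])
for block in example:
    print(block)
```

Under these assumptions each trial occupies 90 s of recording, not counting the time taken by any survey between trials.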

2.4 Post Task Surveys

Subjects were given a survey after each task to gauge their perceived arousal, valence, dominance, liking, and workload levels (Fig. 6).

Fig. 6. Post task survey presented to participants

2.5 Data Analysis

Preprocessing.

The raw ECG signal was recorded in millivolts (mV) and EDA in microsiemens (µS). Using the ‘ECG interval extraction’ function in AcqKnowledge 4.2, the PQRST waveform was decomposed into the following variables: RR interval, heart rate, R height, P height, QRS interval, PRQ interval, QT interval, corrected QT interval, and ST interval (Fig. 7).

  • RR interval: time difference between the R spikes of consecutive heartbeats

  • Heart rate: number of heartbeats per minute

  • R height: height of the R spike

  • P height: height of the P spike

  • QRS interval: time difference between the Q and S spikes of a heartbeat

  • PRQ interval: time difference between the P and R spikes of a heartbeat

  • QT interval: time difference between the Q and T spikes of a heartbeat

  • Corrected QT interval: QT interval standardized to a heart rate of 60 beats per minute, correcting for the shortening or lengthening of QT at faster or slower heart rates

  • ST interval: time difference between the S and T spikes of a heartbeat
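As an illustration of how some of these variables relate to one another, the sketch below derives RR intervals and heart rate from R-peak times and applies a heart-rate correction to QT. The correction formula is an assumption: Bazett's formula is used here because it leaves QT unchanged at 60 beats per minute, but the text does not state which formula AcqKnowledge applies. All function names are hypothetical.

```python
import math


def rr_intervals(r_peak_times_s):
    """Time differences (s) between consecutive R peaks."""
    return [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def heart_rate_bpm(rr_s):
    """Instantaneous heart rate in beats per minute from one RR interval (s)."""
    return 60.0 / rr_s


def qtc_bazett(qt_s, rr_s):
    """Corrected QT (s) using Bazett's formula, QTc = QT / sqrt(RR).

    Assumption: Bazett's correction; at RR = 1 s (60 bpm) QTc equals QT.
    """
    return qt_s / math.sqrt(rr_s)


peaks = [0.0, 0.8, 1.6, 2.5]            # illustrative R-peak times in seconds
for rr in rr_intervals(peaks):
    print(round(heart_rate_bpm(rr), 1), round(qtc_bazett(0.36, rr), 3))
```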

The data set contains 11 columns of timestamped data. During preprocessing, the data for the 33 trials were extracted from the data set, and the mean, standard deviation, maximum, and minimum were calculated for each column.

Fig. 7. Heart rate waveform [42]

This resulted in 11 × 4 = 44 attributes per feature vector and 33 feature vectors for each subject. The survey ratings were divided into high and low around each person’s average rating, with the aim of obtaining a roughly even split between high and low data points.
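The feature construction and labeling described here can be sketched as follows. This is a minimal illustration rather than the exact pipeline: the column names are placeholders, the sample standard deviation is assumed (the text does not say whether the sample or population form was used), and ratings equal to the subject mean are assigned to the low class by assumption.

```python
from statistics import mean, stdev


def trial_features(trial, columns):
    """Flatten one trial into [mean, sd, max, min] for each column, in column order.

    `trial` maps a column name to the list of samples recorded during that trial.
    Assumption: sample standard deviation (statistics.stdev).
    """
    feats = []
    for name in columns:
        values = trial[name]
        feats += [mean(values), stdev(values), max(values), min(values)]
    return feats


def binarize_by_subject_mean(ratings):
    """Label each rating high (1) if above the subject's own mean, otherwise low (0).

    Assumption: ties with the mean fall into the low class.
    """
    m = mean(ratings)
    return [1 if r > m else 0 for r in ratings]


# Placeholder column names; 11 columns give 11 * 4 = 44 attributes.
columns = [f"col_{i}" for i in range(11)]
trial = {name: [1.0, 2.0, 3.0] for name in columns}
print(len(trial_features(trial, columns)))
print(binarize_by_subject_mean([1, 9, 5, 7]))
```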

Machine Learning.

ECG Data.

The labels thus created were used as the ground truth and fed into the classifier, which was evaluated with a 10-fold cross-validation scheme. The results for each individual and the average results for the whole group are given in the results section.
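The evaluation scheme can be illustrated with a small, self-contained sketch of 10-fold cross-validation. Several details are assumptions, since the text does not specify them: folds are formed by a seeded shuffle without stratification, and a minimal Gaussian Naive Bayes stands in for the classifier. The data at the end are synthetic and serve only to show the call pattern.

```python
import math
import random
from statistics import mean, pvariance


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def gaussian_nb_fit(X, y):
    """Per class: prior, feature means, and feature variances (floored at 1e-9)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),
            [mean(c) for c in cols],
            [max(pvariance(c), 1e-9) for c in cols],
        )
    return model


def gaussian_nb_predict(model, x):
    """Return the class with the highest Gaussian log-posterior for sample x."""
    def log_posterior(params):
        prior, mus, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - mu) ** 2 / (2 * v)
            for xi, mu, v in zip(x, mus, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


def cross_val_accuracy(X, y, k=10, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = gaussian_nb_fit(train_X, train_y)
        correct += sum(gaussian_nb_predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Synthetic, well-separated data with 33 samples, mirroring the 33 trials per subject.
X = [[0.01 * i] for i in range(17)] + [[10.0 + 0.01 * i] for i in range(16)]
y = [0] * 17 + [1] * 16
print(cross_val_accuracy(X, y))
```

With 33 feature vectors and 10 folds, each held-out fold contains three or four trials.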

Fusion of ECG and fNIRS Data.

The fusion of ECG and fNIRS data was done using a simple label-weighting method. The labels produced by each classifier were combined linearly to obtain the final labels. Only two weightings of the classifications are possible.

Case 1: Weight assigned to ECG data prediction < Weight assigned to fNIRS data prediction.

Case 2: Weight assigned to fNIRS data prediction < Weight assigned to ECG data prediction (Fig. 8).

Fig. 8. Fusion of ECG and fNIRS classifiers

This prediction was compared with the binary survey label to get the accuracy of prediction.
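One way to read the label-weighting step is sketched below. The weights, the normalization, and the 0.5 threshold are assumptions chosen for illustration; the text specifies only that the binary labels are combined linearly and that one classifier receives the larger weight.

```python
def fuse_labels(ecg_label, fnirs_label, w_ecg, w_fnirs, threshold=0.5):
    """Weighted linear combination of two binary labels, thresholded back to 0 or 1.

    Assumption: weights are normalized to sum to 1 and the threshold is 0.5.
    """
    score = (w_ecg * ecg_label + w_fnirs * fnirs_label) / (w_ecg + w_fnirs)
    return 1 if score > threshold else 0


def accuracy(predicted, truth):
    """Fraction of fused predictions that match the binary survey labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Illustrative labels only; these are not data from the experiment.
ecg = [1, 0, 1, 0]
fnirs = [1, 1, 0, 0]
survey = [1, 1, 0, 0]
case1 = [fuse_labels(e, f, 0.4, 0.6) for e, f in zip(ecg, fnirs)]  # fNIRS weighted higher
case2 = [fuse_labels(e, f, 0.6, 0.4) for e, f in zip(ecg, fnirs)]  # ECG weighted higher
print(accuracy(case1, survey), accuracy(case2, survey))
```

In this sketch, when the two classifiers agree the fused label is their common label, and when they disagree it follows the more heavily weighted classifier, which is consistent with there being only two distinct weighting cases.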

3 Results

3.1 Valence Classification Using ECG Data

See Table 1.

Table 1. ECG classification accuracy for valence

3.2 Arousal Classification Using ECG Data

See Table 2.

Table 2. ECG classification accuracy for arousal

3.3 Results from Combination of Cognitive and ECG/EDA Data

Case 1: Weight assigned to ECG data prediction < Weight assigned to fNIRS data prediction.

Classification accuracy of combined ECG and fNIRS data using Naive Bayes classifier = 85.85 %.

Classification accuracy of combined ECG and fNIRS data using SVM classifier = 66 %.

Case 2: Weight assigned to fNIRS data prediction < Weight assigned to ECG data prediction.

Classification accuracy of combined ECG and fNIRS data using Naive Bayes classifier = 93.93 %.

Classification accuracy of combined ECG and fNIRS data using SVM classifier = 61.27 %.

The Naive Bayes classifier gave higher accuracy when combining ECG and fNIRS data whereas SVM accuracy was unchanged.

4 Conclusion

The results show that combining electrocardiogram/electrodermal activity data with cognitive data (fNIRS) provides higher accuracy than electrocardiogram/electrodermal activity data alone. A natural extension of this work would be to apply it to a larger data set as well as to cross-subject data. Nevertheless, this work underscores the need for more research into the fusion of physiological data to boost accuracy levels.