1 Introduction

In recent years, the unmanned aerial vehicle (UAV) application domain has seen rapidly growing interest in the development of systems able to assist human beings in critical operations [1,2,3]. Examples of such applications include security and surveillance, monitoring, search and rescue, and disaster management [4].

Systems able to flexibly support different levels of autonomy (LOAs), according to both humans’ cognitive resources and their performance in accomplishing critical tasks, may be exploited to determine situations in which system intervention may be required [5,6,7]. The human’s cognitive resources and the ability of the system to dynamically change the LOA according to the considered context are generally termed “cognitive or mental workload” [8] and “adjustable or sliding autonomy” [9], respectively.

In the literature, several criteria have been investigated to evaluate humans’ cognitive load. The main measurement techniques have historically been classified into three categories: physiological, subjective, and performance-based [10]. Physiological measurements are cognitive load assessment techniques based on the physical response of the body. Subjective measurements are used to evaluate humans’ perceived mental workload by exploiting rankings or scales. Performance or objective measurements are used to evaluate humans’ ability to perform a given task.

Moving from the above considerations, the aim of this paper is to build a classification and prediction model of UAV operators’ mental workload to support the design of an adaptive autonomy system able to adjust its level of autonomy accordingly. Electroencephalogram (EEG) signals were used as the physiological technique for assessing operators’ mental workload, and a Support Vector Machine (SVM) was leveraged as the learning and classification model [11,12,13].

A 3D simulation framework was exploited in this work both to experiment with different flying scenarios of a swarm of autonomous drones flying in an urban environment and to test the operator’s performance in UAV-traffic management. A user interface was also used to show a 2D visualization of the experimented environment and to allow human operators to interact with UAVs by issuing flight commands.

A user study was carried out with several volunteers both to evaluate operators’ performance in supervising a growing number of drones and to gather different workload measurements under critical conditions.

The rest of the paper is organized as follows. In Sect. 2, relevant works concerning workload measurements are reviewed. In Sect. 3, the device exploited in the study is described. Sections 4 and 5 provide an overview of the overall simulation framework and report details of the user interface considered in this work, respectively. Sections 6 and 7 introduce the methodology that has been adopted to perform the experimental tests and discuss data analysis and the classification procedure. Lastly, Sect. 8 discusses obtained results and concludes the paper by providing possible directions for future research activities in this field.

2 Related Work

Many studies have investigated the relationship between the tasks performed by an individual and their cognitive load. In the literature, different techniques have been proposed for mental workload assessment [10].

For instance, concerning subjective measurement techniques, the works in [14, 15] exploited the NASA-TLX questionnaire to evaluate users’ perceived workload in gaze-writing and robotic manipulation tasks, respectively. Similarly, Squire et al. [16] investigated the impact of self-assessed mental workload in simulated game activities.

Although these measurements have been proven to be a reliable way to assess humans’ mental workload [17], they often require annoying or repetitive interactions, since users are asked to fill in different rankings or scales.

In parallel to these studies, other works have evaluated physiological measurements as mental workload assessment techniques. For example, Wilson et al. [18] exploited EEG, electrocardiographic (ECG), electrooculographic (EOG), and respiration inputs to evaluate cognitive workload in air traffic control tasks. Functional Near-Infrared Spectroscopy (fNIRS) and Heart Rate Variability (HRV) techniques were exploited in [19] and [20] to assess the human’s mental workload in n-back working memory tasks and ship simulators, respectively. Besserve et al. [21] studied the relation between EEG data and reaction time (RT) to characterize the level of performance during a cognitive task, in order to anticipate human mistakes.

Although these studies have provided evidence that helps improve accuracy in workload measurements, they traditionally exploit bulky and expensive equipment that is uncomfortable to use in real application scenarios [22]. Data about the suitability of alternative devices for physiological measurements are therefore required in order to properly support further advancements in the field. Some activities in this direction have already been carried out. For instance, Wang et al. [12] showed that a small device, such as a 14-channel EMOTIV® headset, can be successfully used to characterize the mental workload in a simple memory n-back task.

The goal of the present paper is to build on the results reported in [12] by considering a different application scenario, in which EEG signals are exploited to build a prediction model of UAV operators’ mental workload in drone monitoring tasks.

3 Emotiv Epoc Headset

This section briefly describes the wearable brain device EMOTIV Epoc+® (Footnote 1) considered in this study by illustrating its hardware and software features. More specifically, the EMOTIV Epoc+ (Fig. 1a) is a wireless Brain Computer Interface (BCI) device manufactured by Emotiv. The headset provides 14 wireless EEG signal acquisition channels sampled at 128 samples/s (Fig. 1b). The recorded EEG signal is transmitted to a USB dongle, which delivers the collected information to the host workstation. A subscription software named Pure·EEG is provided by Emotiv to gather both the raw EEG data and the dense spatial resolution array containing data at each sampling interval.

Fig. 1. Emotiv EPOC headset (a) and its 14 recorder positions (b).

4 Simulation Framework

The basic idea inspiring the design of the present framework is to test different UAV flying scenarios in an urban environment. Such scenarios simulate potentially critical situations in which drones could be involved. The logical components that were assembled to implement the proposed framework are illustrated in Fig. 2. In more detail, the UAVs Simulator is the module responsible for simulating a swarm of autonomous drones flying in the 3D virtual environment. It consists of three different modules, namely: Autopilot, Physics Simulation and Ground Control Station (GCS).

Fig. 2. Logical components of the simulation framework.

The Autopilot module is responsible for running the drones’ flight stability software without any specific hardware. More specifically, it exploits the Software-In-The-Loop (SITL) simulator (Footnote 2) to run the PX4 Autopilot Flightcode (Footnote 3), an open source UAV firmware for a wide range of vehicle types. The Physics Simulation module is the block devoted to loading the 3D urban environment and executing the drone flight simulation in it. The Gazebo (Footnote 4) physics engine was exploited in this block for modeling and rendering the 3D models of drones with their physical properties, constraints and sensors (e.g. laser, camera). In particular, Gazebo runs on the Robot Operating System (ROS) (Footnote 5), which is a software framework developed for performing robotics tasks. Finally, the Ground Control Station (GCS) module contains the software used for setting drones’ starting locations, planning missions and getting real-time flight information. The communication between the Autopilot Flightcode and the GCS module is provided by the Micro Air Vehicle ROS (MAVROS) node through the MAVLink communication protocol (Fig. 2).

Since drones communicate and transmit information through the network, areas with low bandwidth coverage could lead to loss of communication and thus to potentially critical conditions. Hence, a Bandwidth Simulator was developed to estimate, in the experimented city, the maximum amount of data the network can transmit per unit of time. The network transmission rate is assumed to depend on the population density of the city sites (parks, stadiums, schools, etc.) and on the network coverage.

Lastly, the Alert Module is the block devoted to determining the level of risk (later referred to as “Alert”) of each drone by gathering data from both the UAVs Simulator and the Bandwidth Simulator. Specifically, as in [23, 24], the UAVs Simulator provides information about each drone’s battery level and its distance from obstacles (e.g. buildings). The Bandwidth Simulator sends the estimated network transmission rate in the areas around the drones’ positions. The mapping between these parameters and each drone’s “Alert” is performed through a function defined as follows: \(y=(b-1)^{-1}\cdot (o-1)^{-1}\cdot (n-1)^{-1}\), where b represents the drone’s battery level, o is its distance from obstacles, n is the estimated bandwidth coverage around its position and y is its level of risk. Three different “Alert” levels are proposed in this work, namely: “Safe”, “Warning” and “Danger”.
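As a minimal sketch, the mapping above can be evaluated directly from its formula. The text does not state the units or ranges of b, o and n, nor the thresholds separating “Safe”, “Warning” and “Danger”, so the function below (whose name `alert_score` is hypothetical) only computes y and assumes, for the sample call, inputs normalized to the [0, 1) interval.

```python
def alert_score(b: float, o: float, n: float) -> float:
    """Evaluate y = (b-1)^-1 * (o-1)^-1 * (n-1)^-1 as written in the text."""
    for name, value in (("b", b), ("o", o), ("n", n)):
        if value == 1.0:
            # Each factor is 1 / (x - 1), so the expression is undefined at x == 1.
            raise ValueError(f"{name} must differ from 1")
    return 1.0 / ((b - 1.0) * (o - 1.0) * (n - 1.0))


# Assumed normalized inputs: battery, obstacle distance, bandwidth coverage.
print(alert_score(0.5, 0.5, 0.5))  # -8.0
```

Note that the expression is undefined when any input equals 1, so an implementation has to guard against that case; how y is then binned into the three “Alert” levels is not specified here.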

5 User Interface

This section presents the user interface devised to show a 2D visualization of the experimented environment, together with the information that allows human operators to interact with UAVs.

Fig. 3. Monitoring interface (a), UAVs summary (b) and control buttons (c). (Color figure online)

As illustrated in Fig. 3a, a wide region of the operator’s display is covered by the 2D map of the city, on which the drones’ real-time locations are shown. A colored marker is used to depict each drone’s GPS position as well as its current status. Three different colors are used to illustrate the drone’s level of risk: green (“Safe”), yellow (“Warning”) and red (“Danger”). On the right side of the interface, a visual summary is shown for each drone, reporting its unique name, its battery level, the bandwidth coverage of the area around its location and its flying altitude (Fig. 3b). Right below the map, five buttons allow operators to issue flight commands or to show general information about the map or the drones (Fig. 3c). More specifically, the “Start” button is used to run the 3D simulation, whereas the “Options” button is used to show or hide the bandwidth coverage of the city and the drones’ paths. The other three buttons are used by the human operator to land a drone, make it hover or change its path, respectively. In this scenario, it is worth observing that EEG signals could be affected by the movements that human operators make to press the above buttons. Thus, an artifact removal stage is needed in order to remove all undesired signals, as detailed in Sect. 7.1.

6 User Tasks

The goal of this paper is to exploit EEG signals to build a prediction model of UAV operators’ mental workload in order to train a system able to autonomously predict operators’ performance in UAV monitoring operations. To this aim, an SVM classification algorithm was exploited to learn the ability of operators to carry out assigned drone-traffic-control tasks in different flying scenarios. Four monitoring tasks were experimented in this work, namely: M1, M2, M3 and M4. In particular, M1 consisted of a single flying drone whose path was designed to avoid obstacles on its route. No operator action was necessary to successfully complete the mission. M2 was meant to evaluate the operator’s performance in monitoring two drones at risk of colliding. Collisions were specifically designed to be far apart in time, so that the operator could deal with them while keeping the effort to complete the mission relatively low. Mission M3 consisted of five drones, three of which were at high risk of colliding. This mission was intentionally created to be very difficult to complete, even though theoretically still manageable. Lastly, M4 consisted of six drones, each of which required the operator’s intervention to successfully complete the mission. It was devised to be hard to complete.

Furthermore, a mission is considered “successfully completed” when all drones have landed in the intended positions, or “failed” when at least one drone has crashed. The number of drones in each mission was also defined relying on a preliminary experiment, which showed no significant difference in operators’ mental workload when monitoring three or four UAVs. Data collected during mission M1 were used as a mental workload baseline, whereas those recorded in M4 were used as a high mental workload reference.

7 Data Analysis and Classification

This section details the data analysis and classification procedure performed in this work. It entails the following steps: data pre-processing, feature extraction and classification.

7.1 Pre-processing

The EEG consists of recording the electric signals produced by the activation of thousands of neurons in the brain. These signals are gathered by electrodes located over the scalp of a person. However, some spurious signals may affect the EEG data due to the presence of noise or artifacts. In particular, artifacts, which are signals with no cerebral origin, can be divided into two groups. The first group is related to physiological sources such as eye blinking, ocular movement and heart beating. The second group consists of mechanical artifacts, such as the movement of electrodes or cables during data collection [25]. Thus, a pre-processing stage is needed to remove all undesired signals and noise. It consists of three different phases, namely: filtering, offset removal and artifact removal. The EEGLAB toolbox under the Matlab environment [26] was exploited in this phase.

Since the frequencies of interest in EEG signals lie between 0.5 and 45 Hz, the filtering phase implements a Finite Impulse Response (FIR) band-pass filter to remove components outside this band, in particular high-frequency ones, and increase the signal-to-noise ratio. The offset removal phase eliminates potential offset residues left after the filtering phase. The last stage exploits the Artifact Subspace Reconstruction (ASR) algorithm for artifact removal [27].
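The filtering and offset removal phases can be sketched as follows. The study relied on EEGLAB under Matlab, so this Python version based on SciPy is only an illustrative equivalent: the filter length (`numtaps`), the zero-phase `filtfilt` application and the function name `bandpass_and_remove_offset` are assumptions, and the ASR stage is not reproduced.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS = 128  # sampling rate of the headset in samples/s (Sect. 3)


def bandpass_and_remove_offset(eeg, low_hz=0.5, high_hz=45.0, numtaps=513):
    """Band-pass filter each channel, then subtract its residual mean.

    eeg: array of shape (channels, samples).
    numtaps is an assumed filter length, not a value taken from the paper.
    """
    # Two cutoffs with pass_zero=False yield a band-pass FIR design.
    taps = firwin(numtaps, [low_hz, high_hz], pass_zero=False, fs=FS)
    # filtfilt runs the filter forward and backward (zero phase shift).
    filtered = filtfilt(taps, [1.0], eeg, axis=-1)
    # Offset removal: subtract whatever mean is left on each channel.
    return filtered - filtered.mean(axis=-1, keepdims=True)


# Synthetic check: 10 Hz (in band) + 60 Hz (out of band) + a constant offset.
t = np.arange(30 * FS) / FS
one_channel = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t) + 100.0
raw = np.tile(one_channel, (14, 1))
clean = bandpass_and_remove_offset(raw)
```

On this synthetic input the 10 Hz component should pass almost unchanged, while the 60 Hz component and the offset are suppressed.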

7.2 Feature Extraction

Given the preprocessed data, relevant features have to be extracted to train the classification model. For this purpose, temporal ranges of the signals containing relevant events to be analyzed are defined. In this work, the signal was split into time windows as follows: the 15 s after the start of the EEG recording and the 15 s before the first failure, each divided into 5 s windows. Data recorded during the idle drone takeoff phase were ignored to avoid exploiting the related mental workload measurements as a baseline reference in the UAV monitoring experiment. Data in the range just before and after the first failure were not recorded, since they may be affected by biases due to the operator’s frustration at failing the assigned task. For each window, the following features were calculated channel by channel: Power Spectral Density, Mean, Variance, Skewness, Kurtosis, Curve length, Average non-linear energy and Number of peaks [12]. These features were then concatenated so that each window corresponds to a row of features appearing in order of channel. Each row was then assigned a label stating whether or not the operator failed the task for that particular mission.
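A possible implementation of the per-window feature row is sketched below. The exact feature definitions are not given in the text, so several choices here are assumptions: the Power Spectral Density is summarized by its mean over the 0.5 to 45 Hz band (Welch estimate), the non-linear energy uses the Teager operator, and peaks are plain local maxima. The names `channel_features` and `window_rows` are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks, welch
from scipy.stats import kurtosis, skew

FS = 128       # samples/s
WINDOW_S = 5   # window length in seconds (from the text)


def channel_features(x):
    """Eight features for one channel of one window."""
    freqs, psd = welch(x, fs=FS, nperseg=min(256, len(x)))
    band = (freqs >= 0.5) & (freqs <= 45.0)
    # Assumption: the PSD is summarized by its mean over the 0.5-45 Hz band.
    mean_psd = psd[band].mean()
    curve_length = np.abs(np.diff(x)).sum()
    # Average non-linear (Teager) energy: mean of x[n]^2 - x[n-1] * x[n+1].
    nonlinear_energy = np.mean(x[1:-1] ** 2 - x[:-2] * x[2:])
    n_peaks = len(find_peaks(x)[0])
    return [mean_psd, x.mean(), x.var(), skew(x), kurtosis(x),
            curve_length, nonlinear_energy, n_peaks]


def window_rows(eeg):
    """Split (channels, samples) data into 5 s windows, one feature row each."""
    step = WINDOW_S * FS
    rows = []
    for start in range(0, eeg.shape[1] - step + 1, step):
        window = eeg[:, start:start + step]
        # Concatenate per-channel features in channel order.
        rows.append(np.concatenate([channel_features(ch) for ch in window]))
    return np.array(rows)


rng = np.random.default_rng(0)
segment = rng.standard_normal((14, 15 * FS))  # a 15 s segment, 14 channels
X = window_rows(segment)
print(X.shape)  # (3, 112)
```

With 14 channels and 8 features per channel, each 5 s window yields a row of 112 values, and a 15 s segment yields three such rows.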

7.3 Classification

The aim of this step is to train the classification system considered in this study on the operators’ mental workload in order to predict their performance in UAV monitoring operations. Three different models were exploited in this work: two classifiers for predicting the outcome of each mission for each single subject, and a third one trained on the overall data gathered from all operators, in order to understand whether a generalized model may also be employed.

A procedure covering feature scaling, hyperparameter optimization, results validation and learning model design was defined in order to assess the accuracy of the considered model.

Feature Scaling. An important issue in the signal processing field, and in particular with EEG data, is the high variability of the features extracted from each subject, and thus their different ranges. An appropriate scaling method is needed in order to normalize all data into the same range. A z-score scaler was used as the normalization method: it subtracts the mean value from all measured signals and then divides the difference by the population standard deviation [28].
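The z-score scaling described above can be written in a few lines. This is a minimal sketch assuming that scaling is applied column-wise, i.e. once per feature; the function name `z_score` and the toy matrix are illustrative only.

```python
import numpy as np


def z_score(features):
    """Scale each column to zero mean and unit (population) standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=0)  # ddof=0 gives the population std
    return (features - mean) / std


# Two features with very different ranges end up on the same scale.
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
Z = z_score(X)
```

A constant feature has zero standard deviation and would cause a division by zero, so such columns need to be dropped or handled separately.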

Hyperparameter Optimization and Validation Methodology. Since the aim of the classification methodology is to achieve good accuracy on unseen data, an appropriate validation method is necessary in order to measure the generalization error of the implemented model. For this purpose, a k-fold cross validation technique was used both to find the best model with the optimal parameters and to test its performance on new unseen data. It consists of subdividing the samples into k folds, where k − 1 folds are used in each iteration to train the model and the remaining one is used to evaluate the results.

According to this validation methodology, data were divided into three different groups, namely training set, validation set, and test set, as follows: \(20\%\) as test set, and the other \(80\%\) as training and validation sets. A ten-fold cross validation is then performed on the training and validation sets as follows: samples are divided into ten folds, nine of which are used in each iteration to train the model, while the remaining one is used to evaluate the results. This procedure is iterated until every fold has been used once as validation set. The validation accuracy is then evaluated as the mean of the results obtained in the different iterations. The parameters leading to the best model performance, called “hyperparameters”, are then selected [29]. Lastly, the model is evaluated using the test set.
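The split and the ten-fold loop described above can be sketched with scikit-learn. The data here are synthetic stand-ins for the feature rows and labels, and the stratified split, the fold shuffling, the random seeds and the fixed linear SVM with C = 1 are assumptions made only to keep the example runnable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the feature rows and success/failure labels.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# 20% held out as test set; the remaining 80% is used for training/validation.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

fold_scores = []
folds = KFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, val_idx in folds.split(X_dev):
    # Nine folds train the model, the remaining fold validates it.
    model = SVC(kernel="linear", C=1.0)
    model.fit(X_dev[train_idx], y_dev[train_idx])
    fold_scores.append(model.score(X_dev[val_idx], y_dev[val_idx]))

# Mean accuracy over the ten validation folds.
validation_accuracy = float(np.mean(fold_scores))

# Final check on the held-out test set, after refitting on all 80%.
final_model = SVC(kernel="linear", C=1.0).fit(X_dev, y_dev)
test_accuracy = final_model.score(X_test, y_test)
```

Comparing `validation_accuracy` with `test_accuracy` mirrors the comparison between the two accuracy columns discussed in the results.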

Learning Model. A Support Vector Machine (SVM), which is a learning model able to infer a function from labeled training data, is exploited in this phase to deduce, from the operator’s EEG workload, his/her ability to succeed or not in a mission. It is implemented with two different kernels: linear and Radial Basis Function (RBF). The former is used to find the best separating hyperplane in binary classification problems by tuning the regularization parameter C. The latter is generally used in problems that are not linearly separable, and also requires finding the best value of the \(\gamma \) parameter [13].

The C parameter is used to regularize the model and control the bias-variance trade-off. The \(\gamma \) parameter controls the width of the Radial Basis Function (RBF), being inversely related to its variance. A grid search using powers of ten from \(10^{-2}\) to \(10^2\) was used to tune the C parameter through the cross-validation phase. For the \(\gamma \) parameter, powers of ten from \(10^{-4}\) to 10 were used, considering that bigger values fit the model more closely to the training set but may bring problems of variance or over-fitting, whereas smaller values may bring problems of bias or under-fitting.
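The grid search over both kernels can be sketched with scikit-learn as follows. The grids match the ranges stated above, while the synthetic data, the pipeline with a z-score scaler, and the use of `GridSearchCV` are assumptions for illustration and not necessarily the tooling used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

C_GRID = [10.0 ** k for k in range(-2, 3)]      # 10^-2 ... 10^2
GAMMA_GRID = [10.0 ** k for k in range(-4, 2)]  # 10^-4 ... 10^1

# The linear kernel tunes only C; the RBF kernel tunes C and gamma.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": C_GRID},
    {"svc__kernel": ["rbf"], "svc__C": C_GRID, "svc__gamma": GAMMA_GRID},
]

# Synthetic stand-in for the training/validation portion of the data.
X, y = make_classification(n_samples=160, n_features=20, random_state=0)

# Ten-fold cross validation selects the best kernel and hyperparameters.
search = GridSearchCV(make_pipeline(StandardScaler(), SVC()), param_grid, cv=10)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Placing the scaler inside the pipeline means it is refitted on the training folds of each iteration, so no statistics from the validation fold leak into training.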

8 Results and Discussion

As anticipated, the goal of this paper is to build a prediction model of UAV operators’ mental workload in order to train a system able to autonomously predict operators’ performance in UAV monitoring operations. To this aim, mental workload data were collected through a user study.

The study involved 10 participants (8 males and 2 females, aged between 19 and 24), selected from the students of Politecnico di Torino. After a brief training, participants were invited to perform the four tasks M1, M2, M3 and M4 in sequence through the user interface. Such tasks were specifically designed to test operators’ performance in UAV monitoring operations with an increasing level of risk for the drones. Each task, whose length strictly depended on the operator’s piloting choices, took from 2 to 7 min. During each experiment (i.e., all tasks performed), physiological measurements were recorded as EEG signals through the EMOTIV Epoc+® headset. The EEG signal was split into different time windows as detailed in Sect. 7.2. For each window, the following features were calculated: Power Spectral Density, Mean, Variance, Skewness, Kurtosis, Curve length, Average non-linear energy and Number of peaks. These features were then concatenated so that each window corresponds to a row of features appearing in order of channel. Each row was then assigned a label stating whether or not the operator failed the task for that particular mission. This procedure was performed to generate a heterogeneous population in order to build a classifier able to autonomously predict the label from operators’ mental workload measured by EEG signals.

Table 1. Results concerning the accuracy of the classification algorithm for the individual and overall models.

Results obtained in terms of classification accuracy are reported in Table 1, together with the hyperparameters used to train each single model. The first ten rows of the table report the results obtained by the individual models, each trained using single subject data. The last row shows the overall results obtained using all the collected data. In more detail, as shown in Table 1, the fifth and seventh rows correspond to data that were discarded for validation purposes. In those cases, participants completed only one mission successfully, making it very difficult to train the model due to class skewness. As a result, no individual model was trained using those data. However, they were used in the overall model.

The accuracy scores obtained in the ten-fold cross-validation phase (Sect. 7.3) are reported in Table 1 as “Accuracy (Validation set)”. The accuracy obtained with new unseen data is reported as “Accuracy (Test set)”. It is worth observing that the accuracy scores in these two columns do not differ largely within the same row. This observation allows us to conclude that the proposed model is not affected by problems of variance and thus performs well if tested with other participants under the same conditions.

Results regarding the accuracy on the test sets show that the linear kernel always performs as well as or better than the RBF kernel for individual models. On the contrary, the RBF kernel performs better than the linear kernel for the overall model. Specifically, the SVM with the linear kernel is able to predict the operator’s performance outcomes, and thus the level of his/her mental workload, with an average accuracy equal to \(95.8\%\) and \(83.9\%\) when the model is trained on a single user and on all collected data, respectively. An accuracy equal to \(94.1\%\) and \(85.6\%\) is instead reached with the SVM with the RBF kernel when the model is trained using the single user and the overall data, respectively. This may reasonably be due to the fact that individual models trained using single subject data address simpler classification problems than the model trained with all collected data.

In this work, the data analysis and classification procedure was performed offline on the data collected through the user study. Future work will be aimed at investigating alternative procedures that allow online evaluation of the data.