
1 Introduction

Flight crews’ cognitive and physical well-being is a critical factor in flight safety. International Civil Aviation Organization (ICAO) statistics indicate that cognitive factors account for 26 % of the incidents and accidents in civilian flight operations [1]. Although modern avionics systems assist the flight crew in important ways to reduce their cognitive and physical load, piloting is still a cognitively demanding task: pilots are expected to maintain a high level of situational awareness, actively monitor the flight instruments, engage in the planning of flight legs, communicate with the air traffic controller (ATC) and the co-pilot, perceive and remember instructions provided by the ATC, and stay vigilant to anticipate and avoid possible issues. Such aspects of piloting typically put a significant burden on pilots’ attentional and working memory resources, especially during critical flight phases such as takeoff and landing, and during unexpected events such as emergencies and instrument failures.

Despite the critical importance of pilots’ cognitive and physical well-being for the success of flight operations, there are currently no operational pilot health monitoring systems in today’s airliner cockpits. Owing to technological advances in biomedical sensors, real-time monitoring of the flight crew’s health status with ambient and body sensors has become increasingly feasible. In the context of a European Union 7th Framework Programme for Research project, the Advanced Cockpit for the Reduction of Stress and Workload (ACROSS) consortium has been exploring the real-time use of multiple sensors, such as eye/body trackers, facial recognition software, heart rate sensors and optical brain imaging sensors, for this purpose, in an effort to improve the safety and efficiency of flight operations.

Monitoring the level of mental workload is a critical component of such real-time flight-crew health monitoring applications. In the aviation domain, related studies have explored the use of sensors such as electroencephalography (EEG), electrocardiography (ECG), electromyography (EMG), electrooculography (EOG) and galvanic skin response (GSR) to relate measures such as brain activity, heart rate variability, eye blink frequency, pupil dilation, eye fixations, muscle contractions and electrodermal activity to pilots’ mental workload [2]. These studies reported that an increase in pilots’ mental workload tends to be associated with an increase in heart rate variability, an increase in respiration rate, a decrease in the rate and duration of eye blinks, and an increase in gaze fixation durations. However, because such measures can also be influenced by physical factors unrelated to mental workload, it is difficult to rely on a single physiological sensor for this purpose. For instance, changes in cockpit lighting or in the illumination from flight displays may elicit similar changes at the ocular level, whereas heart rate variability and respiration rate may increase due to muscle fatigue rather than increased mental workload [3].

These challenges motivated the use of sensors such as EEG that monitor brain activity directly in the cockpit. EEG records changes in electric potentials caused by neural activity via electrodes distributed over the scalp. EEG studies investigating changes in pilots’ alertness and mental workload during simulated flight missions have primarily focused on fluctuations in EEG power in the theta (4–8 Hz), alpha (8–12 Hz) and beta (12–18 Hz) bands [2]. For instance, decreased vigilance and deteriorating performance are associated with increased theta-band power together with a change in alpha power, whereas attentional slips and drowsiness modulate alpha waves [4]. The main limitation of the EEG approach is the difficulty of placing the electrodes and ensuring good conductivity for adequate data quality, which makes EEG challenging to employ in the cockpit.

Functional near-infrared spectroscopy (fNIR) provides an alternative approach for real-time monitoring of brain activity in the cockpit. The optical nature of fNIR allows the design of portable, wearable and durable sensors that offer practical advantages, particularly for neuroergonomic applications [5]. In a recent application of fNIR to real-time mental workload assessment, Gateau et al. distinguished high versus low workload situations in a controlled flight simulator environment using a support vector machine [6]. Moreover, in a task that progressively elicits more mental effort, Herff et al. demonstrated single-trial discrimination of workload with an accuracy of up to 78 % using a linear discriminant classifier [7]. These findings suggest that fNIR can be a viable option for real-time monitoring of the flight crew’s mental workload.

In this study we build on this line of work by investigating the use of fNIR for real-time monitoring of pilots’ mental workload during simulated flight scenarios. The study was conducted as part of the ACROSS project, which aims to integrate multiple remote and wearable sensors into a commercial-grade flight simulator for synchronous, real-time monitoring of measures related to the flight crew’s vigilance, drowsiness, emotional/physical well-being, mental workload and situational awareness. We developed a linear discriminant analysis (LDA) classifier to distinguish low, moderate and high levels of mental workload during realistic flight simulation scenarios. The scenarios were designed to elicit different levels of workload by incorporating regular flight phases such as takeoff, climb, en-route, approach, descent and landing, as well as unanticipated complications such as engine/instrument failures or executing a go-around. Our study differs from existing applications of fNIR in the aviation context in the realism of the flight scenarios, the number of workload levels considered, and the presence of other infrared (IR) sources in the environment. Our aim was to explore the potential of fNIR for real-time mental workload assessment in a realistic flight simulation environment.

The rest of the paper is organized as follows. The next section provides an overview of the optical brain imaging method employed in this study and describes the data collection and processing stages. This is followed by a description of the LDA model and its training data, and an evaluation of the model’s predictions for the other pilots flying in the same environment. The paper concludes with a discussion of the findings and their implications for future work.

2 Methodology

2.1 Functional Near-Infrared Spectroscopy

Functional near-infrared spectroscopy (fNIR) is a neuroimaging modality that enables continuous, noninvasive and portable monitoring of changes in blood oxygenation and blood volume related to human brain function [8]. Neuronal activity is inferred from changes in oxygenation, since variations in cerebral hemodynamics are related to functional brain activity through a mechanism known as neurovascular coupling [9]. Over the last decade, laboratory studies have established that fNIR provides a veridical measure of oxygenation and blood flow in the brain [8, 9]. fNIR is not only noninvasive, safe, affordable and portable; it also offers a balance between temporal and spatial resolution, which makes it a viable option for in-the-field neuroimaging.

Several neuroimaging modalities, such as fMRI, PET and fNIR, are based on monitoring the hemodynamic changes in the brain that accompany neuronal activity. Neurons require energy to become active, which is supplied by the metabolization of glucose via astrocytes [10]. This metabolic process requires oxygen, which is delivered by hemoglobin molecules in the capillary beds of the vascular system. When a group of neurons fires, they initially consume the oxygen in their vicinity, producing an initial increase in the concentration of deoxy-hemoglobin (HbR) and a dip in the concentration of oxy-hemoglobin (HbO). On the order of 4–6 s later, the vascular system responds to this local energy demand by supplying more oxygenated blood to that location, which increases the concentration of HbO and washes away the HbR. As the neural population returns to its baseline activity level, the concentrations of HbR and HbO also return to baseline. The change in the relative concentrations of HbR and HbO due to neuronal activity is called the hemodynamic response.

fNIR technology uses specific wavelengths of light, introduced at the scalp, to enable noninvasive measurement of changes in the relative concentrations of HbR and HbO in the capillary beds during brain activity. Typically, an optical apparatus for fNIR spectroscopy consists of at least one near-infrared light source and a detector that receives the light after it has interacted with the tissue. Near-infrared light is known to diffuse through the intact scalp and skull, which makes it suitable for tracing relative changes in the concentration of specific chromophores in neural tissue with noninvasive spectroscopic methods [11]. Whereas most biological tissue (including water) is relatively transparent to light in the near-infrared range between 700 and 900 nm, hemoglobin is a strong absorber in this range of the spectrum. Figure 1 (right) shows the absorption characteristics of the main absorbers in biological tissue. Within 700–900 nm, HbO and HbR are among the strongest absorbers of near-infrared light. Moreover, within this range the absorption spectra of the two molecules cross each other, which makes it possible to separate the two chromophores. This provides an optical window into neural tissue in which relative changes in the concentrations of HbO and HbR can be approximated from how near-infrared light is attenuated.

Fig. 1.

The banana-shaped path followed by photons in the tissue (left). The optical window in which HbO and HbR have the strongest absorption characteristics in the IR range (right).

Photons that enter tissue undergo two types of interaction: absorption and scattering [9, 12]. Two chromophores, HbO and HbR, are strongly linked to tissue oxygenation and metabolism. Their absorption spectra differ significantly, so the two compounds can be separated spectroscopically using only a few sample wavelengths. Once photons are introduced into the head, they are either scattered at extra- and intracellular boundaries within the different layers of the head (skin, skull, cerebrospinal fluid, brain, etc.) or absorbed, mainly by HbO and HbR. A photodetector placed on the skin surface at a certain distance from the light source collects photons that have been scattered along a “banana-shaped path” (Fig. 1) from the source to the detector; these photons carry important information about the optical properties of the tissue they traversed. Using the modified Beer–Lambert law, this information is converted into estimates of changes in the relative concentrations of HbO and HbR [8].
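A minimal sketch of the modified Beer–Lambert conversion is given below, assuming the three wavelengths of the sensor used in this study (730, 805 and 850 nm). The change in optical density at each wavelength is modelled as the extinction-weighted sum of the changes in HbO and HbR multiplied by the optical path length (source-detector distance times a differential pathlength factor), and the resulting overdetermined system of three equations in two unknowns is solved by least squares. The extinction coefficients are approximate values of the magnitude found in published hemoglobin tabulations (the 805 nm entry reuses values near 800 nm), and the pathlength factor `dpf` and the function name `mbll` are hypothetical; this is not the DAQ Station's actual implementation.

```python
import math

# Approximate molar extinction coefficients (cm^-1/M, base-10) as (HbO, HbR).
# Illustrative magnitudes only; replace with the table for the actual device.
EXTINCTION = {
    730: (390.0, 1102.2),   # HbR absorbs more strongly
    805: (816.0, 761.7),    # assumption: values near 800 nm (close to isosbestic)
    850: (1058.0, 691.3),   # HbO absorbs more strongly
}


def mbll(intensity, baseline, distance_cm=2.5, dpf=6.0):
    """Estimate (dHbO, dHbR) in molar units for one sample.

    intensity, baseline: dicts mapping wavelength (nm) to detected light.
    distance_cm: source-detector separation; dpf: hypothetical differential
    pathlength factor (not taken from the paper).
    """
    path = distance_cm * dpf
    rows, rhs = [], []
    for wl, (e_hbo, e_hbr) in EXTINCTION.items():
        # Change in optical density relative to the baseline intensity.
        rhs.append(-math.log10(intensity[wl] / baseline[wl]))
        rows.append((e_hbo * path, e_hbr * path))
    # Least squares via the 2x2 normal equations (A^T A) x = A^T b.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * y for r, y in zip(rows, rhs))
    b2 = sum(r[1] * y for r, y in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    d_hbo = (a22 * b1 - a12 * b2) / det
    d_hbr = (a11 * b2 - a12 * b1) / det
    return d_hbo, d_hbr
```

In a pipeline like the one described under Data Processing, `baseline` would be the mean intensity at each wavelength over the first 10 s of a block (20 samples at 2 Hz).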

2.2 Experiment Setup

Participants. Our sample included 8 pilots with commercial/military flight experience ranging from 3500 to 17000 h (M = 10712, SD = 5057). Pilots had normal or corrected-to-normal vision and normal hearing. No participant reported any history of psychiatric disorders. This study was approved by the Middle East Technical University Human Subjects Research Ethics Committee.

Materials, Apparatus and Software. The experiments were run at the premises of Thales Aviation (Cergy, France). In the context of the ACROSS project, several sensors, including fNIR, SmartEye eye trackers, a Microsoft Kinect body tracker, a seat sensor and a heart rate sensor, were installed on a Thales Airbus A320 simulator (Fig. 2). In this paper we focus only on the fNIR data collected in this setting. To shield the fNIR sensor from other infrared sources such as the eye trackers and the Kinect body tracker, we used multiple layers of aluminium foil sewn inside a cloth cover that was placed over the fNIR sensor pad.

Fig. 2.

A participant piloting the Thales Airbus A320 simulator

While the pilots flew the scenarios, neural activity in their prefrontal cortex was monitored by an fNIR system developed at Drexel University (Philadelphia, PA) and manufactured and supplied by fNIR Devices LLC (Potomac, MD; www.fnirdevices.com). The real-time mental workload monitoring application consists of five components: a flexible headpiece (sensor pad), a control box for hardware management, a computer that runs the COBI Studio software [12] for data acquisition, the DAQ Station module of fNIRSoft [13] for real-time low-level processing of fNIR data, and a software application for real-time classification of the participant’s mental workload level (Fig. 3).

Fig. 3.

The flexible sensor pad and its installation on the forehead (left), the fNIR data acquisition box (center), and the anatomical locations of the 16 optodes (right).

The fNIR sensor holds 4 light sources and 10 detectors, which together provide oxygenation measures at 16 optodes over the prefrontal cortex. The sensor has a source-detector separation of 2.5 cm, which allows a penetration depth of approximately 1.25 cm, enough to reach the cortical surface. The system can monitor changes in the relative concentrations of HbO and HbR at a temporal resolution of 2 Hz. The cortical regions monitored by the sensor are displayed in Fig. 3 and correspond to Brodmann areas 9, 10, 44 and 45. Existing neuroimaging studies suggest that the prefrontal cortex plays a special role in higher-order cognitive functions such as working memory management, sequential processing of sensory and memory input, response inhibition and decision making [8]. fNIR can monitor regions including the left/right dorsolateral prefrontal cortex (dlPFC), the left/right dorsomedial prefrontal cortex (dmPFC) and the frontopolar cortex, which are known to be associated with these higher-order cognitive processes.

In the scope of the ACROSS project, the fNIR sensor is used to monitor the mental workload of the pilot flying (PF) during the aviate-navigate-communicate tasks. The link between mental workload monitoring and fNIR technology rests on the direct relationship between workload and the hemodynamic response, which fNIR can measure.

Flight Scenarios. Pilots performed 4 flight scenarios during the experiment. The first scenario was a free-play task that took about an hour, including sensor installation and a demonstration of the flight simulator. After a rest period, the second scenario was run, which started in cruise and ended with a normal-workload landing. After the second scenario there was a lunch break. The third scenario also included cruise flight followed by a landing that was diverted to a different airport by the ATC, with the aim of increasing the mental workload level. The final scenario simulated a high-workload landing in which the ATC initiated a sudden go-around due to a late runway incursion by another aircraft. During the climb, an instrument failure (e.g. flap or engine failure) was initiated to further increase the mental workload level.

Data Processing. The fNIR sensor collects raw optical measures from 16 locations at 3 wavelengths (730 nm, 805 nm and 850 nm) at 2 Hz. Raw optical signals are segmented into 60 s blocks. A script executed by the DAQ Station converts the raw signals into HbO and HbR measures using the modified Beer–Lambert law, with the first 10 s taken as the baseline. The script also computes the mean, standard deviation, slope, minimum, maximum and range of the HbO and HbR signals, and streams the processed oxygenation measures to the mental workload classification application every 5 s. Each package sent by the script contains the oxygenation measures for the last 60 s, together with a feature vector consisting of the mean, standard deviation, slope and range of the HbO and HbR signals. These features are fed into two discriminant functions, and the distance between the resulting point and the centroid of each mental workload category is computed. The application returns the category of the closest centroid as the mental workload estimate (Fig. 4).
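The sliding-window feature extraction can be sketched as follows, assuming one HbO or HbR signal per optode sampled at 2 Hz, so that a 60 s window spans 120 samples and each 5 s update advances the window by 10 samples. The helper names are hypothetical, the standard deviation uses the population formula, and the slope is an ordinary least-squares fit against time in seconds; the text does not specify which estimators the DAQ Station script uses.

```python
import statistics

FS_HZ = 2       # sampling rate stated in the text
WINDOW_S = 60   # window length stated in the text
STEP_S = 5      # update interval stated in the text


def slope(values, fs=FS_HZ):
    """Least-squares slope of the signal against time (units per second)."""
    t = [i / fs for i in range(len(values))]
    t_mean, v_mean = statistics.fmean(t), statistics.fmean(values)
    num = sum((ti - t_mean) * (vi - v_mean) for ti, vi in zip(t, values))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def window_features(values):
    """Mean, SD, slope, min, max and range of one HbO or HbR window."""
    lo, hi = min(values), max(values)
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population SD (assumption)
        "slope": slope(values),
        "min": lo,
        "max": hi,
        "range": hi - lo,
    }


def sliding_windows(signal, fs=FS_HZ, window_s=WINDOW_S, step_s=STEP_S):
    """Yield (end_time_s, window) for every complete window in the signal."""
    size, step = window_s * fs, step_s * fs
    for end in range(size, len(signal) + 1, step):
        yield end / fs, signal[end - size:end]
```

A feature vector for one update would then be formed by concatenating the selected features of the HbO and HbR windows of every retained optode.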

Fig. 4.

The temporal progression of the employed sliding window method. Features used for mental workload estimation were sampled from a 60 s long window, which is updated every 5 s.

3 Mental Workload Estimation Model

3.1 Linear Discriminant Model

Discriminant analysis is a widely used multivariate statistical method for classifying neuroimaging signals [7]. The method is based on an eigenvalue decomposition in the high-dimensional feature space. Given class labels as training input, it reduces dimensionality by retaining discriminant functions, each of which is a linear combination of the features. The method seeks the directions in the high-dimensional feature space that best separate the classes, that is, the directions that maximize between-class variance relative to within-class variance.
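The sketch below illustrates multiclass Fisher LDA of the kind described above, assuming a NumPy implementation (the study itself estimated its model in SPSS). It builds the within-class and between-class scatter matrices, keeps the leading K − 1 eigenvectors of S_W⁻¹S_B as discriminant functions, scales each function to unit pooled within-class variance, and classifies a feature vector by the nearest group centroid in discriminant space, mirroring the closest-centroid rule described under Data Processing. Function names are hypothetical, and the sketch assumes the within-class scatter matrix is invertible, which requires considerably more training samples than features.

```python
import numpy as np


def fit_lda(X, y):
    """Fit multiclass Fisher LDA.

    Returns (W, centroids, shares): W has one column per discriminant
    function, centroids maps each label to its mean in discriminant space,
    and shares gives each function's proportion of discriminating variance.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    n_features = X.shape[1]
    grand_mean = X.mean(axis=0)
    s_within = np.zeros((n_features, n_features))
    s_between = np.zeros((n_features, n_features))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        s_within += (Xc - mc).T @ (Xc - mc)
        d = (mc - grand_mean)[:, None]
        s_between += len(Xc) * (d @ d.T)
    # Eigen-decomposition of Sw^-1 Sb; at most K - 1 eigenvalues are non-zero.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eigvals.real)[::-1][: len(classes) - 1]
    eigvals, W = eigvals.real[order], eigvecs.real[:, order]
    # Scale each function to unit pooled within-class variance.
    pooled = s_within / (len(X) - len(classes))
    W = W / np.sqrt(np.diag(W.T @ pooled @ W))
    centroids = {c: (X[y == c] @ W).mean(axis=0) for c in classes}
    return W, centroids, eigvals / eigvals.sum()


def predict(x, W, centroids):
    """Project one feature vector and return the label of the nearest centroid."""
    z = np.asarray(x, dtype=float) @ W
    return min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))
```

With three workload classes this yields two discriminant functions, and `shares` corresponds to the kind of per-function variance split reported in the results.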

Our LDA model uses 92-dimensional feature vectors to discriminate 3 mental workload levels. The model is based on a single subject’s fNIR data recorded during the simulated flight scenarios described above. Model parameters were estimated using IBM SPSS v22. Because of the high level of noise observed at optodes 8, 10 and 16, the discriminant model was constructed from the remaining 13 optodes.

Fig. 5.

The distribution of data points in the 2D space defined by the two LDA functions

Prior to training, we prepared a training dataset by qualitatively analysing the video recordings to judge the level of mental workload experienced by the subject during each of the three scenarios. Pilots’ self-assessments of their mental workload, collected after the experiment, and the differences between the scenarios in terms of unexpected events such as failures were used as additional cues when manually labelling episodes as low, moderate or high mental workload. Once a mental workload level was assigned for each scenario, the annotated data were used to train the LDA model.

The discriminant analysis yielded 2 functions, the first and second accounting for 73 % and 27 % of the between-group (discriminating) variance, respectively. Wilks’ lambda indicated that these two functions significantly discriminate the three workload categories, χ²(208) = 1047.32, p < .001. The group centroids in the 2D space defined by discriminant functions 1 and 2 are plotted in Fig. 5. Taking the origin as a reference, this plot suggests that function 1 distinguishes lower (0) from higher (1 and 2) workload cases, whereas function 2 distinguishes high workload cases (2) from lower workload cases (0 and 1). Together, the functions predict the mental workload categories with an accuracy of 91 %. Standardized LDA coefficients suggest that features obtained from optodes 1, 3 and 5 over the left PFC and optodes 13 and 15 over the right PFC are the strongest contributors. In particular, optodes 1, 3, 13 and 15 correspond to the bilateral dorsolateral PFC, which Gateau et al. [6] also identified as a strong contributor to mental workload estimation.
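For reference, the two summary statistics above follow from the eigenvalues λ_i of the discriminant analysis (the ratio of between-group to within-group variance along each discriminant function), with g groups giving g − 1 functions. This is the standard textbook form, shown here as a worked aid rather than a reproduction of SPSS output:

```latex
\mathrm{share}_i = \frac{\lambda_i}{\sum_{j=1}^{g-1} \lambda_j},
\qquad
\Lambda = \prod_{i=1}^{g-1} \frac{1}{1 + \lambda_i},
\qquad g = 3
```

With two functions, shares of 73 % and 27 % imply λ_1 ≈ 2.7 λ_2 (0.73 / 0.27 ≈ 2.70). A smaller Wilks’ Λ indicates stronger separation, and the χ² value reported for it is conventionally obtained through Bartlett’s approximation.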

3.2 Model Evaluation

To evaluate the performance of the model, we sampled 69 episodes from the flight videos of the 8 pilots, including routine flight episodes where we expected low mental workload as well as moments with events such as a flap or engine failure that were expected to elicit higher mental workload (Fig. 6). Table 1 summarizes the cross-tabulation of predicted versus expected mental workload levels for the test cases. In 68.1 % of the 69 cases, the predicted and expected mental workload levels matched exactly. The most frequent mismatch occurred when the model predicted high mental workload (MW) whereas the expectation was low MW. Inspection of these cases indicated that they arose from fluctuations in the raw oxygenation measures caused by excessive head motion or ambient noise in some of the optodes that contribute to the model’s predictions.
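The agreement rate can be computed from a cross-tabulation such as Table 1. The sketch below, with hypothetical helper names, counts (expected, predicted) pairs for the three workload levels and reports the fraction of cases on the diagonal. With 69 test cases, 68.1 % agreement corresponds to 47 matched cases (47 / 69 ≈ 0.681).

```python
from collections import Counter

LEVELS = ("low", "moderate", "high")


def cross_tabulate(expected, predicted, levels=LEVELS):
    """Return nested counts: table[expected_level][predicted_level]."""
    pairs = Counter(zip(expected, predicted))
    return {e: {p: pairs[(e, p)] for p in levels} for e in levels}


def agreement(table):
    """Fraction of cases where the predicted level equals the expected level."""
    total = sum(sum(row.values()) for row in table.values())
    hits = sum(table[level][level] for level in table)
    return hits / total
```

Off-diagonal cells such as `table["low"]["high"]` then isolate the low-expected, high-predicted mismatches discussed above.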

Fig. 6.

One of 69 episodes sampled from the entire dataset. The sample episode shows the change in mental workload estimation from low to high following an engine failure alert.

Table 1. The distribution of predicted and expected mental workload levels

4 Discussion and Future Work

In this study we developed and evaluated a preliminary LDA-based classifier that aims to predict low, moderate and high mental workload states of pilots in real time, based on a set of features computed over a moving window of HbO and HbR measures obtained from 16 locations on the prefrontal cortex. The initial classifier was trained on data from a single pilot who flew all three flight scenarios, and reached an accuracy of 92 % on that data. This model was then used to predict the mental workload levels of the remaining 7 pilots in real time while they flew the scenarios. A qualitative analysis of 69 events sampled from these simulated flights showed that the model trained on a single pilot predicted the expected workload level in 68 % of the cases. We also found that false positive predictions may arise from excessive head motion and from interference by other IR sources in the cockpit; in such cases the classifier tends to overestimate the workload level. The IR sources especially affected the raw signals from the bottom row of optodes, which explains the lower standardized discriminant coefficients observed for these optodes. The strongest contributors to the discriminant functions were optodes 1 and 3 over the left dlPFC and optodes 13 and 15 over the right dlPFC, consistent with Gateau et al. [6].

In general, establishing a robust relationship between physiological measures and psychological states is a challenging undertaking in cognitive neuroscience research. Although each individual is unique in terms of cognitive and behavioral attributes, years of neuroscience research have identified common brain activity patterns that hold across individuals during specific cognitive tasks. Such commonalities correspond to rough generalizations of cognitive function attributed to a number of anatomical areas; however, they may be useful for practical applications such as monitoring mental workload in real time. Findings in cognitive neuroscience and psychology indicate that attention is a scarce resource, despite the immense computational power of the roughly 86 billion neurons in the brain that can process information in parallel. When attention must be divided between multiple tasks, or when the subject is confronted with a sudden, challenging problem, existing studies report an increased oxygen demand in prefrontal areas that are believed to orchestrate the coordination of multiple neural resources distributed across the brain. Such novel, challenging stimuli that require conscious deliberation tend to activate a network of cortical areas known as the multiple demand system [14], which includes areas in the bilateral prefrontal cortex that we relied on when developing our real-time mental workload assessment application. Although we still do not know exactly how the brain functions, such generalized findings from imaging and lesion studies in cognitive neuroscience provide the theoretical background that suggests the feasibility of an approach based on real-time monitoring of cortical blood oxygenation.

Although our mental workload estimation algorithm aims to capitalize on neural activation patterns that are assumed to be valid and consistent across participants, research on brain-computer interfaces (BCI) suggests that models customized for an individual provide more precise predictions of that individual’s intended behavior from their brain activity. This is especially important in the BCI context, because users need to learn how to interact with such novel interfaces, and there are important individual differences in how people acquire new skills. In our case, information obtained from the brain is assessed passively, without providing any feedback to the monitored participant. Moreover, we worked with expert pilots, so we did not focus on learning effects or novice-versus-expert contrasts. These factors allowed us to focus on commonalities rather than individual differences when designing our mental workload estimation algorithm.

Several issues must be addressed to improve the reliability of the mental workload estimates obtained with fNIR sensors. First, proper sensor placement and appropriate shielding of the sensors from other IR sources are important for collecting useful information from the brain. Second, the model should be trained on a larger sample of subjects to obtain a more robust, generalizable prediction model for workload assessment. Moreover, the algorithm design can be improved in several ways. Additional features such as skewness, kurtosis and time to maximum within each sliding window could be used to better parameterize the distribution of the HbO and HbR signals. The effect of different window sizes on prediction accuracy should also be studied carefully. Filters that minimize the influence of head movements and saturated channels could also improve the prediction accuracy of the model. Finally, other machine learning approaches, such as deep neural networks, support vector machines and hidden Markov models, could be exploited for better predictions, and some of these methods could fuse fNIR data with other cockpit sensors, such as the eye trackers, for improved accuracy.
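The additional window features proposed above could be computed as in the following sketch. It uses population (biased) moment estimators, which is an assumption since bias-corrected estimators are an equally valid choice, reports excess kurtosis, and measures time to maximum in seconds from the start of the window, assuming the 2 Hz sampling rate of the current system. The function name is hypothetical, and a constant window (zero standard deviation) raises `ZeroDivisionError`, so a caller would need to handle flat or saturated channels separately.

```python
import statistics


def extra_features(values, fs=2.0):
    """Skewness, excess kurtosis and time to maximum (s) for one window.

    values: list of HbO or HbR samples; fs: sampling rate in Hz.
    Population moment estimators are used (an assumption).
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=mean)
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    m4 = statistics.fmean([(v - mean) ** 4 for v in values])
    return {
        "skewness": m3 / sd ** 3,
        "kurtosis": m4 / sd ** 4 - 3.0,  # excess kurtosis (normal -> 0)
        # Index of the first maximum, converted to seconds.
        "time_to_max_s": values.index(max(values)) / fs,
    }
```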

Our discriminant function analysis showed that the strongest contributors to our workload assessment algorithm were optodes located over the left and right dorsolateral prefrontal cortex. Since these areas cover a relatively small part of the forehead, estimating mental workload with a smaller fNIR sensor appears feasible. The current device already supports a “split sensor”, which can monitor 4 optodes located over the left and right dorsolateral prefrontal cortex and the left and right dorsomedial prefrontal cortex. Using a smaller sensor may improve crew acceptability, as it is much lighter and easier to install. Moreover, its smaller size makes it easier to shield from outside IR sources such as eye trackers or Kinect cameras. In the near future we expect further refinements in sensor design, which may allow the use of small, thin, patch-like sensors connected wirelessly to the control box.