
1 Introduction

Power generation and petrochemical plants rely on procedures extensively [1, 2]. Traditionally, procedures were written on paper, and remain so in many plants. However, voluminous paper procedures have significant limitations, given the complexity and mental demands of plant equipment and operations [2,3,4,5,6]. For example, Kontogiannis [3] concluded that paper procedures were inadequate in presenting complex instructions, handling cross-references, tracing suspended or incomplete steps, and monitoring procedural progress. Ockerman and Pritchett [2] also found that paper procedures can be too heavy, delicate, immobile and difficult to follow, preventing operators from executing procedures efficiently.

Computerized procedure systems (CPs) are being developed to resolve the limitations of paper procedures [3, 4, 7,8,9,10,11]. CPs are digital versions of paper procedures that may include additional capabilities to support the operators in executing procedures. These capabilities range from hyperlinks connecting different parts of a procedure and dynamic displays presenting parameters or controls relevant to the procedural steps being executed, to automatic checking of preconditions and automatic execution of control commands [12,13,14]. CPs can help process plant operators reduce operation time and errors while alleviating overall workload. For example, Huang and Hwang [7] showed that average operation time and errors for executing decision and action tasks to deal with alarm signals were significantly reduced with CPs compared to paper procedures.

The benefits of CPs may come with the risk of out-of-the-loop (OOTL) performance problems, the decreased ability of the human operator to intervene or assume manual control when automation fails [10, 15, 16]. Specifically, relieving operators from manually checking pre-conditions and executing control actions to reduce workload may lead them to lose track of procedural steps and misjudge plant state if they haphazardly accept any recommendation of the CPs [12]. Consequently, operators may not abort an inappropriate procedure when the CPs are incorrect, or take inappropriate actions due to wrongly assumed plant state. Taking an inappropriate action might include the control room operator calling field operators to fix equipment that is in an unsafe state because the CPs have changed the equipment setting without operator awareness. For example, when a return-to-normal alarm is reset automatically by CPs, operators may not be aware that such an alarm had sounded, hindering the operator’s comprehension and prediction of the plant state [17].

Adaptive automation [18,19,20,21] has been proposed as a solution for balancing the risks of OOTL and workload problems. Specifically, real-time assessment of workload can be used to determine an appropriate number of tasks, thereby keeping operators engaged and preventing the OOTL problem [19, 22]. Thus, CPs that adapt to operator workload in monitoring and controlling process plants may reduce the risk of both excessive workload and the OOTL problem.

As the first step towards developing CPs adaptive to operator workload, this study investigated the use of eye-gaze metrics for assessing operators’ workload in monitoring process plants. Specifically, we examined how eye-gaze measures relate to a subjective workload rating scale and to task performance. Further, we examined which eye-gaze measures would be most sensitive to manipulations of task difficulty that affect workload.

1.1 Continuous Indicator of Workload with Eye-Tracking

Eye-tracking can provide nonintrusive, continuous indicators of mental workload experienced by process plant operators, whose tasks involve substantial visual (monitoring) and cognitive processing (diagnosis and self-regulation) [23]. Eye movements are motor responses that are regulated by the cortical and subcortical brain system [24], providing information on the distribution of attention in terms of what stimuli are attended to, for how long, and in what order [25]. Substantial research indicates a correlation between human cognitive workload and eye activity measures, including fixations, saccades and blinks [26,27,28,29,30,31,32].

Lin et al. [33] argued that eye fixation and pupil diameter parameters are sensitive indicators for assessing mental workload. New information is mainly acquired during fixations [24, 34], as suggested by the eye-mind hypothesis postulating that what is being fixated by the eyes indicates what is being processed in the mind [35]. Only under limited special circumstances can new information be acquired during saccades [36, 37]. A larger number of fixations implies a larger amount of required information processing and hence higher workload. A longer fixation duration suggests more time spent on interpreting, processing or associating a target with its internalized representation and thus higher workload [33, 38]. Marquart et al. [30] reviewed the literature and concluded that dwell time, the period for a contiguous series of one or more fixations within an area of interest (AOI), can be an indicator of mental workload. Dwell time tends to increase with increasing mental task demands. Pupil diameter usually increases with task difficulty, making it another common indicator of mental workload [6, 26, 39, 40].

1.2 Overview of This Study

Empirical research on eye-tracking for workload assessment in process control appears insufficient for developing adaptive CPs. For this reason, we conducted an experiment involving human participants performing monitoring tasks to provide further empirical evidence on whether eye-gaze measures can be effective, continuous workload indicators.

For monitoring process plants, we hypothesize that eye-gaze measures would be able to distinguish types of monitoring tasks that impose different workload. These eye-gaze measures would also reveal differences in task load for the same type of task. We further hypothesize that the NASA TLX, a validated subjective workload measure [41, 42], would correlate positively with number of fixations, average fixation duration, dwell time and pupil diameter.

2 Method

2.1 Participants

This experiment recruited 45 Virginia Tech graduate and undergraduate students (age range 18–26, 29 females and 28 males). All participants had normal or corrected-to-normal vision. Participants were compensated $10/h for about 1.5 h of their time.

2.2 Experimental Apparatus

The experiment was conducted in a quiet room, with a computer workstation presenting an overview display of a nuclear power plant on a 24″ LED monitor with 1920 × 1200 resolution at 60 Hz. Further, the computer workstation collected eye-gaze and heart rate data with the following equipment:

  1. SensoMotoric Instruments (SMI) Remote Eye-tracking Device (REDn) recorded eye-gaze data at 60 Hz sampling rate. The REDn sensor was physically attached to the bottom of the monitor and connected to the computer workstation that had the SMI iVIEW software installed for data collection.

  2. Shimmer3 ECG Sensors (ECG) recorded electrocardiogram (ECG), the pathway of electrical impulses through the heart muscle, sampling at 1000 Hz. The ECG was wirelessly connected to the computer workstation through Bluetooth. ECG analysis is beyond the scope of this paper.

2.3 Experimental Manipulation

The participants’ task was to identify parameters deemed out-of-range on an overview display of a fictional nuclear power plant (Fig. 1). The overview display consisted of tanks, pumps, heat exchangers and valves associated with various process parameters such as level (i.e., %), flow rate (i.e., gpm), temperature (i.e., ℃), and pressure (i.e., psig and kPph). The locations of these process parameters were the same for all trials, but their values were updated for each trial. The Question Box prompted the participants to complete two types of monitoring tasks.

Fig. 1. Nuclear control room monitoring simulation platform presenting the overview display and monitoring tasks.

The two types of monitoring task were target-driven and series-driven verification of process parameters to represent common activities specified by procedures of industrial plants.

Target-driven verification (see Fig. 2, left column). Participants were instructed to check specific targets (e.g. TC3, SG1 or VD5) per question or monitoring task. The target-driven task included either one or two target parameters to represent low and high task load, respectively.

Fig. 2. Examples of question boxes on nuclear control room monitoring simulation, demonstrating manipulation of monitoring task types and task loads.

Series-driven verification (see Fig. 2, right column). Participants were instructed to check all values for a specific type of parameter (e.g. gpm, kPph, psig or %) per question. The series-driven task included either one or two series of parameters to represent low and high task load, respectively.

2.4 Procedure

Participants were welcomed with a brief introduction about the study in front of the computer workstation. Then they were asked to give consent and complete a health history questionnaire. The experimenter provided instructions of the control room monitoring task and answered participants’ questions.

The participants completed four blocks of control room monitoring tasks covering all combinations of task type and load: two task types (i.e., target-driven vs. series-driven) and two task loads (i.e., low vs. high). At the beginning of each block, the participants first completed the REDn 9-point eye-gaze calibration. Participants completed a NASA-TLX questionnaire at the end of each of the four blocks. For each trial of the monitoring task, participants responded by clicking the corresponding out-of-range parameter(s) on the display with a mouse and then clicked the ‘Answer’ button (see Fig. 1) to submit their responses and proceed to the next trial. The experimenter stopped REDn recording at the end of each block.

After completing four blocks of trials, the experimenter helped participants to take off the physiological instruments. Participants were given the opportunity to ask any further questions and $15 for compensation at departure.

2.5 Experimental Design

The experiment was a 2 × 2 within-subjects design with two treatments: (1) task type (singular targets or series of targets) and (2) task load (low and high). Four blocks of control room monitoring tasks were assigned in a random order across participants. Each block consisted of 3 min of monitoring tasks. Participants performed tasks at their own pace, leading to different numbers of completed trials in one block.
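The block assignment described above can be illustrated with a short sketch. It assumes a simple per-participant shuffle of the four task-type by task-load blocks; the paper does not state how the random order was generated, so the function name and the seeding scheme are hypothetical.

```python
import itertools
import random

TASK_TYPES = ("target-driven", "series-driven")
TASK_LOADS = ("low", "high")


def block_order(rng):
    """Return the four task-type x task-load blocks in a random order."""
    blocks = list(itertools.product(TASK_TYPES, TASK_LOADS))
    rng.shuffle(blocks)  # in-place shuffle using the supplied generator
    return blocks


# Assumption: one seeded generator per participant keeps the order reproducible.
order = block_order(random.Random(7))
for task_type, task_load in order:
    print(task_type, task_load)
```

Because every participant sees all four blocks, the shuffle only changes the order, not the set of conditions.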

2.6 Measures

Participants were assessed on three categories of measures: task performance, NASA-TLX, and eye-related measures.

Response Accuracy.

Response accuracy was used to assess task performance. This measure was defined as the percentage of trials for which the participant submitted the correct answer by identifying all the out-of-range parameters.
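As a minimal sketch of this definition, the function below scores one block of trials. It assumes that a trial counts as correct only when the clicked parameters exactly match the out-of-range parameters (so an extra click also makes the trial incorrect); the paper does not say how extra clicks were scored, and the data layout and names are hypothetical.

```python
def response_accuracy(trials):
    """Percentage of trials whose clicked set equals the out-of-range set.

    `trials` is a list of (clicked, out_of_range) pairs of parameter labels.
    """
    if not trials:
        raise ValueError("no completed trials in this block")
    correct = sum(1 for clicked, target in trials if set(clicked) == set(target))
    return 100.0 * correct / len(trials)


# Made-up trials for illustration only.
example_block = [
    ({"TC3"}, {"TC3"}),                # correct
    ({"SG1"}, {"SG1", "VD5"}),         # missed one parameter -> incorrect
    ({"VD5", "TC3"}, {"TC3", "VD5"}),  # correct
    (set(), {"SG1"}),                  # nothing clicked -> incorrect
]
accuracy = response_accuracy(example_block)
print(accuracy)
```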

TLX Total Score.

The NASA-TLX questionnaire was used to assess the subjective ratings of workload, using a 10-point visual analog scale. This questionnaire is a multidimensional instrument that consists of 6 subscales: mental demand, physical demand, temporal demand, performance, effort, and frustration. The TLX total score was computed by a combination of the six dimensions, resulting in an overall workload scale between 0 and 60.
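The total score can be sketched as follows, assuming the "combination" is an unweighted sum of the six subscale ratings (which is consistent with the stated 0 to 60 range but is not spelled out in the text). The subscale keys and function name are hypothetical.

```python
SUBSCALES = (
    "mental_demand", "physical_demand", "temporal_demand",
    "performance", "effort", "frustration",
)


def tlx_total(ratings, max_rating=10):
    """Unweighted sum of the six subscale ratings (assumed scoring rule)."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for name in SUBSCALES:
        if not 0 <= ratings[name] <= max_rating:
            raise ValueError(f"{name} is outside 0..{max_rating}")
    return sum(ratings[name] for name in SUBSCALES)


# Made-up ratings for illustration only.
example_ratings = {
    "mental_demand": 7, "physical_demand": 2, "temporal_demand": 6,
    "performance": 4, "effort": 7, "frustration": 3,
}
total = tlx_total(example_ratings)
print(total)
```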

Eye-Gaze Measures.

Number of fixations, fixation duration, dwell time and pupil diameter were used as continuous indicators of workload. The area of interest (AOI) was defined as the display area covering the graphic and numerical reading of the parameter(s) that should be monitored in each trial. The AOIs varied between trials depending on the monitoring task type and load. For example, one square was marked as the AOI for trials with the one-target verification task, while eight squares were marked as the AOI for trials with a one-series monitoring task. Fixation-based metrics on AOIs were extracted to indicate workload. All eye-gaze metrics were computed with SMI BeGaze software. Four metrics were selected for comparison: the total number of fixations on AOIs, the average duration of a fixation on AOIs, dwell time (total fixation duration on AOIs), and pupil diameter for fixations on AOIs.
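The four AOI-based metrics can be illustrated with a small sketch. This is not the SMI BeGaze computation used in the study; it assumes that fixations have already been detected, that AOIs are axis-aligned rectangles in screen pixels, and that pupil diameter is available per fixation in millimetres. All class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # screen position in pixels
    y: float
    duration_ms: float
    pupil_mm: float     # assumed unit for pupil diameter


@dataclass
class AOI:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation):
        return (self.left <= fixation.x <= self.right
                and self.top <= fixation.y <= self.bottom)


def aoi_metrics(fixations, aois):
    """Summarise the fixations that land on any AOI of the current trial."""
    hits = [f for f in fixations if any(a.contains(f) for a in aois)]
    n = len(hits)
    dwell = sum(f.duration_ms for f in hits)
    return {
        "n_fixations": n,
        "mean_fixation_ms": dwell / n if n else None,
        "dwell_time_ms": dwell,
        "mean_pupil_mm": sum(f.pupil_mm for f in hits) / n if n else None,
    }


# Made-up fixations: two fall inside the single AOI, one falls outside.
fixations = [
    Fixation(100, 100, 200, 3.0),
    Fixation(120, 110, 300, 3.2),
    Fixation(900, 500, 250, 3.1),
]
metrics = aoi_metrics(fixations, [AOI(50, 50, 150, 150)])
print(metrics)
```

A trial with several AOIs (for example, the eight squares of a one-series task) is handled by passing all of its rectangles in the `aois` list.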

3 Results

The experiment yielded 180 observations (45 participants × 4 experimental blocks), of which twelve were removed because participants performed the monitoring tasks incorrectly, leaving 168 observations. Pearson product-moment correlations were computed across these 168 observations to examine relationships between measures, and two-way analyses of variance (ANOVAs) were conducted to examine differences between the four experimental conditions. We further failed to collect NASA TLX for an additional participant, so statistics involving NASA TLX are based on 167 observations.

Response accuracy was correlated with number of fixations (r = −0.313; p < 0.001) and dwell time (r = −0.265; p < 0.001). However, only pupil diameter significantly correlated with NASA TLX (r = −0.186; p < 0.05). Between eye-gaze measures, dwell time significantly correlated with all three other eye-gaze measures, including number of fixations (r = 0.877; p < 0.001), fixation duration (r = 0.458; p < 0.001), and pupil diameter (r = 0.173; p < 0.05) (Table 1).

Table 1. Correlation matrix of the five measures
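For readers who want to reproduce this kind of analysis, the Pearson product-moment coefficient behind the reported correlations can be sketched in plain Python. The example values are made up and are not the study's data; the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up dwell times (s) and response accuracies (%), for illustration only.
dwell_time = [10, 20, 30, 40]
accuracy_pct = [90, 80, 85, 60]
r = pearson_r(dwell_time, accuracy_pct)
print(round(r, 3))
```

A negative coefficient, as in this toy example, means that longer dwell times go together with lower accuracy.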

Experimental effects on response accuracy and TLX total score were examined with the nonparametric Kruskal-Wallis rank sum test because their error residuals were not normally distributed.
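A minimal sketch of the Kruskal-Wallis statistic is given below, assuming the standard rank-sum formula with a tie correction. The p-value helper covers only the two-group case (one degree of freedom), which matches the χ2(1) statistics reported in this section; the function names are hypothetical and the sample data are made up.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties get the average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, ties = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction


def chi2_sf_df1(h):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(h / 2))


# Made-up scores for two conditions, for illustration only.
h_stat = kruskal_wallis([[1, 2, 3], [4, 5, 6]])
print(round(h_stat, 3), round(chi2_sf_df1(h_stat), 3))
```

As a consistency check, `chi2_sf_df1(4.748)` and `chi2_sf_df1(2.854)` give approximately 0.029 and 0.091, in line with the p-values reported for task load.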

The nonparametric test results confirmed the hypotheses in revealing that series-driven monitoring tasks significantly hindered response accuracy (χ2(1) = 31.864, p < 0.001, N = 168). Further, the nonparametric test also revealed that task load marginally decreased response accuracy (χ2(1) = 2.854, p = 0.091, N = 168) and significantly increased subjective workload (χ2(1) = 4.748, p = 0.029, N = 167) (Fig. 3).

Fig. 3. Mean and standard error plots of response accuracy (left) and subjective workload rating score (right) for each combination of task type and task load.

All eye-related measures were analyzed with two-way ANOVAs. The main effect of task type was also significant on the number of fixations (F(1, 159) = 110.634, p < 0.001) and dwell time (F(1, 159) = 49.117, p < 0.001). Similarly, the main effect of task load was significant on both number of fixations (F(1, 159) = 31.5963, p < 0.001) and dwell time (F(1, 159) = 14.320, p < 0.001). Furthermore, the interaction effect of task type and load was significant on both number of fixations (F(1, 159) = 11.997, p < 0.001) and dwell time (F(1, 159) = 6.162, p = 0.014). In other words, increased task load had significantly more impact on series-driven than on target-driven monitoring tasks. However, average duration per fixation and pupil diameter did not reveal any significant effect (Fig. 4).
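The interaction described above can be read as a difference of differences between cell means. The sketch below illustrates that reading only; it does not perform the ANOVA significance test, the cell means are made-up values rather than the study's results, and the function name is hypothetical.

```python
def interaction_contrast(cell_means):
    """Load effect for series-driven tasks minus load effect for target-driven tasks."""
    load_effect_series = cell_means[("series", "high")] - cell_means[("series", "low")]
    load_effect_target = cell_means[("target", "high")] - cell_means[("target", "low")]
    return load_effect_series - load_effect_target


# Hypothetical mean dwell times in seconds, NOT the study's data.
made_up_means = {
    ("target", "low"): 20.0,
    ("target", "high"): 25.0,
    ("series", "low"): 40.0,
    ("series", "high"): 60.0,
}
contrast = interaction_contrast(made_up_means)
print(contrast)
```

A positive contrast corresponds to the pattern reported here: raising task load increases the measure more for series-driven than for target-driven tasks.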

Fig. 4. Mean and standard error plot of the number of fixations (left) and dwell time (right) for each combination of task type and task load.

4 Discussion

The significant effects of task type on response accuracy and of task load on NASA TLX, together with the marginal effect of task load on response accuracy, were consistent with our hypotheses and indicate that the two experimental manipulations were effective at manipulating workload. Thus, we can interpret the eye-gaze metrics with respect to the response accuracy and NASA TLX measures. The number of fixations on AOIs and dwell time on AOIs were both affected by task type and task load, indicating the sensitivity of these two eye-gaze measures to the experimental manipulations. However, these two measures were not sensitive to subjective workload because they did not correlate with NASA TLX. Pupil diameter failed to reveal any significant experimental effects but correlated with NASA TLX. In other words, pupil diameter was sensitive to subjective workload but not to the effect of task type and load. The average fixation duration on AOIs did not appear to be a sensitive measure, showing neither significant experimental effects nor a significant correlation with response accuracy or NASA TLX.

The results of this experiment illustrate that careful consideration is needed in selecting eye-gaze metrics for indicating workload in monitoring process plants. None of the eye-gaze measures both correlated significantly with NASA TLX and responded significantly to the experimental manipulations (i.e., task type and load), so no single eye-tracking measure emerges as a clear candidate for indicating workload. (Dwell time and number of fixations correlated significantly with response accuracy but not with NASA TLX.)

These eye-gaze results must also be interpreted with respect to the monitoring tasks designed for this experiment. Specifically, there are more targets for the series-driven than the target-driven task type, and for the high than the low task load. Thus, the number of fixations may be inherently higher because the task involves more targets rather than because of higher mental workload. For this reason, dwell time on AOIs might be a more robust indicator than number of fixations because dwell time is bounded by the allotted time for the block (i.e., 3 min). In the context of this study, the issue of the number of targets probably does not present a significant problem for two reasons. First, having more targets is intrinsically linked to the demand of the monitoring tasks, so the results should still be representative for monitoring process plants. Second, dwell time revealed the same experimental effects as NASA TLX, lending empirical support that the experimental manipulations affect dwell time and mental workload similarly.

Another notable result is the weak, negative correlation between dwell time and response accuracy (r = −0.265), indicating that eye-gaze behaviors are related to task performance. Thus, dwell time might also offer a modest, continuous indication of operator engagement with system operations.

The overall empirical results indicated that dwell time could be an effective alternative to NASA TLX as a workload indicator in the context of monitoring parameters prescribed by procedures. Dwell time can be collected in a less invasive manner than NASA TLX while providing continuous indication of workload and engagement. That is, NASA TLX requires interruption of the work tasks whereas the remote eye-tracker can continuously estimate dwell time without any interference. NASA TLX might simply reflect operators’ initial and final impressions of the given task as opposed to their level of cognitive processing as they perform it. Once again, the generalization of the study results is limited to task load driven by the number of targets.

This research represents an early effort to integrate the concept of adaptive automation into CPs. The results of this experiment highlight the potential of various eye-gaze measures as continuous indicators of workload to support adaptive features in CPs for control room operators monitoring process plants. Valid and reliable eye-gaze metrics of workload can support continuous, unobtrusive assessment of workload as well as adaptive aiding for display design in the main control room. Future work can examine the use of regression-based machine learning methods on multiple eye-gaze measures to indicate workload while monitoring process plants (see [43]).