1 Introduction

Learning can sometimes be ineffective due to a training pace that is ill adjusted to individual learners. For complex tasks, reducing cognitive load by splitting the task into sub-components (fractionation) could be beneficial according to cognitive load theory. However, past research [1] has shown fractionation to be ineffective, since learning time-sharing skills across sub-tasks is also important. Cognitive load theory divides cognitive load into three types: extraneous, intrinsic and germane, where germane load is associated with the construction and automation of schemas [2]. When learning complex tasks in particular, the aim of part-task training (PTT) is to allow more space for germane load by reducing the intrinsic and extraneous load. One way to do so is to split a complex task into sub-components. There are three different approaches to PTT: segmentation, fractionation and simplification. Segmentation consists of partitioning the task along temporal or spatial dimensions. Fractionation can be used when sub-tasks must be performed simultaneously. Finally, simplification consists of decreasing the difficulty of the whole task by adjusting specific characteristics of it [3].

In addition, for segmentation and fractionation, a specific PTT approach can be addressed with different schedules, i.e. how sub-tasks are combined together. The most common schedules are pure (each sub-task in isolation, then all combined), progressive (two sub-tasks in isolation, then added together), and repetitive/cumulative (one sub-task added to the previous one) [3]. The choice of a schedule has important consequences on the learning process, since it affects the learning of time-sharing skills.

The results regarding the use of PTT are heterogeneous. Some studies suggest that PTT is a beneficial method [4, 5], whereas others report that PTT has no or limited positive effects on learning [1], or is only useful for memory-dependent tasks [6]. These results highlight the necessity of considering the approaches separately, as well as taking schedule choices into account when evaluating the efficiency of a method. Moreover, the results obtained with a specific method will vary depending on the task on which the experiment is conducted.

The present work investigates a cumulative part-task training method that builds up task complexity adaptively based on individual learner states. It is hypothesized that splitting a complex task into sub-tasks and creating a stepwise training session, starting with one sub-task and adaptively adding sub-tasks one-at-a-time, will improve performance in a full-task test. A successful implementation of this method has the potential to improve learning efficiency by reducing training time and/or increasing performance.

In cumulative part-task training (also referred to as repetitive part task training), identifying the optimal trigger for the addition of a sub-task is a key challenge. Indeed, an early trigger might overload the learner, while a late trigger might unnecessarily extend training time. By using two different dimensions, namely workload and performance, we hypothesise that it should be possible to create an integrative rule able to differentiate between an effective learning state and a state where further practice is required. In the present work, six potential rules associated with short-term trends in performance and workload are investigated to select the most promising triggering strategy for adaptive training progression.

Measuring learner progress is another key challenge when quantitative measurements of performance are unavailable. In such cases, physiological measurements might be used as a proxy for mental workload and performance, a technique used in previous research during flight simulation [7, 8]. By using task-independent metrics focused on bio-behavioural measurements, and by predicting changes in performance and workload, we aim to develop models usable in different training contexts, an approach previously used in domains such as entertainment technologies [9], emergency management [10, 11] and aerospace [7, 8].

2 Method

2.1 The Space Fortress Game

The present experiment uses a research-oriented video game entitled “Space Fortress” [12] (Fig. 1), a serious game [13] based on a task originally developed by cognitive psychologists at the University of Illinois [14]. This task has shown transfer of training to real-world performance in aviation [15]. Wayne et al. [16] studied the acquisition of complex strategies, showing how to capture the explorations, trials, errors, and successes of the learning task in Space Fortress. The aim when playing Space Fortress is to score as many points as possible while controlling a spaceship in an environment where it can be attacked by different opponents (a fortress located at the center of the screen, and mines appearing at random locations). The main task can be split into four distinct sub-tasks, which are described in Table 1.

Fig. 1. The Space Fortress game. The ship of the player can be seen in turquoise assaulting the fortress that is always at the center of the screen. A mine can also be seen moving toward the ship of the player. A “$” symbol can be seen and is part of the ammo management task. The score of the player, ammo, bonus points, and mine indications are displayed at the edge of the screen.

Table 1. Summary of each sub-task

2.2 Experimental Design

The experiment compares two conditions using a between-group design: (1) a full task (FT) condition and (2) a fixed four-step cumulative part-task training (CPT). There are three main types of measurements. First, the game score of each trial is recorded allowing performance comparisons between participants and conditions. Second, ocular, cardiac, and respiratory activity are recorded in order to develop models for inferring the mental state of the participants. Finally, the subjective mental state of participants is assessed using a questionnaire at the end of each trial.

2.3 Participants

Participants were recruited from the Université Laval database of volunteers and mailing list. For the FT condition there were 36 participants aged 19 to 46 (M: 25, S: 7), of whom 17 were women. For the CPT condition there were 30 participants aged 21 to 50 (M: 26, S: 7), of whom 15 were women. Data collection for the adaptive CPT (ACPT) condition is ongoing, with a target of 30 participants.

2.4 Material

Material for all the conditions is identical. It comprises the game Space Fortress V5 [17], running on a Windows PC and played with a PC controller. A Zephyr BioHarness 3 chest strap, shown in Fig. 2, is used to record the participants’ cardiac activity and respiration rate with raw sampling frequencies of 250 Hz and 25 Hz respectively. It also contains an inertial measurement unit reporting acceleration and posture at 1 Hz. A Tobii Pro Glasses II eye tracker, shown in Fig. 3, is also used to capture eye movement data. These include gaze position, pupil size and blink occurrences, sampled at a frequency of 50 Hz. Software developed by Thales is used for synchronising signals and computing advanced features from raw bio-signals, such as heart rate variability, eye fixations and spectral density. Logs from the game are used to derive game metrics such as scores for the different sub-tasks and overall performance on each trial. Finally, after each trial, participants rated the six statements listed below, selected from previously validated questionnaires [18, 19], on a 5-point Likert scale [20] to record their self-reports of engagement and workload.

Fig. 2. Zephyr™ BioModule sensor and strap (available from BIOPAC.com)

Fig. 3. Tobii Pro Glasses II eye tracker

  • I was committed to my goals.

  • I have been concerned about achieving my goals.

  • It was important for me to perform at this task.

  • I have put a great deal of effort into this task.

  • I was overwhelmed by this task.

  • I was under-stimulated by this task.

2.5 Experiment Protocol

For all conditions, the experiment lasted approximately 2 h. Upon arrival, the participants were given a brief overview of the project. The Zephyr BioHarness 3 and the Tobii Pro Glasses II were then installed. Participants were asked to read a tutorial to learn the controls as well as the mechanics of the different sub-tasks of the Space Fortress game. They then played four three-minute sessions to familiarize themselves with each sub-task. Participants then completed 24 three-minute trials that differed across conditions as follows:

  1. In the FT condition, participants completed 24 three-minute trials in which all the sub-tasks were presented at once; hence they played with all four sub-tasks activated in all 24 trials.

  2. In the CPT condition, the sub-tasks were cumulatively added at fixed trial numbers. The first 5 trials comprised only the navigation sub-task. Trials 6 to 10 comprised the navigation and assault sub-tasks. Trials 11 to 15 comprised the navigation, assault and mine sweeping sub-tasks. The remaining trials comprised all four sub-tasks.

The final four trials for each condition are labeled as tests. Indeed, they are identical in all conditions and are used for evaluating learning outcomes.

3 Results

This section presents results from the FT and CPT conditions.

3.1 Learning Rate, Engagement and Workload

First, data were normalized to ensure that conditions could be compared against each other. Normalization is done per participant using the performance scores of the last four trials of the same sub-task group, i.e. for each participant, the last four trials of comparable score have a mean of zero and a standard deviation of one. This is done so that the learning rate can easily be compared between participants and sub-task groups.
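The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `normalize_scores`, the example scores, and the choice of the sample (rather than population) standard deviation are assumptions, since the text does not specify them.

```python
from statistics import mean, stdev


def normalize_scores(scores, n_ref=4):
    """Standardize one participant's scores for one sub-task group.

    The reference statistics come from the last `n_ref` trials, so those
    trials end up with mean 0 and standard deviation 1.
    """
    if len(scores) < n_ref or n_ref < 2:
        raise ValueError("need at least n_ref >= 2 reference trials")
    ref = scores[-n_ref:]
    mu = mean(ref)
    sigma = stdev(ref)  # assumption: sample standard deviation
    if sigma == 0:
        raise ValueError("reference trials have zero variance")
    return [(s - mu) / sigma for s in scores]


# Hypothetical raw game scores for one participant (not data from the study).
raw = [100, 220, 310, 400, 420, 380, 440]
z = normalize_scores(raw)
```

With this convention, a strongly negative value on an early trial reads directly as "this many standard deviations below the participant's own end-of-training level".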

A paired t-test on standardized task performance for trials 1–4 vs 20–24 showed a significant effect of the quantity of training sessions on performance in the FT condition (a start-end score comparison is not possible in the CPT condition), with an average improvement of 2.8 standard deviations, t (139) = −11.655, p < .001. Figure 4 shows the normalized scores for the FT and CPT conditions.

Fig. 4. Performance score for the FT condition and the CPT condition. The different colors represent different sub-task combinations. (Color figure online)

The CPT condition showed performance scores similar to those of the FT condition, as shown in Fig. 5. An independent-samples t-test on standardized performance showed that the CPT method (M: 0.064, S: 0.945) did not improve learning compared to the FT training method (M: 0.132, S: 0.908), t (254.66) = −0.595, n.s. It is important to note that it did not decrease performance either. There is therefore ample room for improving learning efficiency, as is expected to occur in the ACPT condition.
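The fractional degrees of freedom reported above suggest an unequal-variance (Welch) t-test, although the text does not name the variant; this is an assumption. Under that assumption, the statistic and its Welch-Satterthwaite degrees of freedom could be computed as in the sketch below, where `welch_t` and the sample values are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each sample mean.
    va = variance(a) / na
    vb = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical standardized scores for two groups (not data from the study).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
```

Because the degrees of freedom depend on both sample variances, they are generally not an integer, which is consistent with a value such as 254.66.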

Fig. 5. Final performance scores comparison between the FT condition and the CPT condition.

Engagement (on a scale from 0–20) did not significantly differ between the FT (M: 16.73, S: 3.45) and CPT conditions (M: 16.69, S: 3.26), t (1542) = 0.2277, n.s. Engagement for the final four trials did differ between the FT (M: 16.22, S: 3.98) and CPT conditions (M: 17.60, S: 2.98), t (260) = −3.127, p < 0.01.

Workload (on a scale from 1–5) differed between the FT (M: 2.97, S: 1.43) and CPT conditions (M: 3.21, S: 1.31), t (1542) = −3.419, p < 0.001. Workload for the final four trials also differed between the FT (M: 3.02, S: 1.32) and CPT conditions (M: 3.81, S: 0.99), t (260) = −5.35, p < 0.001.

3.2 Trigger Rule Selection for the ACPT Condition

By reducing the number of trials in each sub-task group, we hypothesize that it should be possible to reduce overall training time to achieve a comparable performance level to the FT and CPT conditions. To this end, an adaptive training method should be able to detect when the participant has sufficiently learned a sub-task and is ready to move on to the next step cumulatively adding another sub-task. The proposition is that based on performance improvement and workload indication, it is possible to find an optimal trigger rule. Here we investigated a set of six trigger rules that might achieve that. These are, in order of complexity:

  1. One trial with a stable or decreasing workload.

  2. One trial with an increased performance.

  3. Two successive trials with an increased performance.

  4. Two successive trials with a stable or decreasing workload.

  5. One trial with a stable or decreasing workload and an increase in performance.

  6. Two successive trials with a stable or decreasing workload and an increase in performance.
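The six rules above can be expressed as simple predicates over per-trial performance and workload series. The sketch below is one possible reading, not the authors' code: it assumes that "increased performance" and "stable or decreasing workload" are both judged relative to the immediately preceding trial, and all names (`perf_up`, `load_ok`, `RULES`, `count_triggers`) and example values are hypothetical.

```python
def perf_up(perf, t):
    """True if performance at trial t is higher than at trial t - 1."""
    return t >= 1 and perf[t] > perf[t - 1]


def load_ok(load, t):
    """True if workload at trial t is stable or lower than at trial t - 1."""
    return t >= 1 and load[t] <= load[t - 1]


def both(perf, load, t):
    """True if both conditions hold at trial t."""
    return perf_up(perf, t) and load_ok(load, t)


# Rule number -> predicate(perf, load, t), following the numbered list.
RULES = {
    1: lambda p, w, t: load_ok(w, t),
    2: lambda p, w, t: perf_up(p, t),
    3: lambda p, w, t: perf_up(p, t) and perf_up(p, t - 1),
    4: lambda p, w, t: load_ok(w, t) and load_ok(w, t - 1),
    5: lambda p, w, t: both(p, w, t),
    6: lambda p, w, t: both(p, w, t) and both(p, w, t - 1),
}


def count_triggers(rule, perf, load):
    """Count the trials at which the given rule would fire."""
    return sum(RULES[rule](perf, load, t) for t in range(len(perf)))


# Hypothetical scores and 1-5 workload ratings for five trials.
perf = [10, 12, 15, 14, 18]
load = [3, 3, 4, 2, 2]
counts = {rule: count_triggers(rule, perf, load) for rule in RULES}
```

On this toy series the single-trial rules fire more often than their two-trial counterparts, which illustrates the trade-off between triggering too often and not often enough.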

To objectively evaluate the potential effectiveness of these rules, we computed, for each rule, the correlation between each participant's (simulated) number of triggering occurrences and their final score in the CPT condition. This condition was chosen because it includes the same sub-tasks and therefore a similar difficulty build-up, which will influence workload. The Pearson correlations and corresponding p values are shown in Table 2. As can be observed in the table, only one rule stands out statistically, namely rule 5. As well as being significantly correlated with the final score, this rule is a middle ground between triggering too often and not often enough, compared to the other rules, which trigger more or less often. Rule 5 was therefore selected as the triggering rule for the ACPT condition.
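As an illustration of this evaluation step, the Pearson correlation between per-participant trigger counts and final scores can be computed as below. The helper names and the data are hypothetical; the t statistic shown is the standard one for testing a correlation against zero with n − 2 degrees of freedom, and the p value lookup is left out of this sketch.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical data: simulated trigger counts and final scores, one per participant.
trigger_counts = [1, 2, 3, 4, 5]
final_scores = [2, 1, 4, 3, 5]
r = pearson_r(trigger_counts, final_scores)
t_stat = r_to_t(r, len(trigger_counts))
```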

Table 2. Pearson correlation between triggering occurrences and final score, ordered by the Pearson correlation

A simulation of the ACPT condition using the CPT results and the selected trigger rule makes it possible to estimate the training efficiency gain. This remains a theoretical evaluation, since participants still played all the trials instead of progressing early to the next sub-task group, but it allows hypothesizing about expected gains in the ACPT condition by computing the number of trials that participants would have played. For the selected rule (5), the distribution of the total number of played trials is presented in Fig. 6. The average number of completed trials is 13, out of a maximum of 20 (as in the CPT condition) and a minimum of 8 (2 for each sub-task group). There is therefore an average potential saving of seven trials. This represents the best-case scenario for this rule, as it supposes that participants attain the same level of performance with fewer trials. The real ACPT condition will determine whether participants attain the same level of expertise (final score) as in the FT and CPT conditions with fewer trials. If they do not exhibit the same level of expertise after their test trials, we will be able to determine at which point they reach the same level as the FT condition, since they will still play the remaining trials for a total of 24 trials. This will allow the learning efficiency gain of the method to be computed as the number of trials avoided for the same level of expertise.
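The trial count in such a simulation can be sketched as follows. This is a hedged reconstruction under stated assumptions, not the authors' procedure: each of the four sub-task groups is assumed to offer at most 5 trials, rule 5 is evaluated against the preceding trial of the same group (so it can fire at the second trial at the earliest), and the participant moves on as soon as it fires. The function names and data are hypothetical.

```python
def trials_in_group(perf, load, cap=5):
    """Trials played in one sub-task group before rule 5 fires (at most cap)."""
    n = min(len(perf), len(load), cap)
    for t in range(1, n):
        # Rule 5: performance increased and workload stable or decreasing.
        if perf[t] > perf[t - 1] and load[t] <= load[t - 1]:
            return t + 1
    return n


def simulated_trials(groups, cap=5):
    """Total trials played across all sub-task groups for one participant."""
    return sum(trials_in_group(perf, load, cap) for perf, load in groups)


# Hypothetical participant: (scores, workload ratings) for each of four groups.
participant = [
    ([10, 12, 11, 13, 14], [3, 3, 3, 3, 3]),  # fires at the second trial
    ([10, 9, 11, 12, 13], [3, 3, 3, 3, 3]),   # fires at the third trial
    ([10, 12, 13, 14, 15], [2, 3, 4, 5, 5]),  # fires at the fifth trial
    ([10, 9, 8, 7, 6], [3, 3, 3, 3, 3]),      # never fires, all five played
]
total = simulated_trials(participant)
saved = 4 * 5 - total
```

Under these assumptions the total is bounded by 8 (two trials per group) and 20 (five trials per group), matching the limits stated above, and the saving is simply 20 minus the total.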

Fig. 6. Distribution of participants by the number of trials played for the simulated ACPT condition according to the rule (5) “One trial with a stable or decreasing workload and an increase in performance”. The maximum possible number of played trials is 20 and the minimum possible number of trials is 8.

3.3 Physiological Measurement as Proxy for Performance and Workload

As a first indication that at least some of the physiological measurements are indicative of performance, the 7 features with the highest correlation with the score are shown in Table 3. The table likewise shows the 7 features with the highest correlation with the reported workload. The following abbreviations are used: Heart Rate (HR), Heart Rate Variability (HRV), Amplitude (Ampl.), Standard Deviation (Std), Acceleration (Acc.). Features are computed on the full-length signal of each trial. Feature names beginning with the Δ symbol denote the difference between the current trial value and the reference resting-state value. The HRV Short Window Power Band refers to the spectral power density between 0.05 Hz and 0.15 Hz for windows of 100 s of the heart rate signal. The Involuntary Fixation Ratio Long Window is an eye-movement-derived feature that is the ratio of time spent under involuntary fixation over the last 60 s. Sagittal, lateral and velocity are features derived from the inertial measurement unit of the Zephyr BioHarness.

Table 3. Pearson correlation between features computed from physiological signals and the trial score ordered by the Pearson correlation

From this table, it can be observed that heart rate features appear to be more correlated with score than body movements and eye movements. Conversely, body and eye movements appear to be more correlated with reported workload intensity. While the correlations are low, they are still significant, with p values mostly under 1%.

4 Discussion

In line with previous research [1], the CPT method did not show benefits (nor costs) compared to a baseline FT approach. Self-reported engagement did not differ overall between the FT and CPT conditions but was slightly higher in the CPT condition for the final four trials. Workload was higher in the CPT condition than in the FT condition, both overall and for the last four trials. This suggests that perceived workload was higher for participants in the CPT condition, perhaps because they experienced lower workload in the early phase, which may have affected their ratings of the later, harder trials.

These two methods serve as control conditions for assessing the impact of the ACPT condition. The present results allowed a potentially viable trigger rule for dynamic adaptation to be selected, namely one trial with a stable or decreasing workload and an increase in performance (rule 5). This rule is supported by a positive correlation with learning outcomes (performance in the final test trials), and is therefore expected to be successful in detecting the correct moment to trigger the next phase of the training procedure. The ongoing data collection for the ACPT condition will help test the hypothesis that this adaptive procedure may improve training efficiency (either by increasing learning outcomes or by accelerating the attainment of the same proficiency level). The expected result is that scores similar to those of the FT and CPT conditions will be attained in earlier trials, on average up to seven trials earlier.

The observed correlations between physiological features and performance score, as well as with workload, show promise for training models to detect performance and workload changes in real time based on bio-behavioural signals. Since different features appear to be correlated with performance and workload, it seems that they may be capturing distinct information about learner state.

Learning retention has not been studied in this experiment, and each condition might influence learning retention differently over longer periods. Context is also important in adaptive training: while the task presented in this paper is highly controlled with no distractions, training in real-world scenarios might trigger adaptation at inopportune moments if context is not taken into account [21].

Future work includes a second ACPT condition (assuming the first one provides significant benefits), where the trigger rule is based on the output of models built on the bio-behavioural signals instead of the self-reported workload and game score. Indeed, not all tasks lend themselves to performance measures and subjective workload ratings throughout a training session. A proxy for those measures based on bio-behavioural signals would therefore make the ACPT method useful for a larger set of training contexts. Inference models have previously been developed for assessing operator functional state [11], a concept that integrates individual human factor dimensions such as workload and stress to assess one’s ability to perform current tasks in a nominal fashion [22]. Means for assessing team states have also been proposed [23]. A learner functional state assessment model could thus be similarly useful in training contexts [24]. As such, the main expected impact of this work is to improve training efficiency in simulators and in the field, in avionics and possibly other related contexts requiring the development of skills and strategies to manage a complex mix of psychomotor, attentional and mnemonic subtasks [25]. Future work will further explore the use of multiple state dimensions, namely workload, performance, engagement and fatigue, to improve the next generation of adaptive training methods.