
1 Introduction

Each time a digital device, such as a smartphone or GPS, proactively provides information, it competes for the user’s attention and may interrupt ongoing tasks. Interruptions occur when the user is forced to shift attention away from the primary task, and they can be detrimental to accomplishing that task. They can increase the time required to complete the primary task, cause more errors, and elicit increased feelings of stress and anxiety (Adamczyk and Bailey 2004). In addition, several characteristics of interruptions have been shown to affect how disruptive they are. These include how closely the interrupting and primary tasks are related (Cutrell et al. 2001) and how much control the user has over when to engage with the interruption (McFarlane 2002). The results of this work can help inform a context-aware framework that provides information to users proactively and more appropriately. The framework focuses in particular on the modalities of the task the user is engaged in and the modalities of the interruption.

2 Background

2.1 Context-Aware Computing

Context-aware computing might be a way to mitigate the effects of interruptions that decrease task performance. According to Wickens, a multiple resource framework can be used to assess a situation, both internal and external to the operator, to evaluate the potential for interference in multi-task and even multi-modal scenarios (Wickens 2002). Such a framework can be embedded in a context-aware system to help decide how to present information to the operator so that resources are used efficiently. This matters because some information presented during an interruption may not be relevant to the operator’s current set of primary tasks. In such a system, incoming signals are filtered, categorized, prioritized, and then acted upon.

Through such a context-aware system, the most relevant information can be presented when appropriate. It can be delivered through the modality (visual, auditory, haptic) that least interferes with the primary task, and across the most useful interface or presentation (Abowd et al. 1999). Driving is a common and relevant situation in which the effects of these challenges can be seen. To design a system that meets the level of complexity required of a context-aware computing system, several factors must be understood:

(1) how much information (relevant and irrelevant, e.g., GPS directions to the destination vs. a text about making plans next week) can be processed while driving;
(2) the speed at which new information can be processed; and
(3) through which modalities or channels information should be presented to mitigate overload while still allowing greater information handling.

These factors are explored further in the Discussion to help define the parameters of a context-aware framework based on the results of this experiment.

For context-aware systems, understanding the relevant context is as important as understanding the level of situational awareness the operator has developed. That awareness develops both in the moment and over time through repeated exposure. Applying a context-aware computing system and framework to driving in particular means that interruptions can be correctly prioritized and handled by systems and operators. This could lead to fewer accidents and better anticipation of upcoming hazards (Alghamdi et al. 2012).

2.2 Multiple Resource Theory

Multiple resource theory (MRT), as defined by Wickens (1984, 2002), states that different pools of resources can be used at the same time, depending on the nature of the task. Problems arise when multiple tasks draw on the same pool of resources: performance can drop, the time needed to complete the tasks can increase, and less information may be processed. Further, as tasks become more difficult, performance increasingly depends on the types of resources required to process and prioritize the different tasks (Wickens 1984). As a result, if one modality is being used heavily, presenting a signal through a different modality may lead to better task performance.

The principles behind MRT suggest that input from haptic displays should not interfere with input from auditory or visual displays. However, the haptic modality has been neither incorporated nor studied as extensively as the visual and auditory modalities. Because the MRT model has been investigated mainly in those two modalities, it is unclear whether the same principles apply to the haptic modality. Scerra and Brill (2012) did examine a primary counting task in the tactile modality, paired with a secondary task in the visual, auditory, and tactile modalities. They found evidence supporting the inclusion of the tactile modality in MRT. In addition, Grane and Bengtsson (2011) found that a haptic interface reduced the visual load needed for effective multitasking, and they concluded that Wickens’ MRT model held. Response times are therefore likely to be lower when the interrupting modality differs from the modality of the primary task. For example, a haptic interruption should elicit a faster response when the primary task relies on the visual modality than when it relies on the haptic modality.
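The modality-conflict prediction above can be written as a toy rule. The following Python sketch is illustrative only. It assumes that just the perceptual-modality dimension of MRT matters; the full model also distinguishes processing stages, codes, and visual channels. The names `Modality`, `modality_conflict`, and `predicted_order` are hypothetical.

```python
from enum import Enum


class Modality(Enum):
    VISUAL = "visual"
    AUDITORY = "auditory"
    HAPTIC = "haptic"


def modality_conflict(primary: Modality, interruption: Modality) -> int:
    """Toy MRT conflict score: 1 if both tasks use the same perceptual
    modality, 0 otherwise. Other MRT dimensions are deliberately ignored."""
    return 1 if primary is interruption else 0


def predicted_order(primary: Modality) -> list[Modality]:
    """Interrupting modalities from lowest to highest predicted conflict,
    i.e. from expected fastest to expected slowest response.
    Ties keep Enum declaration order because sorted() is stable."""
    return sorted(Modality, key=lambda m: modality_conflict(primary, m))


if __name__ == "__main__":
    for pt in Modality:
        print(pt.value, "->", [m.value for m in predicted_order(pt)])
```

Under this rule, a haptic interruption ranks among the fastest options for a visual primary task and last for a haptic one, which is the prediction stated above. The rule cannot order two modalities that both differ from the primary task, because they tie at zero conflict.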

2.3 Interruptions While Operating a Vehicle

Interruptions while driving can lead to increased errors and impair the ability to safely operate a vehicle (Moray 1988). This is because driving is a complex activity that demands substantial attentional resources, and interruptions can exceed the cognitive capacity of the decision maker.

As a way to reduce workload and increase driver safety, there has been a recent surge of interest in automated driving. However, automated driving is not necessarily the answer to reducing errors and reaction times. Highly automated systems can initiate actions on their own but must notify drivers of those actions. Drivers must still monitor the environment and determine if and when a situation calls for them to take over. This monitoring role still requires the driver to process mainly visual information. A driver’s situation awareness (SA) is a critical piece of interruption management, since attentional resources may need to be devoted elsewhere. Poorly designed warnings can disrupt and distract the driver (Fagerlonn 2010; Wiese and Lee 2004). For example, advanced driver assistance systems (ADAS) emit audible alerts when a parameter such as the driver’s speed exceeds a given threshold. Biondi et al. (2014) found that the abrupt onset of beeping startles drivers. The startle causes them to take their foot off the accelerator and momentarily deviate from the correct trajectory within the lane. There is therefore a need to understand the best modality and timing for presenting a notification so that the driver is not overloaded with information.

2.4 Hypotheses

To address this need, we designed a study that specifically examines modality and response time. Understanding the effects of interruptions on response times lets us define logic for a supporting framework that better allocates incoming information and decides when and how to interrupt a user.

  • H1: Based on Wickens’ MRT, we hypothesized that response times for the haptic primary task would be lower when the interrupting modality is visual or auditory than when it is haptic.

  • H2: Although the primary tasks were spread across modalities, we hypothesized that visual interruptions would produce slower responses than auditory and haptic interruptions, since the driving scenario was more visually taxing.

3 Methods

3.1 Design

Our experiment used a 3 × 3 within-subjects design. The within-subjects factors were the Primary Task (PT) and the Secondary Task (ST), each with three modes (Visual, Auditory, Haptic). A power analysis indicated that a sample of 30 participants was sufficient to detect large effects on the outcome measures with a power of at least 0.80.

3.2 Participants

We recruited thirty-one participants (17 male, 14 female) from Craigslist. The average age of the participants was 27.8 years with a range from 19 to 42 years. The participants had an average of 67 months (5 years, 7 months) of driving experience, which ranged from 2 to 168 months.

3.3 Protocol

We chose a driving scenario for our experimental design. Driving is a dual-task situation: drivers must attend to higher-level cognitive goals (e.g., the route to a destination) while handling immediate concerns (e.g., avoiding pedestrians). In addition, driving is a real-time task, so it provides a realistic setting for measuring how long it takes to react to interrupting stimuli.

We designed the experiment to present stimuli that a driver would typically encounter. During the experiment, participants watched a first-person video of someone driving. At the same time, they were presented with background stimuli in each modality that required low-level attention. Visually, this was the driving scene itself. Auditorily, it was soft music and street noise. Haptically, it was gentle, constant vibrations beneath the seat pan of the participant’s chair.

The PT stimuli were designed to engage the participants. The primary visual task (PT-V) was a vigilance task in which participants had to detect numeric information on road signs. The primary auditory task (PT-A) presented numeric information through spoken GPS guidance (e.g., “Turn left at Main Street and drive point five miles.”). In the primary haptic task (PT-H), participants received 3 or 5 bursts of vibrotactile stimuli over 3 s on the left wrist and then reported what they felt.

The STs were issued during the presentation of the PTs. In the secondary visual task (ST-V), a green number in the lower left corner of the computer screen turned red and then back to green after one second. The secondary auditory task (ST-A) was a 1-second beep. The secondary haptic task (ST-H) presented a 1-second vibrotactile stimulus on the right wrist. For every PT and ST presentation, participants were instructed to press pre-assigned keys to indicate what they detected.

PT-A differed from the other PTs in that its numeric information was presented after the ST. Although this was originally a concern, there were no significant differences in response times (RTs) between PT-A and PT-V/H (Fig. 1).

Fig. 1.

In this example screenshot, a green number (indicated by the arrow) appears in the periphery while participants drive through the scenario (Color figure online).

3.4 Materials

For both the training and main experiment videos, participants “drove” through similar city settings. Ambient visual and auditory signals came from the video clip. A vibration actuator attached beneath the seat pan simulated the vibration of a vehicle. The headphones used for the audio task covered the entire ear but did not block outside noise unrelated to the experiment. Vibration actuators were attached to both wrists.

Participants were instructed to press pre-assigned keys on a computer keyboard for each of the PTs and STs in each modality (6 keys total). A single experimental video presented all combinations of visual, haptic, and auditory primary and secondary tasks. After the experiment, participants rated the difficulty of each ST during each PT (ST × PT combinations) on a 5-point Likert scale (1: very easy, 3: normal, 5: very hard) and gave reasons for their ratings.

3.5 Procedures

Participants signed a consent form when they arrived for the study. They then received instructions for the driving simulator and the experimental task. Once participants understood the instructions, they completed a training session in which one-third of the PTs were interrupted by STs. In training, each PT × ST modality combination (visual, auditory, and haptic) was presented once, for 9 combinations. After training, participants could ask questions before starting the main experiment. In the main experiment, participants were presented with 12 primary tasks in each modality, for a total of 36 PTs. All participants completed a questionnaire after the main experiment.

3.6 Measures

We collected data on response time to the stimuli, errors, misses, and subjective ratings. Response time was the time elapsed between stimulus presentation and the participant’s press of the correct key. An error was recorded when the participant pressed a key other than the one assigned to the stimulus. A miss was recorded when the participant did not respond to an ST. Participants had no fixed time limit for responding. However, stimuli were presented quickly, so a participant’s attention may already have shifted to pressing a key for the next stimulus. A subjective difficulty rating for each ST during each PT was collected in the questionnaire at the end of the session.
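The three behavioral measures can be made precise with a small scoring routine. The Python sketch below is illustrative only. It assumes each stimulus presentation is logged with its onset time, its pre-assigned key, and the first key press (if any), and it counts a wrong first key as an error without looking for a later correct press. The record layout, condition labels, and function names are hypothetical, since the study’s logging format is not described. The per-condition mean excludes misses, as in Table 1.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Presentation:
    condition: str                     # hypothetical label, e.g. "PT-V x ST-A"
    onset_s: float                     # stimulus presentation time (s)
    expected_key: str                  # key pre-assigned to this stimulus
    pressed_key: Optional[str] = None  # first key pressed; None = no response
    press_s: Optional[float] = None    # time of that press; set with pressed_key


def score(p: Presentation) -> tuple[str, Optional[float]]:
    """Classify one presentation as a miss, an error, or a hit with its RT (s)."""
    if p.pressed_key is None:
        return "miss", None
    if p.pressed_key != p.expected_key:
        return "error", None
    return "hit", p.press_s - p.onset_s


def summarize(presentations: list[Presentation]):
    """Mean RT of hits per condition (misses and errors excluded) plus outcome counts."""
    rts = defaultdict(list)
    counts = defaultdict(lambda: {"hit": 0, "error": 0, "miss": 0})
    for p in presentations:
        outcome, rt = score(p)
        counts[p.condition][outcome] += 1
        if rt is not None:
            rts[p.condition].append(rt)
    mean_rt = {cond: mean(values) for cond, values in rts.items()}
    return mean_rt, dict(counts)
```

Keeping misses and errors as separate counts, instead of dropping them, lets the same log produce both the RT means and the miss/error totals reported in Table 2.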

4 Results

Three repeated measures ANOVAs (RMANOVAs) were performed on response times (RTs). A 3 (PT-V, PT-H, PT-A) × 3 (ST-V, ST-H, ST-A) RMANOVA assessed differences in secondary task RTs for each interrupted primary task. Primary task RTs were analyzed separately for interrupted PTs and for PTs without interruption. For tests involving a repeated factor, degrees of freedom were adjusted with the Huynh-Feldt correction, and significance was evaluated at α = .05. RTs differed significantly across the interrupted PTs, F(1.71, 13.66) = 12.2, partial η² = .604. RTs also differed significantly across the PTs without interruption, F(1.50, 43.58) = 52.13, partial η² = .20. The group means are plotted in Fig. 2, and means and standard deviations are presented in Table 1. These results show that primary task RTs are affected by the secondary task interruption. However, the results for the primary tasks without interruption suggest that primary task RTs also vary with modality.
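A Huynh-Feldt correction multiplies both degrees of freedom of a repeated-measures F test by an estimated sphericity factor ε, which is why the reported df are not integers. Below is a minimal NumPy sketch for a single within-subjects factor. It assumes a complete n × k matrix of per-participant condition means. It computes the Greenhouse-Geisser estimate from the double-centered covariance matrix and then applies the original Huynh-Feldt formula, capped at 1. Some statistics packages use a slightly different variant of that formula. This is not the software used in the study, and the PT × ST interaction would require the same computation on interaction contrasts.

```python
import numpy as np


def sphericity_epsilons(y):
    """Greenhouse-Geisser and Huynh-Feldt epsilons for one within-subjects factor.

    y: array of shape (n_subjects, k_levels), one mean per participant and level.
    """
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    s = np.cov(y, rowvar=False)              # k x k covariance between levels
    h = np.eye(k) - np.ones((k, k)) / k      # centering matrix
    m = h @ s @ h                            # double-centered covariance
    # The nonzero eigenvalues of m equal those of the covariance of
    # orthonormal contrasts, so traces replace an explicit eigendecomposition.
    gg = np.trace(m) ** 2 / ((k - 1) * np.trace(m @ m))
    hf = (n * (k - 1) * gg - 2) / ((k - 1) * (n - 1 - (k - 1) * gg))
    return float(gg), float(min(hf, 1.0))


def corrected_df(eps, k, n):
    """Scale the uncorrected df, (k - 1) and (k - 1)(n - 1), by epsilon."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)
```

For example, with k = 3 levels and n = 30 participants, `corrected_df(0.75, 3, 30)` returns `(1.5, 43.5)`. The F statistic itself is unchanged; only the reference distribution used for the p value changes.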

Fig. 2.

Mean response times for PT × ST combinations and for PTs without interruption (PT only).

Table 1. Stimulus combinations ordered from fastest to slowest mean response time (excluding misses), with subjective difficulty ratings from the questionnaire.

For secondary task RTs, the PT × ST interaction was significant, F(3.83, 114.86) = 8.18, partial η² = .214. Secondary task RTs differed depending on which primary task was interrupted, indicating that secondary task response time depends on the primary task modality.
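Partial η² can be recovered from any reported F and its degrees of freedom. Because F = (SS_effect/df_effect)/(SS_error/df_error), it follows that η²_p = SS_effect/(SS_effect + SS_error) = F·df_effect/(F·df_effect + df_error). A Huynh-Feldt correction scales both df by the same ε, so in principle it leaves this value unchanged. The short Python function below is a sketch of that identity; its name is hypothetical.

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared from an F statistic and its (possibly corrected) df."""
    return f * df_effect / (f * df_effect + df_error)


# Interaction reported above: F(3.83, 114.86) = 8.18
print(round(partial_eta_squared(8.18, 3.83, 114.86), 3))  # 0.214
```

This makes a quick consistency check possible when reading an F test: the reported effect size should match the value computed from F and its df.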

Overall, primary task RTs were affected by the secondary tasks. Subjective ratings (1 = very easy, 5 = very hard) indicate that participants found ST-V more difficult to detect than the STs in the other modalities.

The numbers of misses and errors for PT × ST combinations and PT only are presented in Table 2. In general, participants missed more visual stimuli than stimuli in the other two modalities.

Table 2. Number of total misses and response errors across participants (N = 31).

5 Discussion

Context-aware computing systems offer an opportunity to account for multiple factors during complex tasks such as driving. Driving is one example of a large class of use cases that involve cognitive functioning, situational awareness, and immediate concerns. Understanding situational awareness and interruptibility across and within modalities is a step toward understanding how to use task and user information to improve user performance.

5.1 Hypotheses and Results

We did not find support for our first hypothesis that response times for the haptic primary task would be lower when the interrupting modality differed from it. However, the visual and auditory primary tasks did align with MRT principles: for both, responses were faster to incongruent interruptions than to congruent ones. MRT did not appear to apply to haptic signals in this experiment. Within PT-H, the PT-H ST-A combination was the fastest, but the congruent PT-H ST-H combination was the second fastest. This was surprising, since Scerra and Brill (2012) found that participants performed significantly worse in tactile-tactile dual-task conditions.

ST-H interruptions elicited the fastest responses during PT-A and PT-V. One explanation comes from Van Erp and Van Veen (2004), who found that drivers may benefit from haptic information. In their study, however, haptic vibrations mainly conveyed on/off information, because more complex information is difficult to convey through haptic signals. The same could be true in our study, since for ST-H participants did not have to remember how many buzzes they felt, only that a haptic signal was present. This finding is important because it shows that interrupting haptic signals should be investigated further with respect to Wickens’ MRT. We cannot incorporate multi-modal displays containing haptics without understanding the impact of haptics on visual and auditory primary tasks.

The ST-V combinations were the slowest, and participants reported more difficulty detecting ST-V than the other STs, with the exception of PT-A. This supports our second hypothesis that visual interruptions would produce slower responses than auditory and haptic interruptions, since driving is primarily a visual task. Saccadic suppression, the reduced visual sensitivity that accompanies rapid eye movements, could help explain this result. While participants focused on the ongoing PT-V task, they may have been temporarily blind to other changes in the visual field (Peterson and Dugas 1972; Bridgeman et al. 1975; Burr and Ross 1982). In addition, among the primary tasks without interruption, the visual task had the slowest response time. As a result, attentional resources may not have been available to notice a one-second change at the edge of the screen while participants focused on driving.

5.2 Interruptions and Design Implications

Research has shown that distracted drivers experience “inattention blindness,” in which their field of view narrows (Maples et al. 2008). Such drivers tend to look at, but not necessarily register, information in their driving environment (Strayer 2007), and so they miss visual cues that are important for safe driving (Jacobson and Goston 2010). Inattention blindness could help explain why ST-V was frequently missed across primary task modalities and elicited the slowest responses. Alternatively, because participants were searching for numeric road signs, their goal-directed attention may have been given priority in a particular area of the screen. That area may have been far enough from the secondary visual stimulus to delay its processing (Egeth and Yantis 1997).

This finding is consistent with research showing that even reading a text message while driving is detrimental to driving (Drews et al. 2009; Hoffman et al. 2005). Based on these findings and the results of the present study, we believe that autonomous cars should not warn drivers about complex decisions in a visual format. Highly automated driving allows the driver to take over at any time, especially in emergency situations, so drivers still need to pay attention to their environment. The findings from our study can help determine how, when, and in which modality information should be presented. That choice depends on the situation, the importance of the information, driver state, and similar factors.

5.3 Multiple Resource Theory

A recent study found that presenting redundant, multi-modal signals to drivers had a positive influence on response time, with little added frustration or other negative effects (Biondi et al. 2017). Specifically, simultaneous auditory and haptic presentation resulted in faster brake and response times than auditory or haptic warnings alone. A context-aware system will support future drivers better the more it can adapt to different requirements. These include requirements arising from driver expertise and experience, as well as the physical and time constraints of different environments. For example, the ratings in Table 1 indicate that participants found the haptic modality easy to detect; however, it is harder to convey rich information through haptics than through the visual and auditory modalities.

5.4 A Context-Aware Framework

In the context of the growing challenge of information overload, we propose a theoretical framework for combining context-aware computing technology with multi-modal displays. The goal is to provide users with the information they need, when they need it, in a form they can use to make decisions. For a driver, the framework would ensure that information from auxiliary digital devices, such as a smartphone or GPS device, is presented at the appropriate time. Our results indicate an interaction between primary task modality and secondary (interrupting) task modality with respect to reaction time. A supporting context-aware framework should therefore account for two things when timing an interruption or presenting additional information. These are the modes in which the user is currently engaged (PT) and the modes through which the system is considering interrupting the user (ST).

The details of an interruptibility algorithm merit further investigation; however, the current results do suggest some working hypotheses. Participants in many cases failed to acknowledge ST-Vs across all three primary task modalities, so interrupting a user with a visual cue should be a low-probability event. Algorithmically, the effectiveness of a visual interruption should receive a very low weight compared to interruptions in other modalities. The calculated cost to the user of a visual interruption should therefore be high. This means that if the interruption is urgent, the suitability of an auditory or haptic modality should be considered. Additionally, if the information is only well suited to the visual modality, it may be cheaper to delay it than to interrupt. The cost of waiting until the information no longer interrupts an ongoing PT may be lower than the cost of interrupting that task. The parameters of these cost and delay variables are a topic of future study.
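These working hypotheses can be turned into a first decision rule. The Python sketch below is hypothetical throughout. Its cost values are placeholders, chosen only so that a visual interruption is the most expensive option, as the paragraph argues. `delay_cost` stands in for the cost of postponing the information until the current PT ends, which the study leaves for future work. The sketch makes no claim about how an actual framework would estimate either quantity.

```python
from dataclasses import dataclass

# Placeholder interruption costs per modality (NOT estimated from the study).
# Visual gets the highest cost because ST-V was often missed and answered slowly.
# Auditory and haptic are tied; ties resolve to the first suitable modality listed.
INTERRUPTION_COST = {"visual": 0.9, "auditory": 0.3, "haptic": 0.3}


@dataclass
class Notification:
    suitable_modalities: tuple[str, ...]  # modalities that can carry the content
    delay_cost: float                     # hypothetical cost of waiting for the PT to end


def plan(note: Notification, primary_task_active: bool) -> tuple[str, str]:
    """Return ("present", modality) or ("delay", modality)."""
    # Cheapest modality that can still carry this information.
    modality = min(note.suitable_modalities, key=INTERRUPTION_COST.__getitem__)
    if not primary_task_active:
        return "present", modality        # nothing to interrupt
    if INTERRUPTION_COST[modality] <= note.delay_cost:
        return "present", modality        # urgent enough to interrupt now
    return "delay", modality              # cheaper to wait for a gap
```

Urgency enters only through `delay_cost`: an urgent message has a high cost of waiting. Visual-only content is therefore held back during an active PT unless postponing it would be more costly than the visual interruption.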

Figure 3 shows how response time pairings can inform first-order rules for a context-aware model. This model is one piece of a larger framework that will be assembled from such individual components. These guidelines serve as groundwork for developing the framework to better characterize and quantify the costs and benefits of signal combinations.

Fig. 3.

Average response times for given interruptions (secondary tasks) across primary tasks. Lower response times indicate better pairings for information presentation.
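A first-order rule of the kind shown in Fig. 3 can be read directly off a table of mean response times. For the current PT, the candidate STs are ranked from fastest to slowest. The Python sketch below assumes such a table keyed by (PT, ST) pairs, and its function names are hypothetical. The numbers in the example are invented placeholders, not the values plotted in Fig. 3 or listed in Table 1. They only mirror the orderings described in Section 5.1.

```python
def rank_interruptions(mean_rt: dict[tuple[str, str], float], primary: str) -> list[str]:
    """Return the STs paired with `primary`, ordered from fastest to slowest mean RT."""
    candidates = [(rt, st) for (pt, st), rt in mean_rt.items() if pt == primary]
    return [st for _, st in sorted(candidates)]


def first_order_rules(mean_rt: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each PT to the ST with the lowest mean RT (its best pairing)."""
    primaries = sorted({pt for pt, _ in mean_rt})
    return {pt: rank_interruptions(mean_rt, pt)[0] for pt in primaries}


# Placeholder values (seconds), NOT the study's data.
example = {
    ("PT-V", "ST-H"): 1.0, ("PT-V", "ST-A"): 2.0, ("PT-V", "ST-V"): 3.0,
    ("PT-H", "ST-A"): 1.0, ("PT-H", "ST-H"): 2.0, ("PT-H", "ST-V"): 3.0,
}
print(first_order_rules(example))  # {'PT-H': 'ST-A', 'PT-V': 'ST-H'}
```

Returning the full ranking instead of only the winner leaves room for later rules, such as the cost and delay trade-off discussed above, to fall back to the second-best pairing.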

5.5 Limitations

This study explored interruptions across and between modalities. The major factors contributing to the effect of an interruption on response time were controlled to the best of our ability, but a few limitations provide opportunities for further exploration. A driving simulation has several challenges, including limited physical, perceptual, and behavioral fidelity (Evans 2004), which limits high levels of experimental control. Measuring response times in a higher-fidelity driving simulation or in a real-world driving task may yield somewhat different results, because the scenario would be more realistic.

Another limitation may have been the placement of the green number in the lower left corner of the screen. A number flashing in the middle of the screen might have been more salient. This could improve ST-V performance but could also distract from PT-V.

Finally, this study was limited to a single interruption, categorized as a secondary task. We did not explore the effects of the importance of the information, which may turn a secondary task into a primary task. The experiment could also be extended to multiple interrupting signals to assess their combined impact on response time.

5.6 Summary

The present study investigated the effects of visual, auditory, and haptic interruptions during a driving scenario. Haptic interruptions need further study with regard to Multiple Resource Theory. Participants responded most slowly to ST-V interruptions during PT-H and PT-V. This suggests that interruptions during driving should not be presented visually, since driving is mostly a visual task. These results inform components of a larger context-aware computing system that distributes incoming signals across modalities during complex tasks such as driving.