Humans are remarkably good at pursuing complex goals that require frequently alternating between the tasks necessary for goal achievement and rapidly exchanging the cognitive representations of the tasks to be performed (e.g., task sets; Rogers & Monsell, 1995) in working memory (WM; Oberauer, Souza, Druey, & Gade, 2013). However, changing task sets and thereby ensuring successful goal achievement comes at a cost (i.e., higher reaction times [RTs] and larger error percentages [PEs] for sequences of type BA than for sequences of type AA, with A and B standing for different tasks). This task-switch cost has been successfully investigated in research employing the task-switching paradigm, in which participants have to flexibly shift between two simple cognitive classification tasks, indicated by a task cue (Jost, De Baene, Koch, & Brass, 2013; Kiesel et al., 2010; Koch, Poljac, Müller, & Kiesel, 2018; Vandierendonck, Liefooghe, & Verbruggen, 2010).

Whereas some authors assume that a time-costly reconfiguration of the cognitive system occurs in the case of a task switch (Monsell, 2003; Rogers & Monsell, 1995; Vandierendonck et al., 2010), other researchers interpret the observed task-switch cost not as indicative of executive reconfiguration processes but as a reflection of episodic-memory traces of the just-performed task that interfere with establishing the new task set. This account has been termed the task-set inertia (TSI) account and was proposed by Allport, Styles, and Hsieh (1994; see also Waszak, Hommel, & Allport, 2003). These two opposing views (i.e., control processes for task-set reconfiguration vs. memory inertia and proactive interference) have been debated, and reviews agree that both types of processes are involved, but to different degrees (see Kiesel et al., 2010; Vandierendonck et al., 2010).

One reason for the assumption that executive control processes are involved in task switching through the triggering of reconfiguration processes is the widespread idea that tasks are represented verbally (Gruber & Goschke, 2004), and the involvement of language is assumed to go along with more controlled processing (Chrysikou, Weber, & Thompson-Schill, 2014). Following this assumption, it has been proposed that people self-instruct before starting a task (Goschke, 2000). This ability to rely on language as a tool to guide performance and fulfill goal-directed actions has been a subject of great interest in developmental psychology (Luria, 1961; Vygotskij, 1962). Importantly, language use has been found to aid performance not only in children and older people (Kray, Eber, & Karbach, 2008) but also in student populations (Baddeley, Chincotta, & Adlam, 2001; Emerson & Miyake, 2003; Goschke, 2000; Miyake, Emerson, Padilla, & Ahn, 2004). In the above-mentioned studies, participants were asked to switch between two simple cognitive tasks (i.e., color and shape classification of a stimulus) that were explicitly cued (i.e., by arbitrary shapes, such as a triangle for color classification) or followed a preinstructed sequence (e.g., AABBAA). When self-instruction was rendered difficult by a secondary articulatory suppression task (i.e., saying the days of the week repeatedly), performance dropped, as indicated by an overall increase in RTs and, most importantly, an elevated switch cost (see, e.g., Baddeley et al., 2001; Emerson & Miyake, 2003; Goschke, 2000; Saeki & Saito, 2004a). The increase in switch cost under articulatory suppression has been taken as evidence for the elimination of the beneficial effect of language in retrieving and implementing the now-relevant task set in task switching (Mayr, Kleffner-Canucci, Kikumoto, & Redford, 2014). In addition, it has been argued that language is involved in the processes necessary for cue interpretation, such as through reducing interference by translating arbitrary cues into task names (Gade & Koch, 2014; Gade & Steinhauser, 2018; Miyake et al., 2004), and as such reflects higher-order cognitive control.

However, at a second glance, the role of language in task-switching performance is not as unequivocal as it seems. When participants use preinstructed task sequences, language seems to aid mostly in keeping track of the task sequence (Bryck & Mayr, 2005; Saeki & Saito, 2004a) and retrieving the now-relevant task set (Mayr et al., 2014). As regards cued task sequences, some studies (Miyake et al., 2004) have revealed an increased switch cost for arbitrary cues (i.e., letters), but not for task name cues, further corroborating the idea that retrieving task labels aids the successful implementation of a now-relevant task (but see Saeki & Saito, 2009). However, these results have not been found in all experiments (see, e.g., Saeki & Saito, 2004a, 2009).

A study by Liefooghe, Vandierendonck, Muyllaert, Verbruggen, and Vanneste (2005) even showed a reduced (rather than an increased) switch cost. This reduction in switch cost was observed irrespective of cue type. In their study, Liefooghe and colleagues used specific, highly transparent cues, employing the same color and shapes as both cues and stimuli, as well as nonspecific cues (i.e., new colors/shapes as cues) that required a translation process to retrieve the indicated task. Most importantly, the reduction in switch cost in Liefooghe and colleagues’ study was brought about by increased RTs in repetition trials. Therefore, these authors concluded that articulatory suppression interferes with maintenance of the relevant task set in the phonological loop of the WM model of Baddeley (1986), thereby producing a benefit for repetition of the task set. Note that this interpretation is not unambiguous, because one could also argue that, aside from maintenance processes in WM, a reduced ability to form episodic-memory traces in a given trial would also produce a benefit in a subsequent task repetition. This points to a second prominent theory commonly invoked to explain performance costs in the case of a task switch, namely, lingering episodic memory bindings that are beneficial in the case of a task repetition but harmful when the task switches (Allport et al., 1994; Allport & Wylie, 1999; Koch & Allport, 2006; Koch, Frings, & Schuch, 2018).

Aside from such ambiguous results, summarized in Table 1, it occurred to us that, since articulatory suppression and rhythmic foot tapping have typically been deployed as additional task requirements in the studies to date, the impact of using nondynamic secondary tasks (e.g., requiring only static muscle involvement) has not been investigated. Furthermore, although most studies have revealed a larger decrement in performance with articulatory suppression than with foot tapping, those differences could also have been due to the different degrees of complexity of simple repetitive movements (e.g., tapping the foot in synchrony with a metronome) versus a more complex articulation that had to follow a sequence (e.g., reciting the days of the week, in the study by Baddeley et al., 2001).

Table 1 Overview of published studies employing a secondary task in addition to a task-switching paradigm

It is therefore remarkable that only those studies in which complexity was equated between foot-tapping and articulatory suppression conditions—namely, by asking participants to articulate only a single word (Bryck & Mayr, 2005; Liefooghe et al., 2005)—did not show the predicted increase in switch cost. In addition, all of the secondary tasks involved had a dynamic component, whereas it is possible that simply performing a secondary task in addition to a task-switching paradigm may alter performance, because of the need to keep in mind several task sets, even if one of the tasks is fairly simple.

In our study, first we examined the degree to which comparable complexity of the to-be-performed secondary task affects task-switching performance for both vocal and manual secondary tasks. Second, we were interested in the degree to which the interference brought about by the additional demands of a secondary task is related to its dynamic nature. To investigate these questions, we asked participants to perform repeated movements of their hands or to repeatedly utter a nonword. We compared those dynamic conditions to a static motor task (i.e., pressing down a number of keys or holding a spattle between the teeth) that involved the same effectors, yet had no dynamic component. Finally, we contrasted two specific predictions that allowed us to advocate the task-set reconfiguration account versus the episodic-memory account of switch costs. Whereas the reconfiguration account predicted increased switch costs, because secondary-task demands would interfere with implementation of the currently relevant task set in WM, the episodic-memory account predicted reduced switch costs, because task repetitions would no longer be beneficial if memory traces could not be properly formed in the preceding trial.

To accomplish the goals outlined above, we designed a primary task requirement that consisted of two simple digit-classification tasks (judging magnitude and parity), as indicated by a cue. In parallel to this primary task requirement, participants were asked to perform one of four different secondary tasks. These tasks differed in their dynamics (dynamic vs. static) as well as in the effector system involved (manual vs. oral). Responses to our primary task were given using two custom-made foot pedals. For the dynamic secondary tasks, we asked participants to utter a German nonword (“duerb”) in synchrony with a metronome or to perform a gesture with both hands in a regular movement; these tasks were thus comparable in complexity to those in most of the published studies investigating the impact of secondary tasks on task-switch costs. Furthermore, we added two static secondary tasks. These static secondary tasks involved holding a marble that was taped to a wooden, sterile spattle in the mouth at a constant angle (0°), or pressing down nine keys on a keyboard (including the space bar with both thumbs). We opted for these static tasks in order to be able to compare the impact of articulation or repeated sequential movements (i.e., dynamic secondary tasks) to that from largely static occupation of the two effector systems only.

Experiment 1

Method

Participants

Twenty-four participants (19 women, five men; mean age = 22.9 years), all students from the Catholic University of Eichstaett Ingolstadt, received either course credit or €8 for participation. Twenty-three of the participants were right-handed, and all reported normal or corrected-to-normal vision.

Apparatus, stimuli, and tasks

The experiment was conducted in a single session with one participant at a time. Participants sat in front of a computer screen (15 in.) connected to an IBM-compatible PC at a viewing distance of approximately 50 cm. The stimuli were the digits 1 to 9, excluding 5, and measured approximately 1 cm in height and 0.5 cm in width.

Participants had to switch between two digit-classification tasks (denoted as the primary task requirement). They had to judge a digit either as being larger or smaller than 5 or as being odd or even. The task they had to perform was indicated by a geometrical shape. A triangle indicated the magnitude task, and a square indicated the parity task. The cues were about 3.5 cm in diameter and surrounded the target digits, which were presented centrally on the screen. Participants responded on two custom-made external pedals with their left and right feet. The external keys measured 6×6 cm and were separated by 24 cm. All four stimulus–response mappings were used, counterbalanced across participants.

Participants were asked to perform, as secondary tasks, either the two static motor tasks or the two dynamic tasks. For the first static task, we asked our participants to press down nine keys on the keyboard arranged in an ergonomically sensible way (the second row of a standard German QWERTZ keyboard). For our second static task, every participant received a new, wooden, sterile spattle upon which a marble was fixed with tape, and they had to keep this spattle straight between teeth and lips. The marble was fixed to the spattle so as not to make the task too hard. The marble had a diameter of 1 cm and a weight of 6 g. For the dynamic tasks, we asked participants to perform a nonsense gesture (i.e., thumbs and middle fingers touching each other and hands cycling around each other) continuously, whereas in the second dynamic task participants were asked to utter the German nonword “duerb” in pace with a metronome set at 60 bpm. Both dynamic tasks were thus performed repeatedly, roughly once per second. Additionally, all secondary tasks were monitored by the experimenter (Footnote 1; see the Procedure section).

Procedure

The experiment was run in a single session with one participant at a time. The instructions were presented on the screen in written form and also explained orally. The experimenter answered additional questions when required. Participants were told about the additional tasks that had to be performed in the experiment along with the primary digit-judgment tasks. A picture with the relevant stimulus–response mapping for the digit-judgment tasks was placed below the monitor, and these tasks were explained first. The instructions encouraged both speed and accuracy. Cues were presented 500 ms prior to the target (cue–target interval, CTI) and remained on the screen until a response had been given or until 5 s had elapsed. After 500 ms the stimulus was presented at the center of the screen. Once a response had been given, a blank screen was shown for 500 ms before the next trial started (response–cue interval [RCI]). In the case of an error, the German word Fehler (“error”) appeared and remained on the screen for 500 ms. After each block, participants received feedback about their mean RT and error rate and were offered a short rest.

After completing the first two blocks without any secondary task requirement (thus, performing the digit-judgment tasks only), participants started by practicing the first secondary task. Whereas the digit-judgment tasks remained the same and did not change over the course of the experiment, this did not hold true for the secondary tasks. These were practiced right before their use, up to the point at which smooth performance had been reached (i.e., ten successful repetitions without pausing in case of the dynamic secondary tasks). In the case of a secondary task error (i.e., pausing or losing pace with the metronome, as indicated by the experimenter), participants were reminded to perform the secondary task continuously. Next, participants carried out two blocks combining the primary digit-judgment tasks with the practiced secondary task. Feedback on mean RTs and PEs was given after each block, and a short rest was once again offered. The order of the four secondary-task conditions was counterbalanced across participants. After performing eight blocks, two for each secondary task, participants performed a last block without any secondary task, in order to control for practice effects later on in the analysis. For the analysis we averaged across the performance in the second block and the last block, to control for learning effects during the course of the experiment. Overall, the experiment was completed in less than 60 min for 11 blocks of 72 trials.

Design

The independent within-participants variables were task transition (switch vs. repetition trials) and secondary-task condition (control without any secondary task, dynamic secondary task with manual effectors [gesture], dynamic secondary task with oral effector [nonword], static secondary task with manual effectors [keys], or static secondary task with oral effectors [spattle]). RT and PE were the dependent variables. PEs were arcsine-transformed (Steinhauser & Gade, 2015), and analyses were performed using R (version 3.4.1; R Core Team, 2017) and the afex package (Singmann, Bolker, & Westfall, 2015).
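The text does not state which variant of the arcsine transformation was applied to the PEs. As a minimal sketch, assuming the common arcsine-square-root form, the transformation of a single error percentage could look as follows (the function name `arcsine_transform` is hypothetical and not taken from the original analysis scripts):

```python
import math


def arcsine_transform(error_percent: float) -> float:
    """Arcsine-square-root transform of an error percentage (0-100).

    Assumption: the variant asin(sqrt(p)) with p as a proportion; the
    source only says that PEs were "arcsine-transformed".
    """
    p = error_percent / 100.0
    if not 0.0 <= p <= 1.0:
        raise ValueError("error percentage must lie between 0 and 100")
    return math.asin(math.sqrt(p))


# Illustrative values only (not data from the experiment).
for pe in (0.0, 6.5, 10.3, 50.0):
    print(pe, round(arcsine_transform(pe), 3))
```

The transform stretches the scale near 0% and 100%, which is why it is often applied to error rates before an ANOVA.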

Results

The data were trimmed by excluding all errors in the primary (7.8%) and secondary (0.1%) tasks, all trials following a primary task error, and the first trial in each block, since this trial could not be classified as either a switch or a repetition trial. We also excluded the first block, as practice. Furthermore, we removed direct stimulus repetitions from the analysis (12.8%), to control for priming effects. Next we excluded RTs that exceeded three standard deviations from the mean in each cell of the analysis. Overall, we removed 27.6% of the collected trials after the first block had been excluded. Overall, at least 54 trials per participant and cell entered the analysis. First, we conducted a 2 (Task Transition: switch vs. repetition) × 5 (Secondary-Task Condition: control vs. dynamic secondary task with manual effectors vs. dynamic secondary task with oral effectors vs. static secondary task with manual effectors vs. static secondary task with oral effectors) repeated measures analysis of variance (ANOVA). The degrees of freedom were Greenhouse–Geisser-corrected; rounded degrees of freedom will be reported, for readability. We found a main effect of task transition, F(1, 23) = 26.85, p < .001, ηp2 = .54, reflected in an overall switch cost of 51 ms. We also found a main effect of secondary-task condition, F(3, 61) = 8.77, p = .0001, ηp2 = .28.
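The trimming steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the original analysis code: the trial fields `correct` and `digit` and both function names are hypothetical, secondary-task errors are not modeled, and the sample standard deviation is assumed for the three-SD criterion.

```python
from statistics import mean, stdev


def keep_trial(trial, previous):
    """Return True if a trial survives the exclusion rules.

    `previous` is the preceding trial of the same block, or None for the
    first trial of a block.
    """
    if previous is None:                      # first trial: no transition defined
        return False
    if not trial["correct"]:                  # primary-task error
        return False
    if not previous["correct"]:               # trial following an error
        return False
    if trial["digit"] == previous["digit"]:   # direct stimulus repetition
        return False
    return True


def trim_rts(rts, n_sd=3.0):
    """Drop RTs farther than n_sd sample SDs from the mean of one design cell."""
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * s]


# Made-up RTs for one cell: twenty ordinary responses and one extreme value.
cell = [500] * 20 + [5000]
print(len(trim_rts(cell)))
```

Note that the SD criterion is applied separately within each cell of the design, so the same RT can be kept in a slow condition and removed in a fast one.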

To account for this influence of secondary task on mean RT, we contrasted the control condition to each of the secondary-task conditions: Participants were slowed in their mean RT for the dynamic secondary task with manual effectors (i.e., gesture; 54 ms), t(23) = 2.31, p = .028, d = 0.47, and were not impaired (3 ms) for the dynamic secondary task with oral effectors (i.e., nonword), t(23) = 0.15, p = .87, d = 0.03. In the condition in which participants had to press down the keys on the keyboard (static secondary task with manual effectors), they even sped up by 46 ms relative to the control condition, t(23) = 3.02, p = .006, d = 0.62. Although this pattern of a secondary-task benefit was similar for the static secondary task with oral effectors (i.e., with the marble; −19 ms), the latter effect was not significant, t(23) = 1.44, p = .16, d = 0.29.

Most importantly, the switch cost was modulated by secondary task, as indicated by an interaction, F(4, 86) = 4.30, p = .004, ηp2 = .16, reflecting a smaller switch cost for the secondary-task conditions than for the control condition (see Fig. 1). To follow up this interaction, we ran 2×2 ANOVAs in which we compared the control condition to each secondary task condition individually. A reduced switch cost relative to the control condition was found for the dynamic condition in which the gesture had to be performed repeatedly, F(1, 23) = 11.37, p = .002, ηp2 = .33, for the interaction. Likewise, a reduced switch cost was also found when comparing the dynamic nonword condition to the control condition, F(1, 23) = 12.62, p = .002, ηp2 = .35, for the interaction. Finally, a significant reduction of switch cost occurred for the static secondary task condition in which participants pressed down the keys on the keyboard, F(1, 23) = 5.66, p = .03, ηp2 = .20, whereas we observed only a numerical trend toward a reduction of switch cost when participants had to balance the marble on the spattle (i.e., static secondary task with oral effectors), F(1, 23) = 2.53, p = .13, ηp2 = .10 (see Table 2).
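For readers who want to check the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom via the standard identity ηp2 = (F × df1) / (F × df1 + df2). The sketch below applies it to the follow-up interactions; the function name is hypothetical, and for Greenhouse–Geisser-corrected tests the uncorrected degrees of freedom should be used, because the reported ones are rounded.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both dfs positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values of the four follow-up interactions reported above, each with (1, 23) df.
for f in (11.37, 12.62, 5.66, 2.53):
    print(f, round(partial_eta_squared(f, 1, 23), 2))
```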

Fig. 1 Reaction time (RT) data for task transition and secondary-task condition in Experiment 1. Error bars denote 95% within-participants confidence intervals (Baguley, 2012; Cousineau, 2007)

Table 2 Mean RT (SD) in milliseconds and error percentage (SD) for secondary task condition and task transitions in Experiment 1

In post-hoc t tests, we found a switch cost of 77 ms for the control condition, t(23) = 6.78, p < .001, d = 1.4, whereas this cost was reduced to 36 ms, t(23) = 3.05, p = .006, d = 0.62, for the manual dynamic secondary-task condition (i.e., performing the gesture), yet still showing a significant switch cost. A reduction of switch cost relative to that in the control condition was observed for the dynamic secondary-task condition with oral effectors (i.e., uttering the nonword), 37 ms, which was still significant, t(23) = 2.68, p = .013, d = 0.55. In the two static secondary-task conditions, we observed a switch cost of 50 ms, t(23) = 4.78, p < .001, d = 0.97, for the condition in which participants had to press down the keys of the keyboard, and a switch cost of 58 ms, t(23) = 4.20, p < .001, d = 0.86, for the condition with the spattle and marble.
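The post-hoc tests above are paired comparisons of switch and repetition RTs within one condition. The source does not say how Cohen's d was computed; the Experiment 1 values are approximately consistent with d = t / √n (e.g., t(23) = 3.05 gives about 0.62), which is the convention assumed in this sketch. The function name and the example RTs are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_and_d(switch_rts, repetition_rts):
    """Paired t statistic and standardized mean difference for switch cost.

    Each list holds one mean RT per participant, in the same order.
    Assumption: d is the mean difference divided by the SD of the
    differences, which equals t / sqrt(n).
    """
    diffs = [s - r for s, r in zip(switch_rts, repetition_rts)]
    n = len(diffs)
    m, sd = mean(diffs), stdev(diffs)
    t = m / (sd / math.sqrt(n))
    d = m / sd
    return m, t, d


# Made-up participant means in milliseconds (not data from the experiment).
switch = [812, 790, 845, 801, 778, 830]
repetition = [740, 735, 760, 742, 731, 750]
cost, t, d = paired_t_and_d(switch, repetition)
print(round(cost, 1), round(t, 2), round(d, 2))
```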

Moreover, to confirm the evolving picture of a stronger impact of dynamic secondary tasks, we also ran a post-hoc ANOVA to explore the differential impacts of dynamic and static secondary-task conditions. We therefore grouped the two dynamic tasks as well as the two static tasks and analyzed their effects on switch costs in a 2×2 (Dynamics of Secondary Task × Task Transition) ANOVA. The switch cost in the dynamic secondary-task conditions was significantly smaller (36 ms) than that in the static secondary-task conditions (54 ms), as indicated by the interaction, F(1, 23) = 5.15, p = .03, ηp2 = .18.
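In a 2×2 within-participants design, the interaction tests whether the switch cost differs between the two conditions, and its F(1, n − 1) equals the squared paired t statistic computed on each participant's difference of switch costs. The sketch below illustrates this equivalence with made-up numbers; it is not the afex analysis reported here, and all names are hypothetical.

```python
import math
from statistics import mean, stdev


def interaction_t_and_f(rep_a, sw_a, rep_b, sw_b):
    """Interaction of a 2x2 within design as a difference of differences.

    Arguments are per-participant mean RTs for repetition and switch trials
    in conditions A and B. Returns (t, F), where F = t ** 2 has (1, n - 1) df.
    """
    cost_a = [s - r for s, r in zip(sw_a, rep_a)]   # switch cost in condition A
    cost_b = [s - r for s, r in zip(sw_b, rep_b)]   # switch cost in condition B
    diff = [a - b for a, b in zip(cost_a, cost_b)]  # difference of switch costs
    n = len(diff)
    t = mean(diff) / (stdev(diff) / math.sqrt(n))
    return t, t * t


# Four made-up participants: condition A = static, condition B = dynamic.
t, f = interaction_t_and_f(
    rep_a=[700, 720, 690, 710], sw_a=[780, 790, 775, 800],
    rep_b=[750, 760, 740, 770], sw_b=[790, 795, 780, 800],
)
print(round(t, 2), round(f, 2))
```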

To assess the influence of the secondary-task effectors (i.e., manual vs. oral), we compared the two tasks involving manual effectors (gesture and key-press conditions) to the two tasks involving oral effectors (nonword and spattle conditions) using a repeated measures ANOVA with transition (switch vs. repetition) and secondary-task effectors (manual vs. oral) as two within-participants factors. We found an overall effect of secondary-task effectors, F(1, 23) = 11.22, p = .003, ηp2 = .33. A post-hoc t test showed that participants were overall 37 ms slower with secondary tasks involving the manual effectors than with those involving the oral effectors, t(23) = 3.37, p = .003, d = 0.68. However, the secondary-task effector did not modulate the switch cost: F(1, 23) = 0.07, p = .80, ηp2 < .01, for the interaction.

In a final analysis, we tested whether the reduction in switch cost was related to differences in the mean RT level, since interference effects such as switch cost have been shown to be sensitive to the mean RT level (Steinhauser & Hübner, 2008, 2009). That is, the observed reduction in switch cost is often canceled out when controlling for mean RT differences. To compute the proportional switch cost, we divided participants’ mean task-switch RTs for each control and secondary-task condition by the mean task-repetition RT in the very same cell and submitted those to paired t tests, to assess differences in switch cost relative to the control condition. The switch cost was significantly reduced for the gesture condition, t(23) = 4.14, p < .001, d = 0.85, as well as for the nonword condition, t(23) = 4.32, p < .001, d = 0.88. No significant switch-cost reduction was observed for the spattle condition, t(23) = 1.65, p = .11, d = 0.34. However, the condition with the keys did yield a significant reduction in switch cost, t(23) = 2.38, p = .025, d = 0.49. When comparing the proportional switch cost for dynamic and static conditions, we replicated the smaller switch cost for dynamic than for static secondary-task conditions, t(23) = 2.97, p = .007, d = 0.62. As in the mean RT data, no difference in secondary-task effectors was observed for proportional switch cost, t(23) = 0.10, p = .92, d = 0.02.
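The proportional switch cost described above is simply the ratio of the mean switch RT to the mean repetition RT within the same participant and condition. A minimal sketch with made-up numbers (the function name is hypothetical) shows why this measure matters: the same absolute cost corresponds to a smaller proportional cost when the overall RT level is higher.

```python
def proportional_switch_cost(switch_rt: float, repetition_rt: float) -> float:
    """Mean switch RT divided by mean repetition RT for one participant and cell."""
    if repetition_rt <= 0:
        raise ValueError("repetition RT must be positive")
    return switch_rt / repetition_rt


# Same absolute switch cost (70 ms) at two different overall RT levels.
fast = proportional_switch_cost(770.0, 700.0)   # baseline of 700 ms
slow = proportional_switch_cost(870.0, 800.0)   # baseline of 800 ms
print(round(fast, 4), round(slow, 4))
```

A ratio of 1 means no switch cost; the ratios for each secondary-task condition are then compared against the control ratio with paired t tests, as in the text.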

Overall, participants committed 7.8% errors in the primary task. For the arcsine-transformed error data, the same 2×5 ANOVA as had been performed on the RT data yielded main effects of task transition, F(1, 23) = 31.60, p < .001, ηp2 = .58, and secondary-task condition, F(4, 92) = 17.18, p < .001, ηp2 = .43. Participants exhibited an error switch cost of 2.9% overall. The second main effect, of secondary-task condition, reflects worse performance in the dynamic secondary-task conditions (10.3%) than in the other conditions (6.5%). The interaction between task transition and secondary-task condition was not significant, F(4, 92) = 0.75, p = .54, ηp2 = .03; see Table 2.

With respect to secondary-task performance, we found that overall participants’ performance in each secondary task was at ceiling (less than 1% erroneous responses). As we noted, the experimenter was advised to start the secondary-task blocks only once participants were able to keep pace with the metronome for the nonword condition and to perform continuously in the gesture condition. For the nonword condition, we recorded participants’ utterances using Audacity and submitted the obtained files to the R package VoiceExperiment (Nett, 2017) in order to extract metronome and speech onsets. Unfortunately, due to a malfunction, the microphone-extracted data mainly mirrored the metronome, and the program could not distinguish the utterances from noise. However, listening to the sound files suggested a high degree of coupling between the metronome and the utterances of all participants.

Discussion

Overall, we found that secondary tasks reduced switch cost, especially when a dynamic secondary task was performed in parallel with a primary task requiring switches between digit classifications. This reduction was also found when proportional switch costs were analyzed, to account for mean RT differences due to secondary-task demands. However, a potential drawback of our first experiment was that participants performed the control condition without any secondary task in the beginning and in the very last block of the experiment, and it is thus conceivable that the reduction in switch cost with the secondary tasks was brought about by prolonged training with the primary task. To rule out this alternative explanation, we ran a second experiment in which we counterbalanced the order of the control and secondary-task conditions across participants. Furthermore, we increased the numbers of secondary-task as well as of control blocks to three, and we tested two groups of 18 participants for each secondary-task effector (oral vs. manual).

Experiment 2

Method

Participants

Thirty-six new participants (34 women, two men; mean age = 22.1 years) took part in Experiment 2 and received either course credit or €6 for participation. Thirty-three of the participants were right-handed, and all reported normal or corrected-to-normal vision.

Apparatus, stimuli, and tasks

Those were the same as in Experiment 1.

Procedure

No changes were made to the procedure, except for the following. Eighteen of the participants performed the control condition, the dynamic manual gesture condition, and the static manual keypress condition, whereas the remaining 18 participants performed the control condition, the dynamic oral nonword condition, and the static oral spattle-with-marble condition. All conditions were counterbalanced in order and were performed for three consecutive blocks. All participants performed only the digit-judgment tasks in the first block; this block was excluded as a practice block from the analysis. Overall, the participants had to complete ten blocks of 80 trials, which took about 45 min. The secondary tasks were practiced before their blocks began, until synchrony with the metronome (for the nonword condition) or a continuous movement (for the gesture condition) had been achieved, and secondary-task performance was again monitored by the experimenter or recorded via Audacity, in the case of the nonword condition.

Design

The independent within-participants variables were secondary-task condition (none, dynamic, or static) and task transition (switch vs. repetition trials). Secondary-task effector (oral vs. manual) was included as a between-participants variable. RTs and PEs were the dependent variables. The analyses were performed as in Experiment 1. Degrees of freedom were again Greenhouse–Geisser-corrected, but rounded degrees of freedom are reported here for better readability.

Results

Data trimming was done as in Experiment 1. Thus, the entire first block and all first trials were excluded, as were all errors (8.5%) and the trials following errors, as well as all direct stimulus repetitions (11.6%). The RT data were trimmed by removing all RTs above or below three standard deviations from the mean for each condition, defined by effector (oral vs. manual), secondary-task dynamics (dynamic vs. static), and task transition (switch vs. repetition). Overall, we removed 27.2% of collected data and were left with, on average, 41 observations per participant in each cell of the design. Our omnibus ANOVA comprised the within-participants factors task transition (switch vs. repetition) and secondary-task condition (dynamic, static, and control), as well as the between-participants factor secondary-task effector (oral vs. manual). In this ANOVA, we found a main effect of task transition, F(1, 34) = 79.73, p < .001, ηp2 = .70; overall, participants showed a switch cost of 113 ms. Next, we found a main effect of secondary-task condition, F(2, 61) = 8.73, p = .001, ηp2 = .20. Participants were slowest in the dynamic secondary-task conditions (858 ms) [t(35) = 3.39, p = .002, d = 0.56, for the comparison against the control condition], slightly faster in the static secondary tasks (796 ms) [t(35) = 2.10, p = .04, d = 0.35, for the comparison against the control condition] and fastest in the control condition (755 ms). To assess the impact of secondary-task dynamics, we compared performance with the dynamic secondary tasks to that with the static secondary tasks across both groups. The dynamic secondary-task conditions differed significantly from the static secondary-task conditions in terms of mean RT level, 858 versus 796 ms, t(35) = 2.40, p = .02, d = 0.40, replicating the impact of secondary-task dynamics from Experiment 1 in increasing mean RT level.

Interestingly, we found an impact of secondary-task effector on secondary-task mean RT levels, F(2, 61) = 4.29, p = .02, ηp2 = .11. Participants showed the largest RT increase for the dynamic manual secondary-task condition, whereas the static manual secondary-task condition was comparable to the static oral secondary-task condition (see Fig. 2).

Fig. 2 Reaction time (RT) data for task transition and secondary-task condition in both secondary-task effector groups of Experiment 2. Error bars denote 95% within-participants confidence intervals (Baguley, 2012; Cousineau, 2007)

The modulation of switch cost by secondary-task condition, F(2, 68) = 3.01, p = .07, ηp2 = .08, did not reach significance, unlike in Experiment 1. The switch costs were 86 ms for dynamic secondary tasks, 113 ms for static secondary tasks, and 133 ms in the control condition. Numerically, the switch cost was thus once again smallest in the dynamic secondary-task conditions and largest in the control condition, without any secondary task.

In follow-up 2×2 ANOVAs, we compared all secondary task conditions to their (respective) control condition, as we had in Experiment 1. For the dynamic oral condition, in which participants repeated a nonword paced by the metronome, the 2×2 ANOVA revealed no main effect of secondary task, F(1, 17) = 0.58, p = .46, ηp2 = .03, but a significant reduction of switch cost, as indicated by the interaction, F(1, 17) = 5.62, p = .03, ηp2 = .25 (see Fig. 2 and Table 3). The switch cost was reduced from 125 to 70 ms, but still remained significant, t(17) = 2.75, p = .013, d = 0.67.

Table 3 Mean RT (SD) in milliseconds and error percentage (SD) for secondary task condition and task transitions in Experiment 2

For the static secondary task in the group with oral effectors, we found neither a main effect of secondary-task condition nor a modulation of switch cost from holding the spattle, F(1, 17) = 0.01, p = .98, ηp2 < .01, and F(1, 17) = 0.05, p = .82, ηp2 < .01, for the interaction. The switch cost was 130 ms, t(17) = 5.49, p < .001, d = 1.33.

The comparable 2×2 ANOVAs contrasting the control condition with the dynamic and static secondary-task conditions were conducted for the group performing the manual secondary tasks. When performing the gesture, participants were significantly slower than in either the control or the key condition, as indicated by the main effect of secondary task, F(1, 17) = 18.91, p < .001, ηp2 = .53. However, the reduction of switch cost relative to the control condition was not significant, F(1, 17) = 1.02, p = .33, ηp2 = .06. The switch cost was 101 ms when performing the gesture, t(17) = 4.30, p < .001, d = 1.40. For the static secondary task in the manual group, we found a main effect of secondary task, F(1, 17) = 7.34, p = .01, ηp2 = .30, showing longer RTs than in the control condition, 834 versus 752 ms. The switch cost was again modulated numerically, but not significantly, for the group with the static manual secondary task, F(1, 17) = 2.76, p = .12, ηp2 = .14; it was 104 ms, t(17) = 6.72, p < .001, d = 1.63.

To investigate the impact of secondary-task dynamics, we ran the same ANOVAs as in Experiment 1. For the ANOVA with dynamic secondary tasks compared to the control condition for each secondary-task effector, we could establish the interaction, F(1, 34) = 4.21, p = .05, ηp2 = .11; the switch cost decreased from 133 ms in the control condition to 86 ms in the conditions with dynamic secondary tasks. However, we also found a significant modulation of this switch-cost reduction by secondary-task effector, as indicated by the three-way interaction between task transition, secondary-task condition (control vs. dynamic), and secondary-task effector (oral vs. manual), F(1, 34) = 6.42, p = .02, ηp2 = .16. The dynamic manual secondary task led to a smaller reduction in switch cost than did the dynamic oral secondary task. For the static secondary-task conditions, however, the reduction in switch cost from 142 to 117 ms was not significant, F(1, 34) = 1.03, p = .32, ηp2 = .02. Thus, as in Experiment 1, the dynamics of the secondary task had a large impact on the observed reduction in switch cost when compared to the control condition (see Fig. 2).

When comparing dynamic to static secondary tasks, as we did in Experiment 1, we found a main effect of secondary task, F(1, 34) = 5.60, p = .02, ηp2 = .14, reflecting longer mean RTs for the dynamic than for the static secondary task. Furthermore, we found a numerical, but nonsignificant, trend toward a smaller switch cost with the dynamic secondary task, similar to the effect found in Experiment 1, F(1, 34) = 2.67, p = .11, ηp2 = .07.

As regards the influence of secondary-task effector on the reduction of switch costs for the dynamic-versus-static secondary-task contrast, we found that the group with the manual secondary tasks was numerically slower overall, 833 versus 770 ms, F(1, 34) = 3.22, p = .08, ηp2 = .09, but secondary-task effector had no effect on switch cost, F(1, 34) = 0.01, p = .94, ηp2 < .001. Furthermore, we did not find any differential influence on the modulation of switch cost by secondary-task effector: F(1, 34) = 2.26, p = .14, ηp2 = .06, for the three-way interaction of task transition, secondary-task dynamics, and secondary-task effector. In sum, as in Experiment 1, we found no effect of secondary-task effector, although our secondary-task conditions with manual effectors did lead to an overall increase in mean RT.

We also assessed whether the proportional switch cost was affected when controlling for mean RT differences because of the secondary-task demands, given the larger impact of secondary-task effector in Experiment 2 than in Experiment 1. Again, we used the task-repetition RT as the baseline, averaging across secondary-task effectors in a first analysis. As in the post-hoc ANOVAs, we found a significant reduction in proportional switch cost for the dynamic secondary-task conditions, t(35) = 3.02, p = .005, d = 0.50, but not for the static secondary-task conditions, t(35) = 1.09, p = .28, d = 0.18. When split by secondary-task effectors and examining only the task repetition RTs of the respective conditions, we found that the reduction for the dynamic secondary task was driven mainly by the dynamic oral secondary task, t(17) = 3.00, p = .007, d = 0.71, whereas in the dynamic manual secondary-task condition, the switch-cost reduction did not attain significance, t(17) = 1.72, p = .10, d = 0.40. In the condition with the oral effector and static secondary task of holding the spattle, t(17) = 0.68, p = .50, d = 0.16, the proportional switch cost was not significantly affected. However, we found that the manual static secondary task did lead to a significant reduction in proportional switch cost, t(17) = 2.12, p = .05, d = 0.50 (see Table 3).
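The proportional score described above can be sketched as follows. The exact formula is an assumption based on the statement that task-repetition RT served as the baseline; the function names and the example RTs are hypothetical and are not values from Table 3.

```python
def switch_cost(rt_switch, rt_repeat):
    """Absolute switch cost in milliseconds."""
    return rt_switch - rt_repeat


def proportional_switch_cost(rt_switch, rt_repeat):
    """Switch cost expressed as a proportion of the task-repetition RT."""
    if rt_repeat <= 0:
        raise ValueError("repetition RT must be positive")
    return switch_cost(rt_switch, rt_repeat) / rt_repeat


# Hypothetical condition means (ms), for illustration only
control = proportional_switch_cost(rt_switch=840.0, rt_repeat=700.0)
dynamic = proportional_switch_cost(rt_switch=880.0, rt_repeat=800.0)
print(control, dynamic)
```

Dividing by the repetition RT removes differences in overall RT level between conditions, so a slower secondary-task condition is not credited with a smaller or larger switch cost merely because all of its RTs are shifted.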

Overall, participants committed 8.4% errors. We performed the same omnibus ANOVA also on the arcsine-transformed error rates. We again found main effects of task transition, F(1, 34) = 44.98, p < .001, ηp2 = .57, and secondary-task condition, F(2, 68) = 36.32, p < .001, ηp2 = .52. Participants exhibited an error switch cost of 3.1% and performed worse in the dynamic secondary-task (11.4%) than in the static secondary-task (8.3%) conditions, and were best in the control conditions (6.4%; see Table 3). However, we found no interaction between task transition and secondary task, and hence no indication of a modulation of error switch cost by secondary task, F(2, 68) = 0.18, p = .80, ηp2 = .005. Secondary-task effectors also had no differential effect, F(1, 34) = 0.23, p = .64, ηp2 = .007, and did not influence switch cost differentially, F(2, 68) = 1.19, p = .31, ηp2 = .03.
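As an illustration of the transformation applied to the error rates, a minimal sketch follows. It assumes the common form arcsin(√p); some authors use 2·arcsin(√p) instead, and the source does not state which variant was applied. The function name is hypothetical.

```python
import math


def arcsine_transform(proportion):
    """Variance-stabilizing arcsine square-root transform of a proportion."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(proportion))


# Error proportions reported above for the dynamic, static, and control conditions
for p in (0.114, 0.083, 0.064):
    print(p, arcsine_transform(p))
```

The transform stretches proportions near 0 and 1, where raw error rates are compressed, which makes them better suited to ANOVA.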

Overall, participants’ performance of the secondary tasks was at ceiling (less than 1% erroneous responses). As in Experiment 1, the experimenter was advised to start the task-switching blocks only once participants were able to keep pace with the metronome in the nonword condition and to perform the gesture as a smooth movement. For the nonword condition, we recorded participants’ utterances using Audacity and submitted the resulting files to the R package VoiceExperiment (Nett, 2017) to extract metronome and speech onsets. To analyze synchrony with the metronome, we extracted voice/sound onsets and subtracted the end of the metronome sound from the start of the utterance. The mean temporal distance from metronome to utterance was 474 ms (SD = 343). Overall, these data suggest, in line with the experimenter’s supervision, that participants performed the secondary tasks extremely well.
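The synchrony measure described above can be sketched as follows. This is not the VoiceExperiment API; it is a hypothetical stand-alone illustration that assumes event times are already available in milliseconds and that each utterance is paired with the most recent preceding metronome sound.

```python
from bisect import bisect_right
from statistics import mean


def asynchronies(metronome_offsets, utterance_onsets):
    """Utterance onset minus the end of the latest preceding metronome sound (ms)."""
    offsets = sorted(metronome_offsets)
    result = []
    for onset in utterance_onsets:
        idx = bisect_right(offsets, onset) - 1
        if idx >= 0:  # skip utterances that precede the first metronome sound
            result.append(onset - offsets[idx])
    return result


# Hypothetical event times (ms), for illustration only
lags = asynchronies([0, 1000, 2000], [400, 1500, 2450])
print(lags, mean(lags))
```

Other pairing rules (for example, nearest metronome sound in either direction) would yield different lags, so the rule should be stated alongside any reported mean.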

Discussion

Experiment 2 replicated the main findings of Experiment 1—namely, reduced switch costs in the case of dynamic secondary-task conditions. Static secondary-task conditions, while still reducing the switch cost, did so to a nonsignificant degree. In addition, Experiment 2 also ruled out prolonged practice with the primary task requirement as a potential explanation for the observed reduction in switch costs with our dynamic secondary tasks in Experiment 1.

Failure of episodic memory? Exploratory analysis

The results of Experiments 1 and 2 are consistent with the episodic-memory account of switch cost that we outlined, which states that task-repetition benefits are the main underlying cause of the observed switch cost, because repetitions benefit from recently formed episodic memories (Koch & Allport, 2006; Sohn & Anderson, 2001; see also Kiesel et al., 2010). To provide further corroborating evidence for our account that the observed reduction in switch cost was due to a reduced task-repetition benefit, we compared the task-repetition RTs for the control and dynamic secondary-task conditions. Note that our account predicts that RTs in repetition trials should be (significantly) increased, whereas the switch RTs for control and dynamic secondary-task conditions should not be statistically different. In Experiment 1, for the gesture condition relative to the control condition, we found significantly increased RTs in task-repetition trials, t(23) = 3.49, p = .003, d = 0.73, but no significant difference for task-switch RTs, t(23) = 1.32, p = .198, d = 0.28. When comparing the nonword condition to the control condition in Experiment 1, neither task-repetition RTs, t(23) = 1.47, p = .15, d = 0.31, nor task-switch RTs, t(23) = −0.71, p = .15, d = −0.15, were increased. In Experiment 2, we did find a significant difference in RTs (relative to the control condition) for task-repetition trials in the dynamic manual condition with the gesture, t(17) = 6.24, p < .001, d = 1.51; however, unlike in Experiment 1, task-switch RTs were also significantly increased, t(17) = 2.77, p = .013, d = 0.67. For the nonword condition, as in Experiment 1, neither task-repetition RTs, t(17) = 1.73, p = .10, d = 0.42, nor task-switch RTs, t(17) = 0.06, p = .956, d = 0.01, differed significantly from the RTs obtained in the control condition.
In sum, increases in task-repetition RTs were found for our dynamic manual secondary tasks in both experiments, whereas no such increase was observed in the dynamic oral secondary-task condition. Thus, these auxiliary analyses provide only tentative evidence for our claim of failures to build episodic-memory traces for later use, and further (experimental) evidence seems needed to substantiate our claims.

Analysis of task congruency effects

In addition to this auxiliary analysis of increases in repetition RTs, we analyzed congruency effects. Congruency effects refer to the performance difference between stimuli that require the same response in both tasks and stimuli that require different responses, given a stimulus–response mapping (e.g., 3 would be a “congruent” stimulus when “smaller” and “odd” were mapped onto the same response key, in contrast to “incongruent” digits, which would require different responses depending on the task context). Congruency effects are usually taken as an indication that both task sets (the relevant and irrelevant ones) are active (see van ’t Wout, Lavric, & Monsell, 2015, for an example of this logic). Commonly, the effects of congruency are overall faster performance and reduced switch costs and error rates for congruent as compared to incongruent trials. Analyzing congruency effects, van ’t Wout and colleagues found no evidence for concurrent task-set activation in a larger set of tasks, in contrast to other studies that had reported congruency effects when dealing with distractor tasks (Kim, Kim, & Chun, 2005). One important difference between the studies of van ’t Wout and colleagues and Kim and colleagues might be the number of possible tasks. The design in the study by Kim and colleagues was more comparable to our study, with one main task (Stroop) and additional WM load, whereas the study by van ’t Wout and colleagues asked participants to alternate among a varying number of classification tasks and a three-choice response set. One important difference between our study and that of Kim and colleagues, however, might be the degree of concurrent response activation, which might be stronger in a Stroop task than in the task-switching paradigm. Likewise, van ’t Wout and colleagues employed at least three tasks, which also renders the congruency variable more complex (by including partially congruent and incongruent items).
Overall, we expected to find congruency effects; however, we did not expect to see a modulation of congruency effects by secondary task, since the response modalities (i.e., the effectors used in the primary and secondary tasks) were different. Furthermore, our secondary task did not require response selection processes. Congruency effects have been hypothesized to arise because of activated long-term memory (Meiran & Kessler, 2008). These long-term memory contents should have already formed in the first block of our experiment, without any secondary task, and then have activated automatically during the course of the experiment.
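The congruency classification used in these analyses can be illustrated with a minimal sketch. The digit set (1 to 9 without 5), the reference value of 5 for the magnitude task, and the left/right key labels are assumptions for illustration; only the pairing of “smaller” and “odd” on the same key is taken from the example in the text.

```python
def magnitude_response(digit):
    """Hypothetical mapping: smaller than 5 -> left key, larger than 5 -> right key."""
    return "left" if digit < 5 else "right"


def parity_response(digit):
    """Hypothetical mapping: odd -> left key, even -> right key."""
    return "left" if digit % 2 == 1 else "right"


def is_congruent(digit):
    """A digit is congruent when both tasks call for the same response key."""
    if digit == 5 or not 1 <= digit <= 9:
        raise ValueError("digit must be 1-9, excluding 5")
    return magnitude_response(digit) == parity_response(digit)


congruent_digits = [d for d in (1, 2, 3, 4, 6, 7, 8, 9) if is_congruent(d)]
print(congruent_digits)
```

Under this mapping the digit 3 is congruent (smaller and odd share a key), whereas 2 is incongruent because the magnitude task and the parity task call for different keys.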

The analyses yielded main effects of congruency for both experiments (see the Appendix, Tables 4 and 6): F(1, 23) = 54.23, p < .001, ηp2 = .70, for Experiment 1, and F(1, 34) = 34.64, p < .001, ηp2 = .50, for Experiment 2. Congruency modulated switch cost significantly in Experiment 1, F(1, 23) = 6.11, p = .021, ηp2 = .21, and numerically in Experiment 2, F(1, 34) = 3.75, p = .061, ηp2 = .10. However, the reduction of switch cost in congruent trials was not modulated by the secondary-task condition in either experiment: F(3, 65) = 0.48, p = .68, ηp2 = .02, for the three-way interaction between task transition, secondary-task condition, and congruency in Experiment 1, and F(2, 62) < 0.01, p = .99, ηp2 < .001, for the same interaction in Experiment 2 (see the Appendix, Tables 5 and 7, for numerical values).

To conclude, we observed significant congruency effects and a reduction of switch cost in both RTs and error rates (see the Appendix, Tables 4 and 6) for congruent as compared to incongruent trials. Overall, these data fit well with recent theorizing on the origin of congruency effects as arising from activated long-term memories, which were probably built up in our first block and not further influenced by secondary-task demands once they were stored in memory. Furthermore, our finding of no influence of secondary-task demands on congruency effects also suggests that the representations underlying congruency effects might differ from the episodic-memory traces used in the case of task repetition, which according to our results lead to task-switch costs.

General discussion

In two experiments we investigated the impacts of dynamic and static secondary tasks on task-switching performance. We found reduced switch costs when participants simultaneously performed a dynamic secondary task, and this reduction was more pronounced when the dynamic secondary task had to be performed with the oral effector system rather than the manual effector system. In comparison, static secondary tasks did not lead to a significant modulation of switch cost. Importantly, performance in all secondary tasks was at ceiling, suggesting that participants complied well with the instructed secondary tasks.

With these experiments we aimed to contrast two main accounts of the origin of switch costs: first, the reconfiguration account, which postulates that an active control process implements the newly relevant task set in WM (e.g., Monsell & Mizon, 2006; Rogers & Monsell, 1995), and second, the episodic-memory account of task switching, which presumes an inert system that benefits from the reuse of formerly acquired episodic-memory traces (Altmann & Gray, 2008; Koch & Allport, 2006; Waszak et al., 2003).

Although we obtained significantly smaller switch costs for our dynamic secondary-task conditions, the causes underlying this reduction remain unclear. The observed increase in task-repetition RTs for the dynamic manual-effector conditions in both experiments supports our account of the loss of an episodic repetition benefit due to the dynamic secondary-task conditions. However, in the dynamic oral-effector condition with a nonword, we found a reduced switch cost in the absence of a significant increase in task-repetition RTs. The reduction of the switch cost was more prominent in the secondary task with oral effectors that had to be synchronized with a metronome than in a dynamic manual secondary task that utilized a comparable timing and resulted in a continuous movement.

One reason for this discrepancy might be the use of an external pacemaker. Recent work by Schmidt (2016) has highlighted the role of response rhythms in the reduction of interference effects. Thus, it is possible that the additional external rhythm introduced by the metronome helped participants accomplish the secondary-task requirements, and so could account at least partially for the reduction in switch cost in comparison to a control condition that was self-paced. The external pacemaker might thus have acted as an anchor for response emission and helped keep attention focused.

However, the data clearly do not support the task-set reconfiguration account, which predicted an increase in switch costs, as had been reported, for example, by Baddeley et al. (2001) and Emerson and Miyake (2003). In addition, it should be noted that this increase in task-switch costs due to secondary-task requirements does not occur consistently and is malleable by a number of variables, such as keeping track of task sequences or cue transparency (Bryck & Mayr, 2005; Miyake et al., 2004; see Table 1), which argues against a crucial role for basic verbalization processes in establishing the next task set.

Reductions of interference effects under secondary-task conditions have also been reported in a paradigm other than cued task switching—namely, the Stroop paradigm. Kim et al. (2005) investigated modulation of the Stroop effect by the WM load of a secondary-task requirement. The authors asked participants to maintain either other verbal information or spatial information. Assuming a multiple-resource account (Navon, 1984), they predicted less Stroop interference when the secondary task did not share content and processes (i.e., the spatial WM task) with the primary task (the Stroop color-matching task), a finding that was confirmed in their first experiment. In a second experiment, they reasoned that the interference effects should be reduced (i.e., less interference should be experienced) when the secondary task (i.e., verbal WM load) interfered with the processing of the distractor (i.e., color words). Therefore, they manipulated the overlap between the distractor in the primary task (i.e., color matching) and the secondary task load (WM span) and again found beneficial effects of distractor-relevant material in the secondary task that led to an overall reduction in the observed interference effect. Their observations, however, indicated an overall increase in RTs for any secondary-task condition, and especially for congruent trials. Therefore, it is not possible to infer from the reported data whether the reduction in Stroop interference was brought about by less interference in the incongruent condition or less of a benefit in the congruent condition, which would mirror our dynamic secondary-task conditions.

Likewise, it is conceivable that the primary influence of a secondary task is to abolish otherwise observed beneficial (memory-retrieval-based) effects. Given that handling a secondary task requirement might require more focused attention, this narrowing might block processing of otherwise helpful information. Such a modulation of beneficial memory-retrieval-based processes by introducing context changes (i.e., a cue modality change) has been reported for response repetition effects (Koch, Frings et al., 2018; Koch, Poljac et al., 2018), which were less beneficial in the case of a task repetition when the cue modality changed.

However, care should be taken if difference scores are used, since those have been shown to depend on the mean RT level (see, e.g., Dittrich, Kellen, & Stahl, 2014; Steinhauser & Hübner, 2008), which we found to be altered for the manual dynamic secondary task, but not for the dynamic secondary task with the nonword. One way to deal with such possible changes due to RT level differences across conditions might be to use proportional scores. Yet we also found a reduction in switch cost for proportional scores. No changes in mean RT level and a reduced influence on switch cost were observed for our static secondary tasks. Given the use of different effector systems for our primary task requirement (feet) and the secondary tasks (manual and oral), not much interference was expected, and therefore we conclude that the secondary-task dynamics matter to a larger degree than the pure muscle involvement required when pressing keys down or balancing a spattle, corroborating former work (e.g., Baddeley et al., 2001).

In summary, in a task-switching study involving two digit-judgment tasks, we observed reduced (rather than increased) RT switch costs when task switching had to be performed in parallel with a secondary task. The reduction in switch cost occurred across the effectors with which the secondary task was performed, but this seemed to depend mainly on the dynamic nature of the secondary task. Therefore, we suggest that a dynamic secondary task interferes with the formation of episodic-memory traces for the task episode, which is responsible for positive memory-retrieval-based processes associated with task repetitions. This episodic repetition benefit is reduced when a dynamic secondary task is performed in parallel with a primary task requirement that requires flexible retrieval of the currently relevant stimulus category–response category bindings. Further research will be required in order to establish exactly how dynamic secondary tasks interfere with episodic-memory mechanisms, such as task-specific binding processes.

Author note

The data and analysis scripts from this article can be viewed at the Open Science Framework: https://osf.io/wxdez/?view_only=99bdb5669fe646ed8c71c5f96c08f245.
