Intentional switching of auditory attention between long and short sequential tone patterns

Nolden, Sophie; Koch, Iring

doi:10.3758/s13414-017-1298-5


Published: 15 February 2017

Volume 79, pages 1132–1146, (2017)


Attention, Perception, & Psychophysics


Sophie Nolden¹ &
Iring Koch¹


Abstract

The current study focuses on auditory task switching, more precisely on switching attention between different temporal patterns of the same auditory stimulus. Tone sequences consisting of nine different pitch tones were presented aurally. Three repetitive short 3-tone patterns (local focus) were combined to a long pattern (global focus), and each could be either rising or falling, resulting in congruent or incongruent combinations. Participants were informed by a cue if they had to attend to the short or to the long pattern, and they indicated if the target pattern was rising or falling by pressing one of two keys. In two experiments, we investigated cued switches between the two attentional foci. Switch costs in reaction times and errors were observed when switching from the long to the short pattern but not when switching from the short to the long pattern. These asymmetric switch costs were reduced when participants had more time to prepare for the switch in a condition with a prolonged cue-stimulus interval. In addition, participants made more errors when global and local patterns did not correspond to each other (i.e., in incongruent trials) when attending to either of the patterns, but this congruency effect was not modulated by preparation time. The data suggest that the mechanisms of task goal prioritizing, as indicated by the asymmetric attention switch costs, are dissociable from those underlying stimulus selection, as indicated by the congruency effects.


Introduction

Switching between different tasks is an important capacity that allows us to pursue complex goals as well as to adapt our behavior when environmental demands change (Allport, Styles, & Hsieh, 1994; Rogers & Monsell, 1995; see Kiesel et al., 2010; Vandierendonck, Liefooghe, & Verbruggen, 2010, for reviews). However, it is difficult to find an unambiguous definition of what a task is (Rogers & Monsell, 1995), and a task can comprise motivational factors, attention, perceptual, and motor processes, stimulus-response rules (i.e., a certain stimulus requires a left-hand response and another stimulus requires a right-hand response), or mnemonic processes, or even multiple subtasks (Hirsch, Nolden, & Koch, in press).

A typical finding in task switching is the occurrence of so-called switch costs. That is, when participants shift from one task to another, performance, as measured for example in response times or error rates, is worse than when the task remains constant. Two kinds of behavioral indices can be distinguished when switching between tasks. First, performance can be worse when the task changes from one trial to the next than when the task stays the same (“switch costs”; for a review, see Kiesel et al., 2010). Second, in addition to these “transient” switch costs, performance can also be impaired in mixed blocks, in which participants’ task can change from trial to trial, relative to “pure” blocks, in which participants do not need to change the task (“mixing costs”). One would typically compare performance in repetition trials of mixed blocks with trials in pure blocks (where all trials are repetition trials by definition). These costs of “sustained” control (Braver, Reynolds, & Donaldson, 2003) can be interpreted in terms of higher working memory load in mixed blocks than in pure blocks because participants have to keep two different stimulus interpretations active and because participants cannot predict the next task (Kiesel et al., 2010; Liefooghe, Barrouillet, Vandierendonck, & Camos, 2008).

Intentional control of auditory attention in selective listening

One way to implement a sequence of tasks that vary unpredictably is to use a cue at the beginning of each trial. This cue then indicates which task participants should perform in the upcoming trial. This kind of explicit cueing procedure has been used in several studies to investigate auditory selective attention in a dichotic listening paradigm (Koch & Lawo, 2014, 2015; Koch, Lawo, Fels, & Vorländer, 2011; Lawo & Koch, 2014, 2015; Lawo, Fels, Oberem, & Koch, 2014). In the dichotic listening paradigm, two different auditory stimuli or auditory streams are presented simultaneously, one to each ear. Typically, participants only attend to one ear. Auditory selective attention allows the listener to focus on the relevant auditory stimulus, while distracting stimuli remain in the perceptual background (for reviews, see Bronkhorst, 2015; Lachter, Forster, & Ruthruff, 2004; Schneider, Li, & Daneman, 2007; Shinn-Cunningham, 2008). Early studies on auditory selective attention in a setting with competing stimulus streams have focused on sustained auditory attention and involuntary attention capture (for early examples of dichotic listening, see Broadbent, 1958; Cherry, 1953; for a review, see Hugdahl, 2011; for studies on selective attention to a certain part of a complex stimulus, see Mondor & Bregman, 1994; Mondor, Zatorre, & Terrio, 1998).

Combining task-switching methodology with dichotic listening allowed investigating an important aspect of auditory task switching, namely intentional control of the auditory attentional focus. In one study, Koch and collaborators applied a special case of auditory task switching in which the main focus was on the attentional demands of the task. Participants had to classify one of two dichotically presented number words: one spoken by a female voice and the other one spoken by a male voice (Koch et al., 2011). A visual cue that was presented before the spoken number words informed participants which speaker they had to attend to. The relevant speaker category (female or male) could change from trial to trial (switch trial) or remain the same (repetition trial). Classification rules were held constant because participants had to classify the relevant numbers according to their magnitude in all trials, no matter if the relevant speaker changed or repeated. In addition, stimulus-response mappings were also held constant, because participants had to press the left key for small numbers and the right key for large numbers, no matter if the relevant speaker changed or repeated. Thus, this special variant of auditory task switching allowed investigating the control of the auditory attentional focus while keeping the remaining task-related factors constant. Hence, cued auditory attention switching can be understood as a special case of task switching where the classification and the response mappings remain constant, but the relevant auditory stimulus selection criterion changes (see also Logan, 2005, for a similar approach to visual attention switching). Koch et al. (2011) found faster responses when the selection criterion repeated than when it switched and thus auditory switch costs.

To investigate if control processes in auditory task switching were proactive, preparation effects were investigated by manipulating the interval between cue and target stimuli. In addition to general preparation benefits, switch-specific preparation effects are an index of proactive control processes. The authors used a 2:1 cue-to-task mapping and focused on trials without immediate cue repetitions to control for perceptual cue priming in task repetitions. However, unlike many studies in the domain of visual task switching (Kiesel et al., 2010, for a review), auditory attention switch costs were not reduced when participants had more time to prepare for a switch (Koch et al., 2011, Experiment 3). Therefore, the factors that determine whether preparation for cued auditory attention shifts is effective still remained unclear.

Intentional control of auditory attention to temporally limited parts of auditory patterns

Notably, attending to one auditory stimulus among distracting stimuli, as applied in the dichotic listening paradigm, represents only one specific situation where auditory attention is deployed. It also is possible to attend to a certain part of an auditory stimulus, for instance when attending to a specific part of binaurally presented tone sequences. Some studies used “hierarchical” auditory stimuli consisting of short repetitive sequential patterns that were combined to a long sequential pattern (Bouvet et al., 2011; Bouvet et al., 2014; Justus & List, 2005; List & Justus, 2007, 2010; List et al., 2007; Ouimet, Foster, & Hyde, 2012; Sanders & Poeppel, 2007; Sanders & Astheimer, 2008). Hence, such studies examine the auditory analogy of hierarchical visual stimuli (Ivry & Robertson, 1998; Kimchi, 1992; Navon, 1977, 1981).

Justus and List (2005) used these kinds of tone sequences to investigate attentional persistence. In addition, they examined whether frequency and time were so-called indispensable attributes in the auditory modality (List & Justus, 2007). Auditory hierarchical stimuli consisted of sequences of nine tones (long pattern) that could be subdivided into three 3-tone sequences (short pattern). Each individual tone lasted for 150 ms. The 9-tone sequence as a whole could be arranged in one of four possible ways, as well as the 3-tone sequence (either two consecutive rising changes, two consecutive falling changes, a rising change followed by a falling change, or a falling change followed by a rising change). Two of these arrangements were targets, the other two served as distractors. Either the long or short pattern contained one of the targets; the other pattern contained one of the distractors (Justus & List, 2005, Experiment 2). Participants had to detect which of the two targets had been presented in a trial (e.g., two rising changes), but they did not know if the target would occur in the short pattern or in the long pattern before the presentation of the auditory stimulus. They thus needed to attend to both patterns to make a decision.

In this experimental paradigm, participants responded faster when the target was presented in the same temporal range (either short or long) as in the previous trial, suggesting attentional persistence, even when the specific target pattern had changed. In addition, participants made fewer errors when the target was presented in the long pattern than when it was presented in the short pattern. These results suggested, first, that participants showed generally better performance when attending to the long pattern than when attending to the short pattern, and second, that performance was worse when participants needed to change their auditory attentional focus from one trial to another than when it remained constant.

Further studies revealed that the non-attended pattern also influenced the processing of the attended pattern, which became evident in congruency effects that showed up as performance costs when target pattern and distractor pattern are incongruent to each other relative to when they are congruent (List, 2006). Several studies revealed greater congruency effects when attending to the short pattern than when attending to the long pattern (Bouvet et al., 2011; Ouimet et al., 2012; Sanders & Poeppel, 2007, Experiments 1 and 2).

One important property of the auditory hierarchical stimuli as described above is the sequential presentation of the tone patterns. Unlike simultaneously presented visual hierarchical stimuli (Navon, 1977), auditory hierarchical stimuli unfold over time. Consequently, the information necessary to classify the repetitive short pattern is available earlier than the information necessary to classify the long pattern. While this is a necessary prerequisite of these kinds of auditory hierarchical patterns, it makes overall response time (RT) differences between the long and the short pattern somewhat difficult to interpret.

For example, one could imagine that participants used the minimum information possible to classify the 9-tone sequences, which is earlier in the short-pattern condition than in the long-pattern condition. Notably, such a strategic temporal order bias would effectively lead to a shorter interval between the cue and the critical pattern-discriminating tone in the short-pattern condition than in the long-pattern condition. If so, RT switch costs should be reduced for the long pattern, because preparation would be more advanced with the long pattern, for which there is certainty once the second tone has passed by, than with the short pattern.

Moreover, there could be reduced RT congruency effects in the short pattern condition compared with the long pattern condition, due to the later presentation of interfering information from the long pattern when attending to the short pattern than vice versa. However, so far there is little empirical support for a strong impact of temporal order on auditory congruency effects. Previous studies showed symmetric congruency effects or even greater congruency effects in the short pattern than in the long pattern (Bouvet et al., 2011; Ouimet et al., 2012; Sanders & Poeppel, 2007).

Some authors dealt with the issue of temporal order by reporting RTs from the point in time when the minimum information to solve the task is available (Ouimet et al., 2012). Yet, such an approach suffers from requiring specific assumptions about participants’ strategies, which may vary and depend on factors such as musical experience. Therefore, overall RT difference between the long and the short pattern should be interpreted carefully and considered in light of the specific characteristics of the sequential presentation.

Goals of the present study

We examined mechanisms of intentional control of the auditory attentional focus. Indeed, the sequential level-repetition priming effect (Justus & List, 2005) also could be due to passive “inertia” (i.e., persistence) of the previously established auditory attentional focus rather than to an active attentional focus shifting process. Therefore, it is important to examine active preparation for shifts in auditory focus to different temporal patterns. To do this, we adapted Koch et al.’s (2011) cued attention switching approach to investigate if shifting the focus to a certain auditory short vs. long pattern, which is cued before the presentation of an auditory tone sequence, may rely on active and intentional processes. Therefore, we used 9-tone sequences that were similar to the sequences described above (Justus & List, 2005), with the long and the short pattern either rising or falling. Critically, before the stimulus occurred, a cue instructed participants to attend to either the long or the short pattern (Meiran, 1996; Jost et al., 2013, for review). That is, the task did not resemble an auditory search task that required attending to both patterns until a target was detected (Justus & List, 2005), because participants could prepare for the selection of the cued relevant temporal pattern and could completely ignore the irrelevant temporal pattern. Our study thus targeted a novel research question, because it addressed mechanisms of intentional control of the auditory focus to a temporally limited part of a tone sequence.

Notably, attention to an auditory pattern within a sequence of tones differs in several aspects from attention to one of two dichotically presented stimuli (Koch et al., 2011). For instance, Koch et al. (2011) used dichotic listening and presented stimuli that belonged to different categories (such as male vs. female voices) and are spatially distinct, which is not the case for long or short patterns of the same stimulus. In addition, stimuli are presented simultaneously in dichotic listening situations and require attenuation of the distracter and/or enhancement of the target, which also is not the case when attending to auditory patterns of a sequence. Therefore, the present study examined a different research question, focusing on auditory attention switching of temporal levels in tone sequences.

Overview of the current experiments

Adjusting the attentional auditory focus in auditory patterns could potentially be related to two types of performance impairments, namely mixing costs and switch costs. In addition, we were interested in whether attention switches can be prepared before the presentation of the auditory stimulus. Therefore, we manipulated preparation time (i.e., the time between the cue and the auditory stimulus [CSI]).

In Experiment 1, we investigated auditory mixing costs and switch costs. In previous studies, the processing of long auditory patterns was related to better performance than the processing of short auditory patterns (Justus & List, 2005). We therefore expected asymmetric costs between the two patterns, with two possibilities for the direction of the asymmetry. Either switch costs would be smaller for attending to the long pattern, because this represents the default and should therefore be rather easily processed. Alternatively, if the processing of the (default) long pattern is inhibited when attending to the short pattern, residual inhibition might cause larger switch costs when shifting the focus back to the global pattern (Allport et al., 1994; for reviews see also Koch et al., 2010; Monsell, Yeung, & Azuma, 2000). In both cases, switch costs would be asymmetric, with larger switch costs for either the short or the long pattern. Experiment 1 was aimed at deciding between these opposing scenarios.

In Experiment 2, we were interested in preparatory mechanisms of attention switches and therefore varied the time that participants could use to prepare for the auditory attentional focus of the next trial (i.e., the CSI). We explicitly targeted the active adjustment of the attentional focus, in contrast to potentially passive processes of sequential level repetition priming. In addition, in both experiments, we also examined congruency effects, which reflect the involuntary processing of irrelevant stimulus aspects, so that patterns of asymmetric interference can inform us about processing biases. Importantly, by investigating mixing costs and switch costs as well as congruency effects, we targeted two different important aspects of auditory attention, namely cognitive control of the auditory attentional focus and involuntary processing of irrelevant information.

Experiment 1

The goal of Experiment 1 was to investigate task-switching when participants attended to a specific auditory pattern within the same auditory stimulus. Participants listened to sequences of tones and they attended either to the entire pattern (long) or to the repetitive shorter pattern (short). The auditory attentional focus either varied from trial to trial (“mixed blocks”) or remained constant within an experimental block. We used a 1:1 cue-to-task mapping in the current experiment because the objective was to investigate if mixing costs and switch costs could be found at all using the present task requirements.

Method

Participants

Twenty-four participants participated in Experiment 1. Three participants with an excessive number of errors (> 40%) in either the long-pattern or the short-pattern condition were replaced by new participants. The final 24 participants had a mean age of 25 years (SD = 5 years, range: 19-36 years), 17 were female, and 22 were right-handed. None of them reported any hearing problems. On average, participants had 9 years (SD = 3) of musical training during their school education. One participant reported that she saw herself as a musician. Participants gave informed consent and received partial course credit or 8 € for their participation.

Stimuli and apparatus

Visual cues were presented at the center of a 17-inch monitor with white background. The participants’ distance to the screen was about 60 cm. The cues were a blue and an orange asterisk that were 6 mm in width and 6 mm in length.

Auditory stimuli were sequences of 9 tones that were chosen from a set of 30 different tones. The fundamental frequencies of the tones in Hz were 155, 165, 176, 188, 201, 215, 230, 246, 263, 281, 300, 320, 342, 365, 390, 416, 444, 474, 506, 540, 576, 615, 657, 701, 748, 798, 852, 910, 971, and 1037. Tones consisted of three harmonics with decreasing intensity (1/number of harmonics). We chose tones that are not related to the Western musical scale to avoid associations with implicit or explicit musical knowledge (see also Trehub, Schellenberg, & Kamenetsky, 1999). Each tone lasted for 200 ms, including onset and offset ramps of 10 ms each. Tones were adjusted for subjective loudness. Three-tone patterns were built from these tones, such that there were steps of four tones between adjacent tones. The intertone interval was 0 ms. These patterns could be rising or falling; for example, the tones with the frequencies 155 Hz, 201 Hz, and 263 Hz would build a rising pattern. Three of these three-tone patterns were then combined to form the nine-tone sequence, such that the three-tone patterns always had the same structure; hence, all of them were rising or all of them were falling. The first tones of the individual three-tone patterns were three steps apart and could be combined in either a rising or a falling way, independently of the direction within the three-tone patterns. Four kinds of nine-tone sequences were constructed this way (Fig. 1 depicts a schematic description of the stimuli). First, the short three-tone patterns and the long nine-tone sequence could both be rising (congruent), which would for example result in a sequence comprising the following frequencies: 155, 201, 263/188, 246, 320/230, 300, 390 Hz. Second, the short three-tone patterns and the long nine-tone sequence could both be falling (congruent), which would for example result in a sequence comprising the following frequencies: 390, 300, 230/320, 246, 188/263, 201, 155 Hz. Third, the short three-tone patterns could be falling and the long nine-tone sequence rising (incongruent), which would for example result in a sequence comprising the following frequencies: 263, 201, 155/320, 246, 188/390, 300, 230 Hz. Last, the short three-tone patterns could be rising and the long nine-tone sequence falling (incongruent), which would for example result in a sequence comprising the following frequencies: 230, 300, 390/188, 246, 320/155, 201, 263 Hz. Auditory stimuli were presented via headphones (Grundig 38629 DJ Headphones). All stimuli were presented with E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA).
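The construction rules described above can be illustrated with a minimal sketch. It assumes that each sequence is fully determined by the index of its lowest tone in the 30-tone set together with the directions of the short and long patterns; the function name and the `base_index` parameter are hypothetical, and the text does not state which starting tones were actually used in the experiment.

```python
# Fundamental frequencies (Hz) of the 30-tone set listed in the text.
FREQS_HZ = [155, 165, 176, 188, 201, 215, 230, 246, 263, 281,
            300, 320, 342, 365, 390, 416, 444, 474, 506, 540,
            576, 615, 657, 701, 748, 798, 852, 910, 971, 1037]

WITHIN_STEP = 4   # index steps between adjacent tones of a three-tone pattern
BETWEEN_STEP = 3  # index steps between the first tones of successive patterns


def nine_tone_sequence(base_index, short_rising, long_rising):
    """Return the nine frequencies of one sequence.

    base_index (hypothetical parameter) is the position of the lowest tone
    in FREQS_HZ; it must be at most 15 so that all nine tones exist.
    """
    tone_order = [0, 1, 2] if short_rising else [2, 1, 0]
    group_order = [0, 1, 2] if long_rising else [2, 1, 0]
    return [FREQS_HZ[base_index + BETWEEN_STEP * g + WITHIN_STEP * t]
            for g in group_order for t in tone_order]


def is_congruent(short_rising, long_rising):
    """Short and long patterns are congruent when they share a direction."""
    return short_rising == long_rising
```

With `base_index = 0`, the four combinations of directions reproduce the four example sequences given in the text.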

Participants responded with “c” (left index finger) and “m” (right index finger) on the computer keyboard (QWERTZ). They were asked to indicate whether the attended pattern was falling or rising by pressing “c” when it was falling and “m” when it was rising; the mapping of “falling” and “rising” to the response keys was thus compatible with the spatial position of the keys.

Procedure

Each trial started with the visual cue (an orange or blue asterisk), which remained on the screen until the participant’s response. The color of the cue indicated the auditory attentional focus; the mapping of the colors to the auditory attentional foci was counterbalanced over participants. After 500 ms (cue-stimulus interval, CSI), the auditory stimulus started. Participants had a maximum of 4,400 ms from the onset of the auditory stimulus to indicate whether the pattern of the relevant auditory attentional focus was rising or falling. In case of an error, the word Fehler! (German for error) was displayed in red at the center of the screen for 500 ms. In case of no response after 4,400 ms, the word Schneller! (German for faster) was displayed in red at the center of the screen for 500 ms. After a blank of 500 ms (response-cue interval, RCI), the next trial started (Fig. 2).

Participants completed 12 experimental blocks of 40 trials each. Eight blocks were mixed blocks with the auditory attentional focus varying randomly from trial to trial (as indicated by the color cue). During 2 blocks, participants were instructed to respond to the long pattern only, and during the remaining 2 blocks, participants were instructed to respond to the short pattern only (pure blocks). We chose twice as many trials in mixed blocks as in pure blocks to compare conditions with an equal number of trials in the mixing-costs contrast, in which we would compare pure blocks with the repetition trials of the mixed blocks only. The order of the blocks was counterbalanced over participants with two mixed blocks alternating with one pure block. Half of the participants started with a pure block, and the other half started with a mixed block. Half of the participants attended to the long pattern in their first pure block, and the other half attended to the short pattern in their first pure block. Before the experimental blocks, participants completed four practice blocks with eight trials each. Two practice blocks were mixed blocks, two were pure blocks, and the order was counterbalanced over participants.
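The rationale for using twice as many mixed-block trials as pure-block trials can be checked with simple arithmetic. The sketch below assumes that, with random cueing, about half of the mixed-block trials are repetitions; this proportion is an assumption and is not stated in the text. Later trial exclusions are ignored here.

```python
TRIALS_PER_BLOCK = 40
MIXED_BLOCKS = 8
PURE_BLOCKS = 4  # 2 long-pattern blocks + 2 short-pattern blocks

# Assumption (not stated in the text): the focus repeats on about half of
# the mixed-block trials when it varies randomly from trial to trial.
P_REPETITION = 0.5

mixed_trials = MIXED_BLOCKS * TRIALS_PER_BLOCK          # 320
pure_trials = PURE_BLOCKS * TRIALS_PER_BLOCK            # 160
expected_repetitions = mixed_trials * P_REPETITION      # 160.0

print(pure_trials, expected_repetitions)
```

Under this assumption, the expected number of repetition trials in mixed blocks matches the number of pure-block trials, which is the balance the mixing-costs contrast requires.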

Participants reported demographic data and musical expertise before the experiment and were asked about strategies after the experiment. Participants were instructed orally and with written instructions on the computer screen. The total experiment lasted about 45 minutes.

Design

Independent variables were auditory attentional focus (long pattern, short pattern), transition (pure, repetition, switch), and congruency (congruent, incongruent). Dependent variables were RTs and errors.

We analyzed two non-orthogonal contrasts on RTs and errors. First, we analyzed the mixing-costs contrast with the independent variables auditory attentional focus (long pattern, short pattern), transition (pure, repetition), and congruency (congruent, incongruent). Only trials from pure blocks and trials from mixed blocks with an immediate repetition of the auditory attentional focus were used for this contrast (Kiesel et al., 2010).

Second, the switch-costs contrast was analyzed. Only trials from mixed blocks were used for the switch costs contrast, including switch trials and repetition trials. Independent variables were auditory attentional focus (long pattern, short pattern), transition (repetition, switch), and congruency (congruent, incongruent).
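The trial selection for the two contrasts can be sketched as follows. The function and variable names are hypothetical, and the sketch covers only the labeling of transitions; the exclusion of first trials, errors, and outliers described in the Results is a separate step.

```python
def label_transitions(block_type, foci):
    """Label each trial of one block as 'pure', 'repetition', 'switch', or None.

    block_type: 'pure' or 'mixed'.
    foci: attended pattern per trial, e.g. ['long', 'short', ...].
    The first trial of a mixed block has no predecessor and is labeled None.
    """
    if block_type == "pure":
        return ["pure"] * len(foci)
    labels = [None]
    for previous, current in zip(foci, foci[1:]):
        labels.append("repetition" if current == previous else "switch")
    return labels[:len(foci)]


# Transition levels entering each of the two non-orthogonal contrasts.
MIXING_CONTRAST = {"pure", "repetition"}
SWITCH_CONTRAST = {"repetition", "switch"}


def select(labels, contrast):
    """Return the indices of the trials that enter the given contrast."""
    return [i for i, label in enumerate(labels) if label in contrast]
```

Repetition trials from mixed blocks enter both contrasts, which is why the two contrasts are non-orthogonal.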

Results

Practice trials, the first trial of each block, error trials, and trials following errors were excluded from the analysis of the RTs, as well as outliers (RT ± 3 SD from the mean of each condition). Practice trials, the first trial of each block, and trials following errors were excluded from the analysis of the error rates.
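A minimal sketch of these exclusion rules for the RT analysis is given below. It assumes that trials are stored per block in presentation order, that practice blocks have already been removed, and that the outlier criterion uses the sample standard deviation of the RTs within one condition; the text does not specify these details, and all names are hypothetical.

```python
from statistics import mean, stdev


def rt_trials_for_analysis(trials):
    """Apply the trial-based exclusions to one block.

    trials: list of dicts in presentation order with keys 'rt' (ms)
    and 'correct' (bool). Practice blocks are assumed to be removed.
    """
    kept = []
    for i, trial in enumerate(trials):
        if i == 0:                        # first trial of the block
            continue
        if not trial["correct"]:          # error trial
            continue
        if not trials[i - 1]["correct"]:  # trial following an error
            continue
        kept.append(trial)
    return kept


def trim_outliers(rts, n_sd=3.0):
    """Drop RTs more than n_sd standard deviations from the condition mean.

    Assumption: the sample SD is used; the text does not say which.
    """
    if len(rts) < 2:
        return list(rts)
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * s]
```

For the error-rate analysis, the same trial-based filter would apply without the error-trial exclusion and without outlier trimming.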

Mixing costs contrast

Reaction times

We conducted a 2 × 2 × 2 ANOVA with the within-subject variables auditory attentional focus (long pattern, short pattern), transition (pure, repetition), and congruency (congruent, incongruent) on RTs. The short pattern could be classified at the onset of the second tone (200 ms after stimulus onset) and the long pattern at the onset of the fourth tone (600 ms after stimulus onset). Figure 3a depicts the uncorrected RTs, and Fig. 3b depicts the RTs corrected by 200 ms or 600 ms, respectively (all analyses are based on the uncorrected RTs). The ANOVA revealed a main effect of auditory attentional focus, F(1, 23) = 56.12, MSE = 222,977, p < 0.001, η_p² = 0.71, indicating that RTs were 510 ms slower for the long pattern than for the short pattern (1,491 ms vs. 981 ms). The main effect of transition was not significant, F(1, 23) = 1.15, MSE = 17,304, p > 0.29, η_p² = 0.05, suggesting that there were no overall mixing costs. The interaction of transition and auditory attentional focus was not significant either, F(1, 23) = 3.26, MSE = 24,642, p > 0.08, η_p² = 0.12, even though mixing costs were somewhat greater for the short-pattern condition than for the long-pattern condition.
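The RT correction can be sketched as subtracting the onset of the first tone at which the attended pattern can be classified. The names below are hypothetical, and the corrected values are simple arithmetic on the two mean RTs reported above; they are not values reported in the text.

```python
TONE_MS = 200  # tone duration; the intertone interval was 0 ms

# Serial position of the tone at whose onset each pattern can first be
# classified (second tone for short, fourth tone for long, per the text).
CRITICAL_TONE = {"short": 2, "long": 4}


def critical_onset_ms(focus):
    """Onset time of the critical tone relative to stimulus onset."""
    return (CRITICAL_TONE[focus] - 1) * TONE_MS


def corrected_rt(rt_ms, focus):
    """RT measured from the onset of the critical tone."""
    return rt_ms - critical_onset_ms(focus)


reported_mean_rt = {"long": 1491, "short": 981}
corrected = {focus: corrected_rt(rt, focus)
             for focus, rt in reported_mean_rt.items()}
print(corrected)
```

On these two means, the 510-ms difference between the long and the short pattern shrinks to 110 ms after correction, which illustrates why overall RT differences between the patterns should be interpreted with caution.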

The main effect of congruency was not significant, F(1, 23) = 2.18, MSE = 15,622, p > 0.15, η_p² = 0.09, but the interaction of congruency and auditory attentional focus was significant, F(1, 23) = 8.35, MSE = 8,525, p < 0.01, η_p² = 0.27, indicating larger congruency effects in the long-pattern condition than in the short-pattern condition. Indeed, only in the long-pattern condition did participants respond faster in congruent trials than in incongruent trials, 1,459 ms vs. 1,524 ms; congruency effect of 65 ms, t(23) = −3.62, p < 0.01, whereas in the short-pattern condition there was no significant congruency effect, 987 ms vs. 975 ms, t(23) = 0.45, p > 0.66. The interaction of transition and congruency, F(1, 23) = 1.30, MSE = 5,340, p > 0.26, η_p² = 0.05, and the three-way interaction were not significant, F < 1.
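For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom with the standard identity η_p² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies it to F values reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values with df = (1, 23) taken from the mixing-costs RT analysis.
for f_value in (56.12, 8.35, 2.18):
    print(f_value, round(partial_eta_squared(f_value, 1, 23), 2))
```

Small deviations from the reported values can arise because the F values themselves are rounded.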

Errors

We conducted the same ANOVA on error rates (Fig. 4). The ANOVA revealed a main effect of auditory attentional focus, F(1, 23) = 6.00, MSE = 0.021, p < 0.03, η_p² = 0.20, indicating that error rates were smaller for the long pattern than for the short pattern (7.7% vs. 12.8%). Note that the smaller error rates and the slower responses in the long-pattern condition compared with the short-pattern condition might suggest a speed-accuracy trade-off. However, as explained in the introduction, it is possible that the slower responses to the long pattern are partly attributable to the sequential presentation of the sequences (see also Experiment 2).^{Footnote 1}

The main effect of transition was not significant, F < 1, showing no evidence for mixing costs. The interaction of transition and auditory attentional focus was not significant either, F(1, 23) = 3.67, MSE = 0.003, p > 0.06, η_p² = 0.14, even though mixing costs were somewhat larger for the long-pattern condition than for the short-pattern condition. Note that this nonsignificant trend was in the opposite direction of the nonsignificant trend in the RTs, and both failed to reach the significance threshold, so that they may not represent robust findings. Mixing costs in RTs and errors were thus small and nonsignificant for both auditory attentional foci.

In addition, the main effect of congruency was significant, F(1, 23) = 10.73, MSE = 0.017, p < 0.01, η_p² = 0.32, indicating that participants made fewer errors in congruent trials than in incongruent trials (7.2% vs. 13.3%). The interaction of auditory attentional focus and congruency was not significant, F(1, 23) = 1.39, MSE = 0.008, p > 0.25, η_p² = 0.06, nor were any of the other effects, Fs < 1.

Switch costs contrast

Reaction times

We conducted a 2 × 2 × 2 ANOVA with the within-subject variables auditory attentional focus (long pattern, short pattern), transition (repetition, switch), and congruency (congruent, incongruent) on RTs (Fig. 3). The ANOVA revealed a main effect of auditory attentional focus, F(1, 23) = 37.75, MSE = 172,320, p < 0.001, η_p² = 0.62, indicating that RTs were 368 ms slower for the long pattern than for the short pattern (1,481 ms vs. 1,113 ms). The main effect of transition was significant, F(1, 23) = 39.75, MSE = 12,505, p < 0.001, η_p² = 0.63. Importantly, the interaction of transition and auditory attentional focus also was significant, F(1, 23) = 45.14, MSE = 10,962, p < 0.001, η_p² = 0.66. Indeed, in the short-pattern condition, RTs were 203 ms longer in switch trials than in repetition trials, 1,215 ms vs. 1,012 ms, t(23) = −7.52, p < 0.001, whereas in the long-pattern condition, there were no switch costs at all, 1,481 ms vs. 1,481 ms, t(23) = −0.02, p > 0.98.
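The asymmetry in switch costs follows directly from the reported cell means. The sketch below computes the switch cost per attentional focus as the mean RT in switch trials minus the mean RT in repetition trials; the variable names are hypothetical.

```python
# Mean RTs (ms) in mixed blocks, as reported above.
mean_rt_ms = {
    "short": {"repetition": 1012, "switch": 1215},
    "long": {"repetition": 1481, "switch": 1481},
}


def switch_cost(cell_means):
    """Switch cost: mean RT in switch trials minus repetition trials."""
    return cell_means["switch"] - cell_means["repetition"]


costs = {focus: switch_cost(cells) for focus, cells in mean_rt_ms.items()}
asymmetry = costs["short"] - costs["long"]
print(costs, asymmetry)
```

The switch cost is 203 ms when switching to the short pattern and 0 ms when switching to the long pattern, so the entire transition effect is carried by the short-pattern condition.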

In addition, the main effect of congruency was significant, F(1, 23) = 6.56, MSE = 11,735, p < 0.02, η_p ² = 0.22, indicating that participants responded faster in congruent trials than in incongruent trials (1,277 ms vs. 1,317 ms). The interaction of auditory attentional focus and congruency was not significant, F(1, 23) = 4.24, MSE = 8,479, p > 0.05, η_p ² = 0.16, but suggested slightly greater congruency effects for the long-pattern condition than for the short-pattern condition. The interaction of transition and congruency was not significant, F < 1. The three-way interaction was not significant either, F(1, 23) = 1.29, MSE = 3,814, p > 0.27, η_p ² = 0.05.

Errors

We conducted the same ANOVA on error rates (Fig. 4). The ANOVA revealed a main effect of auditory attentional focus, F(1, 23) = 15.71, MSE = 0.015, p < 0.001, η_p ² = 0.41, indicating that error rates were smaller for the long pattern than for the short pattern (7.9% vs. 14.9%). The main effect of transition was not significant, F(1, 23) = 3.92, MSE = 0.004, p > 0.05, η_p ² = 0.15, but the interaction of auditory attentional focus and transition was significant, F(1, 23) = 13.39, MSE = 0.004, p < 0.001, η_p ² = 0.37, indicating that switch costs were smaller in the long-pattern condition than in the short-pattern condition. Indeed, in the long-pattern condition, there was no significant difference between repetition trials and switch trials, 8.7% vs. 7.1%, t(23) = 1.67, p > 0.10, whereas in the short-pattern condition, participants made fewer errors in repetition trials than in switch trials, 12.3% vs. 17.5%, t(23) = −3.37, p < 0.01. Note that the asymmetric switch costs in the error rates confirm the pattern in the RTs (see Discussion).

In addition, the main effect of congruency was significant, F(1, 23) = 19.22, MSE = 0.018, p < 0.001, η_p ² = 0.46, indicating that participants made fewer errors in congruent trials than in incongruent trials (7.2% vs. 15.6%). The interaction of transition and congruency was not significant, F(1, 23) = 4.13, MSE = 0.004, p > 0.05, η_p ² = 0.15, with slightly smaller congruency effects in repetition trials than in switch trials. The interaction of auditory attentional focus and congruency was not significant, F(1, 23) = 2.25, MSE = 0.011, p > 0.14, η_p ² = 0.09. The three-way interaction was not significant either, F(1, 23) = 1.88, MSE = 0.006, p > 0.18, η_p ² = 0.06.

Discussion

In this experiment, we examined mixing costs and switch costs as empirical markers for intentional control of attentional focus on short vs. long auditory patterns. In addition, we examined congruency effects as an empirical marker for involuntary processing of task-irrelevant (i.e., noninstructed) information. Overall, we found no significant mixing costs when using performance in experimental blocks with constant attentional focus as baseline, suggesting that working-memory load in repetition trials plays a minor role in auditory selection of tone patterns. However, we observed clear switch costs in the mixed blocks, which were markedly asymmetric, with large switch costs when switching to the short pattern and basically no switch costs at all when switching to the long pattern. We observed this asymmetry in both RTs and error rates, so that the two measures corroborate each other. Hence, the asymmetric switch costs in RTs and errors suggest more efficient adjustment of auditory attention when attending to the long pattern than when attending to the short pattern, possibly indicating that attending to the longer pattern represents the default processing mode in situations with short and long patterns.

However, as the 9-tone sequences unfolded over time, with the short pattern being identifiable earlier than the long pattern, the question arises whether the asymmetric switch costs in the RTs could be related to the temporal structure of the sequences. Specifically, it is possible that participants used the minimum information possible, which is the second tone in the short pattern but the fourth tone in the long pattern. This would lead to a shorter interval between the cue and the informative tone in the short-pattern condition than in the long-pattern condition, so that there would effectively be a longer preparation time for a switch to the long pattern. If preparation reduces the switch costs, then we would expect smaller switch costs for the long pattern than for the short pattern. Yet, we propose that the observed switch-cost asymmetry is at least partly, or even largely, due to attending to the long pattern representing the default processing mode rather than to critical differences in time-based preparation, for two reasons.

First, the pattern of congruency effects is not in line with such an account. Specifically, if participants based their performance on the minimal information required to discriminate the short and the long auditory pattern, then participants should be able to select their responses much earlier (by about 400 ms) when attending to the short pattern. If so, then the response should be selected in many cases even before the discriminating information for the long pattern (i.e., the fourth tone) becomes available, which should result in clearly asymmetric congruency effects. However, the pattern of congruency effects does not clearly support this account. In the mixing costs contrast, there is no such asymmetry in the error rates, even though it is present in the RT data. Note though that we did not find mixing costs in the first place, for both attentional foci. Moreover, in the switch costs contrast, this asymmetry of the congruency effect was nonsignificant both in the RT data and the error rates, suggesting that congruency effects occurred for both the short and the long pattern. This pattern of results appears to be in line with the idea that participants actually often tend to wait until the information about the identity of the long pattern is available, which would be expected by the idea that attending to the long pattern represents the default processing mode.

Second, the possible assumption that participants are simply better prepared for a switch to the long pattern because they have relatively more time for preparing such a switch is post hoc. In Experiment 2, we explicitly test the influence of preparation on switch costs by manipulating the preparation interval (i.e., the CSI). To foreshadow the results of Experiment 2, the pattern of preparation effects does not conform to predictions derived from the idea that participants in Experiment 1 were simply better prepared for a switch to the long pattern.

In sum, in Experiment 1 we found asymmetric switch costs of the two auditory attentional foci. This finding supports the idea that switch costs reflect primarily an attentional shifting process, and that attending to the long pattern represents the default processing mode and therefore shows only little (if any) switch costs. In Experiment 2, we investigated the question of whether switch costs could be reduced with increased preparation time for the attention switch. The preparatory reduction of switch costs would give further support to the notion of active attention shifting after the cue presentation and before the presentation of the tone sequence, which would further differentiate the results of the present study from carryover effects in terms of sequential-level priming (Justus & List, 2005; List & Justus, 2007).

Experiment 2

The goal of Experiment 2 was to enhance our understanding of the asymmetric switch costs that we found in Experiment 1. We wanted to examine whether the switch costs in the short-pattern condition could be reduced if participants had more time to prepare for the attention switch. Because we did not observe switch costs for the long pattern in Experiment 1, we predicted a three-way interaction of attentional focus, transition, and CSI. Importantly, this interaction would strongly point to an active process of auditory attentional shifting that takes place before the onset of the tone sequence.

In Experiment 2, the design was similar to the mixed blocks of Experiment 1. We no longer included pure blocks, because we did not find any mixing costs in Experiment 1. We used two different CSIs in Experiment 2 (100 ms and 1,000 ms) to reduce or increase the time to prepare for a switch relative to the cuing interval in Experiment 1 (CSI = 500 ms). We used a 1:1 cue-to-task mapping again because the results of Experiment 1 did not suggest any role of visual cue-priming effects, as such cue-priming benefits also should have been observed in the long-pattern condition, for which we did not find any effect of switch vs. repetition of the cue (hence ruling out the presence of general cue repetition priming).
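The factorial structure of Experiment 2 and the predicted three-way interaction can be written down as a simple contrast. The sketch below is illustrative only: all names are hypothetical, the cell means are supplied by the caller rather than taken from the data, and we assume that congruency is retained as a factor as in Experiment 1.

```python
# Illustrative sketch only: design cells and the contrast that expresses
# the predicted focus x transition x CSI interaction. Names are hypothetical.
from itertools import product

FOCI = ("long", "short")
TRANSITIONS = ("repetition", "switch")
CONGRUENCY = ("congruent", "incongruent")  # assumed to be kept from Experiment 1
CSIS_MS = (100, 1000)

# Every combination of the four within-subject factors.
design_cells = list(product(FOCI, TRANSITIONS, CONGRUENCY, CSIS_MS))

def switch_cost(means, focus, csi_ms):
    """means maps (focus, transition, csi_ms) to a mean collapsed over congruency."""
    return means[(focus, "switch", csi_ms)] - means[(focus, "repetition", csi_ms)]

def preparation_benefit(means, focus):
    """Reduction of the switch cost from the short CSI to the long CSI."""
    return switch_cost(means, focus, 100) - switch_cost(means, focus, 1000)

def three_way_contrast(means):
    """Positive if preparation reduces switch costs more for the short pattern."""
    return preparation_benefit(means, "short") - preparation_benefit(means, "long")

print(len(design_cells))
```

A positive value of `three_way_contrast` corresponds to the predicted pattern, in which the longer CSI reduces switch costs for the short pattern while there is little or nothing to reduce for the long pattern.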