Introduction

A growing body of research has shown that the allocation of attention is influenced not only by stimulus salience and task goals (Egeth & Yantis, 1997; Folk, Remington, & Johnston, 1992; Theeuwes, 2010), but also by previous experience (Awh, Belopolsky, & Theeuwes, 2012). A driver on the road will automatically attend to locations, like crosswalks, that have been important in the past, along with bright salient stimuli like a passing ambulance, or task-relevant stimuli like street numbers. Many researchers have posited that these selection history effects reflect an ability to learn statistical regularities within a complex visual environment (Aslin, 2017; Awh et al., 2012; Chelazzi, Bisley, & Bartolomeo, 2018; Jiang, 2018; Theeuwes, 2019; Todd & Manaligod, 2018). Individuals need not be aware of the environmental regularities to use them (Colagiuri & Livesey, 2016; Jiang, Sha, & Sisk, 2018), and previous experience can influence attention even when it conflicts with current task goals (Anderson, 2016; Jiang, Swallow, & Sun, 2014).

Although the existence of selection history effects is well established, more contentious is how they affect visual search. In attentional guidance accounts, previous experience biases attention toward likely target locations, expediting visual processing early during search, before the target is found. Alternatively, in response facilitation accounts, previous experience with a given context facilitates later response-related processes that occur after a target has been found, such as target verification and response selection. The debate on the locus of selection history effects is exemplified in one well-researched paradigm – contextual cueing (Chun & Jiang, 1998; Goujon, Didierjean, & Thorpe, 2015). Despite rich empirical findings and important theoretical debates arising from contextual cueing, only a few reviews of this literature have been published (Chun, 2000; Goujon et al., 2015; Wolfe & Horowitz, 2017). None of them focused primarily on the theoretical debate between attentional guidance and response facilitation that is central to an understanding of contextual cueing and selection history effects more broadly. Here we provide a tutorial review of work on contextual cueing, beginning with an overview of the effect, followed by an examination of the locus of contextual cueing.

Contextual cueing

Contextual cueing refers to the observation that repeated displays in visual search lead to faster response time (RT) and increased accuracy in identifying the search target, even when participants are at or near chance in discriminating repeated from new displays. In the standard paradigm (Chun & Jiang, 1998), participants report the orientation of a target letter T among distractor letter Ls. Unbeknown to participants, some of the search displays are shown multiple times, appearing once in each block of the experiment (Fig. 1A). Because the locations of the distractors and the target are held constant across repetitions, the context of the spatial array provides a cue to the target’s location. The repetition of the spatial context speeds RT on repeated displays, relative to novel displays (Fig. 1B).

Fig. 1

Contextual cueing paradigm and typical results. (a) A schematic of the standard spatial contextual cueing paradigm. Participants search for the letter T and indicate whether its tail is pointing to the left or the right. Some displays repeat once per block, with the target appearing in the same location relative to the array of distractors. The target’s orientation varies randomly in each presentation of a repeated display. Other displays are not repeated (novel), but the target will appear in the same locations in novel displays the same number of times across all blocks. (b) Typical finding in response time (RT) from Chun and Jiang (2003). RT is faster in the repeated (old) condition than in the novel (new) condition. The difference emerges after a few blocks (Panel B is from Fig. 1 of “Implicit, long-term spatial contextual memory,” by M. M. Chun and Y. Jiang, 2003, Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(2), p. 226. Copyright by the American Psychological Association)

Contextual cueing is not restricted to situations in which the spatial layout of distractors is the repeated aspect of the context. Repetition of the shapes of the distractors, the semantic category of distractor words, auditory cues, temporal sequences, the motion trajectories of distractors, background color or texture, and background scenes all yield an RT advantage (Brockmole & Henderson, 2006; Chun & Jiang, 1999; Endo & Takeda, 2004; Goujon, Brockmole, & Ehinger, 2012; Goujon, Didierjean, & Marmèche, 2009; Kawahara, 2007; Kunar, Flusberg, & Wolfe, 2006; Makovski, Vázquez, & Jiang, 2008; Olson & Chun, 2001; Summerfield, Lepsien, Gitelman, Mesulam, & Nobre, 2006). These different aspects of repeated contexts may interact. In some cases, learning of repeated spatial layouts overshadows learning of repeated background colors or textures (Kunar, John, & Sweetman, 2014a), while in other cases, contextual cueing from background scenes overshadows associative learning of configurations (Rosenbaum & Jiang, 2013). Contextual cueing is flexible in that repeated displays facilitate search even when each display is associated with two (Chun & Jiang, 1998; Kunar & Wolfe, 2011; but see Zellin, Conci, von Mühlenen, & Müller, 2011) or even four possible target locations (Kunar & Wolfe, 2011), when only a few target-adjacent distractors repeat (Brady & Chun, 2007; Kourkoulou, Leekam, & Findlay, 2012; Olson & Chun, 2002), or when half of the distractors from each of two previously learned contexts are combined (Jiang & Song, 2005; Jiang & Wagner, 2004).

Several lines of research show that contextual cueing is predominantly associative, rather than perceptual, learning. When the spatial layout of distractors is repeated but not paired with a consistent target location, it does not facilitate search (Chun & Jiang, 1998). This finding suggests that although participants can acquire perceptual familiarity with repeated spatial layouts (Beesley, Vadillo, Pearson, & Shanks, 2015; Geyer, Shi, & Müller, 2010a), the utility of these layouts lies primarily in their predictability of the target location (Olson & Jiang, 2004; Olson, Jiang, & Moore, 2005). In fact, context repetition sometimes incurs a “contextual cost” when associations between a repeated layout and a target location are disrupted after contextual cueing has been acquired; participants find the target more slowly in a repeated display than in a novel display if it appears in a location that contained a distractor in previous repetitions of the display (Makovski & Jiang, 2010). Contextual cost may additionally manifest as proactive interference. After participants acquire contextual cueing using one set of displays, they learn associations between the already-learned distractor arrays and new target locations more slowly than they learned the initial associations (Conci, Sun, & Müller, 2011; Manginelli & Pollmann, 2009; Zellin, von Mühlenen, Müller, & Conci, 2014). Proactive interference disappears if entirely new displays must be learned (Jiang, Song, & Rigas, 2005). Furthermore, although some studies find evidence of contextual cueing on target-absent trials (Beesley et al., 2015; Geyer, Shi, & Müller, 2010a), learning is more consistent across studies under target-present conditions (Kunar & Wolfe, 2011; Schankin, Hagemann, & Schubö, 2011; Shen & Jiang, 2006). 
Thus, RT improves when the context is predictive of the target’s location, but it does not reliably improve when the context is predictive of a target present/absent response, presumably because an association between target location and distractor array cannot be formed.

Consistent with associative learning, search is not more efficient when the target position and identity differ across repetitions of a search scene. In “Repeated Search,” for example, participants search from the same scene multiple times, but each time for a different object (Hout & Goldinger, 2012; Kunar, Flusberg, & Wolfe, 2008b; Oliva, Wolfe, & Arsenio, 2004; Wolfe, Klempen, & Dahlen, 2000). Repeated search does not make search more efficient, in part because a change in the target object prevents participants from learning associations between the context and the search target location.

Not all associations are created equal – some display elements contribute to the contextual cueing effect more than others. This is exemplified in two characteristics of contextual cueing. First, distractors near the target have a greater influence on learning than distractors far from the target, resulting in a local spatial constraint on what is learned (Brady & Chun, 2007). Second, attention modulates context learning. Selectively attending to a subset of the repeated context produces stronger associations between the target and attended elements than between the target and unattended elements (Jiang & Chun, 2001). These two characteristics may be related – the local spatial constraint may result from a disproportionate level of attention to locations near the target.

This spatial modulation of contextual cueing receives support from studies that repeat only part of the display. Repetition of the distractor locations in the same half of the display as the target produces contextual cueing, whereas repetition of distractors in the half not containing the target is less effective (Olson & Chun, 2002). Brady and Chun (2007) developed a two-layer connectionist network model to account for this local spatial modulation. Their model divides the screen into a large matrix of cells, with each cell connected to every other cell by weighted links. The input layer consists of cells that are filled by display elements, and the output layer represents the likelihood of a target appearing at each input location. This likelihood is calculated by propagating activation over the links. Each element of the output layer sums the weighted activity over its links to input elements to determine the likelihood that a given screen location contains a target. The model “searches” for the target by examining each element in the output layer in the order of activation until the target is located. Upon detecting a target, the network increases weights on links connecting the filled locations in the input layer to the target location. Repeating a display repeats both the filled input and target locations, strengthening their association, making it more likely that the model will inspect the target location earlier, and thus speeding search.

In addition to dynamic weights determined by the relationship between elements in the input layer and target location, the model incorporates static weights based on distance, weighting links to elements near the target more so than those further away. The local focus is modeled as a Gaussian kernel with a peak surrounding the target. Distractors in the center of this kernel induce more learning than those on the edge or outside (Fig. 2A). The inclusion of the local spatial constraint is critical to successfully modeling the spatial modulation of contextual cueing found by Olson and Chun (2002). The model also accounts for the transfer of contextual cueing to new contexts created by the recombination of two learned contexts (Jiang & Song, 2005; Jiang & Wagner, 2004). In addition to its congruence with previous data, the model accurately predicted some radical subsequent findings. For instance, it predicted that if screen locations were divided into quadrants, repetition of only the distractor locations within the target’s quadrant would induce as much contextual cueing as repetition of the entire array. Three experiments confirmed this prediction (Brady & Chun, 2007; Fig. 2B). These local spatial constraints may stem from the sensitivity of context learning to the spatial gradient of attention. In fact, when attentional allocation is more global, as when natural scenes serve as the cue, the local constraint is less apparent (Brockmole, Castelhano, & Henderson, 2006; Brooks, Rasmussen, & Hollingworth, 2010; Castelhano, Fernandes, & Theriault, 2018). Not only does prior experience affect visual search, but the gradient of attention also modulates context learning.
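The learning scheme described above can be illustrated with a minimal Python sketch. This is not Brady and Chun's (2007) implementation. The class name `ContextNetwork`, the parameter values `SIGMA` and `LEARNING_RATE`, the display coordinates, the choice to fold the Gaussian distance weighting into the size of each weight update, and the exclusion of a location's link to itself are all assumptions made for illustration only.

```python
import math

SIGMA = 1.5          # assumed width of the Gaussian kernel, in grid cells (hypothetical)
LEARNING_RATE = 0.5  # assumed size of each weight update (hypothetical)


def gaussian(cell, target, sigma=SIGMA):
    """Static distance weighting: near 1.0 beside the target, falling toward 0 far away."""
    d2 = (cell[0] - target[0]) ** 2 + (cell[1] - target[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))


class ContextNetwork:
    """Toy two-layer network: filled input cells vote for likely target cells."""

    def __init__(self):
        # (input_cell, output_cell) -> learned weight; missing links count as 0.
        self.weights = {}

    def activation(self, filled, candidate):
        """Sum the learned weights from every other filled cell to this candidate."""
        return sum(self.weights.get((cell, candidate), 0.0)
                   for cell in filled if cell != candidate)

    def search(self, filled, target):
        """Inspect locations in order of activation; return how many were inspected."""
        # sorted() is stable, so ties keep the order in which items were listed.
        order = sorted(filled, key=lambda c: self.activation(filled, c), reverse=True)
        inspected = order.index(target) + 1
        # After the target is found, strengthen links from the context to the
        # target location, scaled by distance so nearby distractors learn more.
        for cell in filled:
            if cell != target:
                key = (cell, target)
                self.weights[key] = (self.weights.get(key, 0.0)
                                     + LEARNING_RATE * gaussian(cell, target))
        return inspected


# Hypothetical display: four distractors, with the target listed last.
target = (3, 3)
display = [(0, 0), (7, 7), (3, 4), (2, 2), target]

net = ContextNetwork()
first = net.search(display, target)   # no learning yet: target is inspected last
second = net.search(display, target)  # learned context now favors the target
print(first, second)
```

In this toy version, a single repetition is enough to move the target to the front of the inspection order, and the link from the adjacent distractor at (3, 4) ends up far stronger than the link from the distant one at (7, 7), which mirrors the local spatial constraint.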

Fig. 2

(a) Illustration of the local constraint as described by the Brady and Chun (2007) model. Distractors near the target influence search more strongly than distractors farther from the target. (b) A figure from the third experiment of Brady and Chun (2007) that demonstrates that repetition of the entire display and repetition of only locations in the target’s quadrant reduce response time (RT) equally after training. In this experiment, entire displays repeated during a training phase. Then, in a testing phase, some of these already-learned displays continued to repeat in their entirety (“Glob-pred”), while others were shown with only the distractors in the target’s quadrant remaining unchanged from the learned displays (“Quad-pred”). In the “New-QPred” condition, distractors in the target quadrant of displays that were not shown in training repeated during the testing phase, providing a baseline that took into account learning during the testing phase in the Quad-pred condition (from Figs. 2 and 15 of “Spatial constraints on learning in visual search: Modeling contextual cueing,” by T. F. Brady and M. M. Chun, 2007, Journal of Experimental Psychology: Human Perception and Performance, 33(4), pp. 800-807. Copyright by the American Psychological Association)

More direct evidence that attention affects contextual cueing comes from studies that require participants to attend to a subset of the repeated context. Jiang and Chun (2001) modulated the degree of selective attention allocated to different distractors by presenting half of the search items in the same color as the target and the other half in a different color. The target color was held constant throughout the experiment, so participants could restrict search to one color-defined set of items. Thus, each display contained an attended context, represented by the distractors in the target’s color, and an ignored context, represented by the distractors in the other color. Participants showed significant contextual cueing when only the attended context repeated, but not when only the unattended context repeated. A subsequent study replicated and extended this finding, showing that the unattended context had less of an effect on search than the attended context (Jiang & Leung, 2005). This study also provided some evidence that both attended and unattended contexts may be learned, but only the currently-attended context affects search (see also Goujon et al., 2009). Spatial working memory load influences contextual cueing in a similar manner (Annac et al., 2013; Annac, Zang, Müller, & Geyer, 2018; Manginelli, Geringswald, & Pollmann, 2012; Rausei, Makovski, & Jiang, 2007; Travis, Mattingley, & Dux, 2013; Vickery, Sussman, & Jiang, 2010), though contextual cueing is unaffected by non-spatial working memory load (Manginelli, Langer, Klose, & Pollmann, 2013).

These studies and other related findings demonstrate that contextual cueing is robust to many manipulations of stimulus and presentation conditions. However, it is also subject to constraints, including local spatial constraints and selective attention constraints. As discussed below, these constraints play a significant role in evaluating evidence as to whether the locus of the mechanisms underlying contextual cueing lies early, during search, or late after the target has been found but before a response is made. We turn next to this question.

The locus of contextual cueing

Theoretical distinction

Context effects are prevalent, and their underlying mechanisms are complex. Objects are more readily recognized when presented in a consistent rather than an inconsistent context (Biederman, 1972; Biederman, Mezzanotte, & Rabinowitz, 1982). These findings reflect enhanced early object recognition in a familiar context (Davenport & Potter, 2004). However, context may also influence later response processes. When unsure about what they have seen, people are biased toward naming an object that is consistent with the scene (Brewer & Treyens, 1981; Hollingworth & Henderson, 1998).

The distinction between early and late effects of context has also engendered debate in visual search. Conjunction search tasks, such as finding a T among Ls, comprise several stages. Two early stages include the initial preattentional-processing stage, in which participants process coarse perceptual characteristics of the display, and the search stage, in which participants scan the display, shifting attention from item to item until the target is detected. Late stages include the response-selection stage, in which the participant maps the relevant target property to a pre-assigned response, and a motor stage, in which the participant makes a motor response to indicate the answer. Just as context can affect both early object recognition and late response selection, repeated search context could facilitate both early and late search processes.

The leading theoretical accounts of contextual cueing, attentional guidance and response facilitation, differ fundamentally in their emphasis on the early or late stages of visual search (Fig. 3). Early-locus accounts, such as the attentional guidance account, hold that memory of the context-target association in repeated contexts expedites search before the target is detected. Late-locus accounts, such as the response facilitation account, suggest that context repetition speeds the late response-selection stage that occurs after the target is found. These accounts are not mutually exclusive, as benefits of context repetition may be seen both before and after finding the target. Similarly, the dominant examples of the early- and late-locus accounts – guidance and response facilitation, respectively – are not contradictory. For instance, memory for target-context associations could first lead those contexts to cue attention to the target location and then lower the response threshold once the target is found.

Fig. 3

Visual schematic of the distinction between theories that place the locus of contextual cueing early, before the target is found, and those that emphasize a late locus, between the moment the target is found and the moment a motor response is recorded. The attentional guidance account is the dominant account among those that place the locus of contextual cueing early, while the response facilitation account is the dominant account among those that posit a late locus

To assess whether contextual cueing has an early or late locus or both, many behavioral studies have measured the slope of the linear function relating RT to the number of search items to assess the efficiency of search leading up to target detection ("search slope"; Wolfe, 1998). Others rely on event-related potentials (ERPs; Luck, 2014) to identify the time course of contextual cueing using well-established ERP components. Still others segment RT into early and late stages based on eye movements. The following review groups studies based on the methodology used.

Behavioral evidence

The majority of contextual cueing research focuses on RT measures. Some studies use search slope as a proxy for attentional guidance. Others assess early effects by comparing feature search with conjunction search. Still others explore the interaction between contextual cueing and late response-related effects.

Search slope

Search slope measures the change in RT as a function of the number of distractor items to search through, providing an index of search efficiency. If the impact of display repetition is to facilitate early-locus processes, such as attentional guidance or perceptual discrimination, then reductions in slope should be observed. In contrast, late-locus accounts predict that the intercept of the search slope should be smaller for repeated than for novel displays, with no change in slope, as search efficiency would not be affected.
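The contrast between the two predictions can be made concrete with a short Python sketch that fits a straight line to mean RT as a function of set size. The function name `fit_search_function` is hypothetical, and the RT values are invented to make the two patterns visible. They are not data from any study.

```python
def fit_search_function(set_sizes, mean_rts):
    """Ordinary least-squares fit of RT = intercept + slope * set_size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts))
    slope = sxy / sxx                      # ms per additional item
    intercept = mean_y - slope * mean_x    # ms at a notional set size of zero
    return slope, intercept


set_sizes = [8, 12, 16]

# Invented mean RTs (ms), constructed only to illustrate each prediction.
novel = [820, 980, 1140]              # baseline search function
repeated_early = [740, 860, 980]      # early locus: shallower slope, same intercept
repeated_late = [740, 900, 1060]      # late locus: same slope, lower intercept

for label, rts in [("novel", novel),
                   ("repeated (early-locus pattern)", repeated_early),
                   ("repeated (late-locus pattern)", repeated_late)]:
    slope, intercept = fit_search_function(set_sizes, rts)
    print(f"{label}: slope = {slope:.1f} ms/item, intercept = {intercept:.1f} ms")
```

In the early-locus pattern the benefit of repetition grows with set size, whereas in the late-locus pattern it is a constant saving at every set size. Real data would also require a statistical test of the slope and intercept differences, which this sketch omits.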

Across several contextual cueing studies, search slope effects have been largely inconsistent. The initial contextual cueing study by Chun and Jiang (1998) observed reduced search slope and no reduction in intercept in the repeated condition relative to the novel condition when testing across three set sizes: 8, 12, and 16. Kunar et al. (2007) also found larger contextual cueing at larger set sizes when comparing extreme set size differences of 12 and 1, though search with a set size of one may be mechanistically different from search with larger set sizes. Shallower search slope in the repeated condition was also observed when the search items were presented against a textured background or when there was increased time for attentional guidance (Kunar, Flusberg, & Wolfe, 2008a). In contrast, other studies have reported no difference in search slope between repeated and novel displays (Makovski & Jiang, 2010; Rausei et al., 2007), including one that observed surprisingly similar magnitudes of contextual cueing between set sizes of 8 and 12 (Fig. 4; Kunar et al., 2007). Although the slope of the RT × set size function was significantly shallower for repeated than novel displays in that particular experiment of Kunar et al. (2007), only one of nine additional experiments in their study found a significant slope difference. Furthermore, Hodsoll and Humphreys (2005) found that contextual cueing was less robust at a set size of 20 than of 10.

Fig. 4

Search slope for predictive (repeated) and random (novel) displays across each epoch of Kunar et al. (2007)’s Experiment 1 (From Fig. 3 of “Does contextual cueing guide the deployment of attention?” by M. A. Kunar, S. Flusberg, T. S. Horowitz, and J. M. Wolfe, 2007, Journal of Experimental Psychology: Human Perception and Performance, 33(4), p. 816. Copyright by the American Psychological Association)

Large sample sizes do not adequately resolve inconsistencies in search slope findings, suggesting that low power cannot account for the lack of a consistent effect. Using a similar design and display setup to Kunar et al. (2007) but a greater sample size (N=38, rather than 12), Zhao et al. (2012) found significantly larger contextual cueing at set size 12 than at set size 8. However, when Kunar et al. (2007) pooled data across 118 participants, no search slope effect emerged. The inconsistent findings in the search slope measure, despite high-powered analyses, contrast with consistent search efficiency benefits in feature search tasks (Wolfe, 1998). This suggests that context repetition may not influence the early stages of visual search.

However, the local spatial constraints of contextual cueing complicate a straightforward interpretation of these results. That is, the lack of a consistent search slope effect may reflect the error in assuming that guidance operates equally across the entire distractor ensemble throughout the entire search trial. As reviewed above, contextual cueing is primarily driven by associations between the target and adjacent distractors. If only the three or four target-adjacent distractors induce contextual cueing, the number of distractors that induce attentional guidance remains the same even as more distractors are added. Thus, the inconsistent slope difference is fully compatible with Brady and Chun (2007)’s model of attentional guidance. Therefore, search slope analyses are not diagnostic of the theoretical debate between early- and late-locus accounts.

Feature search

If late but not early factors underlie contextual cueing, we should observe the effect even in search tasks that already benefit from highly efficient attentional guidance during early stages. Kunar et al. (2008) did find a small but significant contextual cueing effect when the search display contained a single target presented among repeated or random placeholders or when the target was a unique color among distractors (Kunar et al., 2007). Subsequent work has similarly observed a small contextual cueing effect in feature search tasks (Geyer, Zehetleitner, & Müller, 2010b; Harris & Remington, 2017; Schankin & Schubö, 2010) and when the target’s location is validly precued (Harris & Remington, 2017). At first glance, these findings may be taken as evidence of a late locus of contextual cueing.

However, the contextual cueing effect observed in feature search and in search following a valid precue is small compared with the effect observed in the typical conjunction search task. In fact, the small magnitude relative to conjunction search tasks could be taken as evidence of an early locus. Feature and conjunction search differ in terms of early search efficiency but not late response-related processing, so the large difference in contextual cueing effect size between the two likely reflects interaction between contextual cueing and early processes. Because guidance in feature search is only “near” optimal, early-locus accounts would not predict a null contextual cueing effect in feature search, but rather one that is relatively small due to the small contribution of additional guidance (Harris & Remington, 2017). Consistent with this idea, Geyer et al. (2010b), who observed contextual cueing in a feature search task, also provided evidence that repeated context expedites the attentional selection of pop-out targets. When feature search trials are presented briefly to obtain a measure of detection sensitivity, target detection sensitivity is higher on repeated than novel trials. This higher sensitivity for repeated trials would be predicted if context learning guided attention in a pop-out task. Thus, data from feature search provide some evidence both for and against early-locus accounts and are therefore insufficient to resolve the theoretical debate regarding the locus of contextual cueing.

Congruency effects

More direct evidence for late-locus accounts comes from Experiment 3 of Kunar et al. (2007), which found an interaction between response congruency effects and contextual cueing. Participants had to find a singleton color target and identify it as an A or an R. The distractors were all in a different color than the target but could have the same identity and response code as the target (e.g., a red A target among green As) or a different identity and response code (e.g., a red A target among green Rs). Congruency effects were evidenced by faster RTs when the target and distractors shared the same response code (congruent) than when they had conflicting response codes (incongruent). Because congruency is presumed to influence response-related processes, late-locus accounts would predict that contextual cueing should interact with these congruency effects. Indeed, Kunar et al. (2007) observed significant contextual cueing only when the distractors were congruent with the target, suggesting that contextual cueing is modulated by processes that occur late, such as response preparation.

This observation remains the strongest piece of behavioral evidence for the response facilitation account. Still, one can question aspects of the design and results. The congruency effect was confounded with perceptual factors: congruent displays contained distractors that had the same shape as the target, as well as the same response code, potentially aiding target localization. Furthermore, although contextual cueing was significant on congruent and not incongruent trials, the crucial interaction between context repetition and congruency was not reported. Given that the experiment involved feature search, and given that contextual cueing was small even on congruent trials, it is difficult to interpret the lack of contextual cueing on incongruent trials in the absence of an interaction, so more data are needed.

Summary of behavioral evidence

The lack of a consistent difference in search slope between repeated and novel displays, combined with a small but significant benefit of display repetition in feature search tasks, has been viewed as problematic for accounts that place the locus of contextual cueing before target detection. However, neither of these behavioral measures is diagnostic of the true locus of the effect. Although the presence of contextual cueing when the singleton target appears among congruent but not incongruent distractors (Kunar et al., 2007) supports the late-locus accounts, the absence of a reported interaction between congruency and contextual cueing weakens the conclusiveness of this finding, making further replications particularly informative.

The dominant late-locus accounts also have difficulty explaining why, when a repeated context is consistently associated with a response, contextual cueing is not always observed. For example, displays consistently associated with a target-absent response are poorly learned (Kunar & Wolfe, 2011; Schankin et al., 2011; Shen & Jiang, 2006). If context repetition facilitates response mechanisms, the target present/absent response paradigm should have produced large effects. As a whole, behavioral studies have not conclusively localized the mechanisms underlying contextual cueing.

Event-related potential

With its high temporal resolution, the event-related potential (ERP) technique is useful for understanding the time course of cognitive processes (Luck, 2014) and can provide insights into whether the locus of contextual cueing is early or late. In addition, prior data link several ERP components to specific aspects of cognitive processing. For example, the N2pc is a negative-going component of the EEG recorded at posterior electrode sites contralateral to the attended region of space, occurring about 200 ms after stimulus onset. Researchers use this early component, and the related N210 component, as indices of attentional selection of the target. Accounts that place the locus of contextual cueing early would predict differences in these early components between repeated and novel displays. Late-locus accounts, on the other hand, would predict that differences appear in later components, such as the lateralized readiness potential (LRP), which indexes response selection and motor preparation. Signal averaging for the LRP can be time-locked to either the search display onset (s-LRP) or response onset (LRP-r). An earlier onset of the s-LRP would indicate faster response selection, whereas an earlier onset of the LRP-r would indicate faster response preparation and motor execution; both components are recorded late in search.

The first contextual cueing study to use ERPs took advantage of a rare opportunity to obtain intracranial data from patients who were in a pre-surgical phase before undergoing brain surgery (Olson, Chun, & Allison, 2001). Unlike in scalp EEG, the electrode sites in intracranial EEG are unambiguous and provide excellent spatial and temporal resolution. The 18 patients had electrodes in various brain areas, including visual areas (V1, V2, extrastriate areas), the medial temporal lobe, and the frontal cortex. Participants completed a standard contextual cueing task.

Participants in Olson et al. (2001) showed large contextual cueing in RT. In addition, electrodes in the visual areas showed a significantly larger N210 in the repeated condition than in the novel condition. The timing of this difference – around 200 ms after stimulus onset – suggests that contextual cueing is modulated by attention via feedback effects, rather than the initial feedforward processing in these brain regions. Although 200 ms is considered “late” relative to the feedforward visual processing, it happens much sooner than the behavioral response. The timing, combined with electrode activity in visual areas, suggests that contextual cueing begins to influence processing during early stages of search, long before participants have begun preparing a response.

Scalp EEG recordings also point to early differences between repeated and novel displays (Johnson, Woodman, Braun, & Luck, 2007). Behaviorally, Johnson et al. (2007) observed both faster average RT and a greater proportion of fast RTs in the repeated than the novel condition (Fig. 5A). Crucially, a significantly larger N2pc in the repeated condition, compared to the novel condition, accompanied these behavioral findings (Fig. 5B). The difference in N2pc amplitude between repeated and novel displays emerged approximately 175 ms after display presentation. This is similar to the time course of the N2pc component in pop-out search (Luck & Hillyard, 1994). Thus, ERP correlates of contextual cueing occur early and resemble other instances of efficient guidance of attention.
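The N2pc measure discussed above is commonly isolated as a contralateral-minus-ipsilateral difference wave. The following is a minimal sketch of that computation, not the analysis pipeline of any cited study. The waveform values, the 50-ms sampling step, the 200–250 ms measurement window, and the function names are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def n2pc_difference_wave(contra, ipsi):
    """Contralateral-minus-ipsilateral voltage at each time point."""
    if len(contra) != len(ipsi):
        raise ValueError("waveforms must have the same number of samples")
    return [c - i for c, i in zip(contra, ipsi)]


def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean amplitude of a waveform within an inclusive time window."""
    window = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the window")
    return mean(window)


# Hypothetical averaged waveforms (microvolts), one sample every 50 ms.
times_ms = [0, 50, 100, 150, 200, 250, 300]
contra = [0.0, 0.5, 1.0, -1.0, -2.0, -1.5, 0.0]
ipsi = [0.0, 0.5, 1.0, 0.0, -0.5, -0.5, 0.0]

diff = n2pc_difference_wave(contra, ipsi)
# The 200-250 ms window is an assumption made for this example only.
n2pc = mean_amplitude(diff, times_ms, 200, 250)
print(diff, n2pc)
```

Under this sketch, a larger N2pc in the repeated condition would appear as a more negative mean amplitude of the difference wave for repeated than for novel displays.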

Fig. 5

(a) From Johnson et al. (2007). Illustration of the vincentized cumulative reaction-time distributions for repeated and novel displays. (b) N2pc component plotted for repeated and novel conditions in Johnson et al. (2007). The N2pc component is isolated by plotting the difference between contralateral and ipsilateral waveforms (From Figs. 2B and 3B of “Implicit memory influences the allocation of attention in visual cortex” by J. S. Johnson, G. F. Woodman, E. Braun, and S. J. Luck, 2007, Psychonomic Bulletin & Review, 14(5), pp. 837-838. Copyright by the Springer Publishing Company)
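Vincentizing, as used for the cumulative distributions in Fig. 5a, averages reaction-time quantiles across participants rather than pooling raw RTs. The sketch below shows one common way to do this; the linear-interpolation quantile rule, the function names, and the RT values are assumptions for illustration and are not taken from Johnson et al. (2007).

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of pre-sorted data, with 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def vincentize(rts_by_participant, probs):
    """Average each RT quantile across participants."""
    per_participant = [
        [quantile(sorted(rts), p) for p in probs] for rts in rts_by_participant
    ]
    n = len(per_participant)
    return [sum(q[i] for q in per_participant) / n for i in range(len(probs))]


# Hypothetical RTs (ms) for two participants in one condition.
rts = [
    [1200, 800, 1000, 900, 1100],
    [1000, 1800, 1400, 1200, 1600],
]
probs = [0.0, 0.25, 0.5, 1.0]
group_quantiles = vincentize(rts, probs)
print(group_quantiles)
```

Plotting `probs` against `group_quantiles` separately for repeated and novel displays would give group-level cumulative distributions that preserve the shape of the individual distributions.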

Contextual cueing induces even earlier differences in visual processing, measured in occipital sensors, when the study design involves more precise experimental controls. One magnetoencephalography (MEG) study controlled for configuration repetitions by presenting only repeated configurations. Half of them maintained consistent associations between distractor array and target location, whereas the other half had variable target locations (Chaumon, Drouet, & Tallon-Baudry, 2008). The design used the same 12 target locations in predictive and non-predictive displays, controlling for location repetition effects as well. Participants were faster finding the target in predictive displays than in non-predictive displays, and in MEG, significant effects of display predictability appeared in occipital sensors about 50–100 ms after display onset (Chaumon et al., 2008). This study shows that when a more stringent experimental design is used, MEG correlates of contextual cueing can emerge as early as 50–100 ms into search. This time course could either reflect pre-attentional benefits of context repetition or early attentional guidance, but either way, it suggests an early locus.

That display repetition affects early ERP components does not preclude the possibility that a late locus for contextual cueing also exists. Schankin and Schubö (2009) analyzed two EEG components, the N2pc and the LRP, to concurrently assess evidence for early and late effects. LRP data are of particular significance. According to Schankin and Schubö (2009, page 670), “If response selection processes contribute to the contextual cueing effect, we expect an earlier onset of the s-LRP for responses to targets presented in a repeated context than for those presented in a novel context…If late motor processes are facilitated by the repeated context, the interval extending from LRP-r onset to the response onset should be shortened.”

Schankin and Schubö (2009) made two modifications to the stimuli and presentation to obtain clean LRP data. First, because variations in the display-offset time can contaminate the LRP, the search array terminated after 700 ms regardless of whether a response had been made. This is too brief for a typical T-among-L search task where RTs usually range from 1,000 to 2,000 ms. The reduced available search time necessitated a second modification: a reduction in task difficulty. To reduce search difficulty, Schankin and Schubö made distractors highly dissimilar from the target. These two changes directly resulted in rapid search speed and a small contextual cueing effect. Average RT in repeated displays was only 20–30 ms faster than RT in novel displays across three articles that used this modified contextual cueing paradigm (Schankin et al., 2011; Schankin & Schubö, 2009, 2010).

Across the entire sample of 14 participants in Schankin and Schubö (2009), none of the N2pc, s-LRP, or LRP-r measures differed significantly between the repeated and novel conditions. Thus, the study did not find direct ERP evidence for either early or late facilitation. In a further analysis, Schankin and Schubö (2009) included just the 11 participants who showed a contextual cueing effect larger than 25 ms. In these participants, the size of contextual cueing, indexed by the RT difference between novel and repeated displays, correlated significantly with the amplitude of N2pc and the latency of LRP-r, but not s-LRP. These correlation findings were taken as support for both early- and late-locus accounts.

However, some caveats accompany these findings. Overall, EEG data appear to be underpowered in this modified paradigm, as one might predict given the weak behavioral effect. Although a sample size of 14 is standard for contextual cueing studies, scalp ERP data are noisier and require larger sample sizes (Johnson et al., 2007; Luck, 2014). Furthermore, the correlations were only significant when based on a subset of 11 participants. Such an individual differences approach typically requires large sample sizes to produce reliable effects. The reduced sample size of 11 is questionable for this approach. In addition, the study provided no measure of internal consistency – correlations of this type depend on the reliability of the behavioral contextual cueing effect and the ERP measurement. In fact, when tested multiple times, participants show little internal consistency in the behavioral contextual cueing effect – a participant who produces a large contextual cueing effect in one session may not produce a large contextual cueing effect in another session (Jiang et al., 2005). Thus, individual variability in contextual cueing measured in a single session may not necessarily reflect stable experimental effects. Because of this, dividing participants based on the size of contextual cueing may amplify existing noise in the data.
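The concern about small samples in correlational analyses can be made concrete with the Fisher z approximation to the confidence interval of a Pearson correlation. The sketch below is illustrative only: the correlation of .60 is a hypothetical value, not one reported by Schankin and Schubö (2009), and the function name is ours.

```python
import math


def correlation_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r using the Fisher z-transform."""
    if n <= 3:
        raise ValueError("the Fisher approximation needs n > 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


hypothetical_r = 0.6  # illustrative value only, not taken from any study
small_sample = correlation_confidence_interval(hypothetical_r, n=11)
large_sample = correlation_confidence_interval(hypothetical_r, n=100)
print(small_sample, large_sample)
```

With these assumed inputs, the interval for 11 participants runs from about zero to nearly .90, whereas the interval for 100 participants is far narrower. The same observed correlation is therefore much less informative in the smaller sample.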

Subsequent studies provide conflicting results, weakening the strength of the conclusions of Schankin and Schubö (2009). In a follow-up study, Schankin and Schubö (2010) had participants search the display for 700 ms per trial for a target that was perceptually distinct from distractors. The main methodological difference from Schankin and Schubö (2009) was that in the 2010 study, the target could only appear in one of four locations. Schankin and Schubö (2010) found a significantly larger N2pc in repeated, compared to novel, contexts. This early effect had not been observed in the 2009 study. As in the 2009 study, LRP-r did not differ between repeated and novel contexts, though unlike the 2009 study, there was a marginally significant s-LRP effect. There was no report of a correlation analysis between the behavioral effect and the ERP components. Thus, the ERP data across the two studies show no consistent pattern in the early N2pc or the late s-LRP and LRP-r.

Both studies did, however, find a larger P3 magnitude for repeated displays than novel displays (Schankin & Schubö, 2009, 2010). The P3 component occurs later than the N2pc, but prior to the LRP, so it may be considered a post-attentional, pre-response effect that occurs relatively early, approximately 300 ms after display onset. Unlike the N2pc or the LRP, however, the P3 component is difficult to isolate and is not closely linked to a particular cognitive process. According to Luck (2014, pages 42-43), “We know a great deal about the effects of various manipulations on P3 amplitude and latency, but there is no clear consensus about what neural or cognitive processes the P3 wave reflects.” This makes it difficult to attribute the P3 finding to either early attentional guidance or late response facilitation.

In a third study using the same stimuli as Schankin and Schubö (2009, 2010) and a target present/absent task, Schankin and colleagues did not replicate their previously reported effects (Schankin et al., 2011); on target-present trials, N2pc did not differ between the repeated and novel displays and did not correlate with individual differences in the behavioral contextual cueing effect. The study did not directly measure s-LRP, LRP-r, or P3. There was a difference between repeated and novel displays in late positivity from 500 to 600 ms, but only in the half of participants who showed a large contextual cueing effect (using a median split analysis).

In summary, the early intracranial EEG contextual cueing study and subsequent EEG and MEG studies provide converging support for early-locus accounts (Chaumon et al., 2008; Johnson et al., 2007; Olson et al., 2001; Schankin & Schubö, 2009, 2010). They show differences between repeated and novel displays associated with components of the ERP that index visual and attentional processes occurring before the target is found. However, these studies all reported different early components (e.g., N210, N2pc, early 50- to 100-ms effects). Though all suggest that repeated and novel displays are processed differently early in search, the strength of this conclusion moving forward will depend on replication of these findings. They must face the same direct replication test as the LRP-r finding from Schankin and Schubö (2009) that was not found in subsequent studies. It will also be important to identify a sound theoretical explanation for why early ERP components are larger, rather than smaller, in the repeated condition than the novel condition. Johnson et al. (2007) provided a possible explanation that explicitly links N2pc effects to attention shifts associated with first saccades to the target, which occur more frequently in repeated trials. Effects of context repetition on the later components s-LRP and LRP-r have also been found but have proven inconsistent across the studies, making it difficult to draw firm conclusions about the validity of late-locus accounts.

Eye tracking

In the T-among-L conjunction search typical of contextual cueing, participants make a series of fixations to examine potential targets. A straightforward prediction from early-locus accounts is that repeated trials should require fewer fixations to find the target, whereas late-locus accounts predict a shorter time between the moment when the target is fixated and the moment a response is made.

Consistent with early-locus accounts, Peterson and Kramer (2001) observed significantly fewer fixations on repeated displays than novel displays. In addition, the proportion of trials in which the first saccadic eye movement landed directly on the target was higher in the repeated condition (11.3%) than in the novel condition (7.1%). Neither of these values was high, but this is consistent with the nature of search in a task that requires an average of five to seven fixations to find the target. In addition, due to the local constraints of contextual cueing (Brady & Chun, 2007), even if context repetition benefits search before the target is found, it need not always do so immediately upon display presentation – the effective context may not be recognized until well into search (Jiang, Sigstad, & Swallow, 2013a; Peterson & Kramer, 2001). When Peterson and Kramer (2001) excluded trials in which the eyes immediately landed on the target, the repeated condition was still associated with fewer fixations than the novel condition, suggesting that contextual cueing was not restricted to trials in which guidance occurred immediately. These findings are consistent with the view that contextual cueing occurs relatively early, during search and before the target is found. However, these studies did not clearly divide eye-tracking data into fixations before and after finding the target.

The reduction in the number of saccades in repeated displays has also been observed in subsequent eye-tracking studies, some of which did divide the data into early and late search stages (Geringswald, Baumgartner, & Pollmann, 2012; Harris & Remington, 2017; Manginelli & Pollmann, 2009; Solman & Smilek, 2010; Tseng & Li, 2004; Zhao et al., 2012). Zhao et al. (2012) divided the duration of the search trial into three segments: an initial phase from the onset of the display to the first saccade, a middle phase from the first saccade to the “last” fixation, and a late phase from the “last” fixation to the response. Zhao et al. (2012) considered the middle phase – between the first saccade and the “last” fixation – as the early stage when repeated context may guide search. They classified the final phase – from the “last” fixation to the response – as the late post-search response phase. The last eye fixation was defined as “one of the last two fixations that was spatially closer to the target,” meaning that it could be the second-to-last fixation, and it could be a fixation that did not land on the target. This definition was used because sometimes participants made another fixation after fixating on the target.
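The three-phase segmentation described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of Zhao et al. (2012): the `Fixation` record, the function names, the convention that display onset is time 0, and the choice to measure the “last” fixation from its onset are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Fixation:
    onset_ms: float  # time since display onset
    x: float
    y: float


def pick_last_fixation(fixations, target_xy):
    """Of the final two fixations, return the one closer to the target."""
    if not fixations:
        raise ValueError("trial has no fixations")
    tx, ty = target_xy
    return min(fixations[-2:], key=lambda f: math.hypot(f.x - tx, f.y - ty))


def segment_trial(first_saccade_ms, fixations, target_xy, response_ms):
    """Split one trial into initial, middle (search), and late phases."""
    last = pick_last_fixation(fixations, target_xy)
    return {
        "initial_ms": first_saccade_ms,
        "middle_ms": last.onset_ms - first_saccade_ms,
        "late_ms": response_ms - last.onset_ms,
    }


# Hypothetical trial: the third fixation is nearest the target at (0, 0),
# and the participant makes one more fixation before responding.
trial = [
    Fixation(250, 10, 10),
    Fixation(500, 5, 5),
    Fixation(750, 1, 0),
    Fixation(1000, 2, 2),
]
phases = segment_trial(200, trial, (0, 0), 1300)
print(phases)
```

Note that in this hypothetical trial the selected “last” fixation is the second-to-last one, so the late phase also contains a further eye movement.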

In Zhao et al. (2012), the duration of the search phase was significantly shorter and the number of saccades significantly smaller in the repeated condition than in the novel condition. The scan path of the eye was more direct (toward the target) in the repeated condition than in the novel condition as well. The late phase, from the “last” fixation to response, was also significantly shorter in the repeated than the novel condition. This is consistent with contextual cueing having affected both early and late processes. However, the ambiguity in defining the “last” fixation, which is inherent to all eye-tracking studies of this sort, imposes a major caveat. Because the definition included the second-to-last fixation and fixations that were not on the target, it is possible that participants in Zhao et al. (2012) continued to search after the “last” fixation. As a result, the late phase may have included both search and response-related processes.

To more definitively isolate search from post-search stages, Harris and Remington (2017) defined the final response and decision stage as the “duration between when the eyes first land on a target, and when that target is responded to.” Furthermore, in one experiment, Harris and Remington (2017) used a fixation-contingent method, in which they displayed placeholders in place of search objects (T and Ls), unmasking the letter only upon fixation. The fixation-dependent display avoided peripheral processing of the target letter and isolated the duration of the late stage as the time from fixation on the target to response.
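The stricter definition of the response stage can be sketched in the same way. The code below is a hypothetical illustration, not the procedure of Harris and Remington (2017): the fixation tuples, the 1-unit radius used to decide that the eyes have landed on the target, and the function name are assumptions.

```python
import math


def response_phase(fixations, target_xy, response_ms, radius=1.0):
    """Time from the first on-target fixation to the response.

    fixations: list of (onset_ms, x, y) tuples in temporal order.
    Returns None if the eyes never land within `radius` of the target.
    """
    tx, ty = target_xy
    for index, (onset_ms, x, y) in enumerate(fixations):
        if math.hypot(x - tx, y - ty) <= radius:
            return {
                "fixations_before_target": index,
                "search_ms": onset_ms,
                "response_phase_ms": response_ms - onset_ms,
            }
    return None


# Hypothetical trial with the target at (0, 0) and a response at 1300 ms.
trial = [(250, 10, 10), (500, 5, 5), (750, 0.5, 0), (1000, 2, 2)]
result = response_phase(trial, (0, 0), 1300)
print(result)
```

Under this definition, any fixation made after the eyes first reach the target counts toward the response phase, so the search phase cannot leak into the late measure.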

In Harris and Remington (2017), participants first completed four epochs of the standard T-among-L search task. As is typical of similar designs, RT was on the order of 2,000 ms. A significant contextual cueing effect emerged (Fig. 6a), accompanied by a significantly smaller number of fixations on repeated than novel displays (Fig. 6b). The duration of the response phase did not differ between the two conditions. A reduction in the number of fixations always accompanied contextual cueing effects. In fact, the size of contextual cueing in RT correlated strongly with a reduction in the number of fixations, with Pearson’s r values around .90. A reduction in the duration of the response phase never accompanied contextual cueing (Fig. 6c). Two additional experiments extended these results. In no case did the time to respond differ between the repeated and novel displays. These findings directly contradict those of Zhao et al. (2012). This contradiction likely results from the difference in the definition of the end of the search phase and the beginning of the response phase. Though neither definition is clearly superior, late-locus accounts have difficulty explaining the fixation-contingent experiment of Harris and Remington (2017), which controlled for peripheral processing, sharpening the boundary between search and response stages. Overall, the eye-movement literature, like the EEG literature, shows a consistent association between the contextual cueing RT effect and the time before fixation on the target, providing compelling evidence that contextual cueing begins to affect search early, during search and before target detection.

Fig. 6

Results from Experiment 1 of Harris and Remington (2017). (a) Reaction time (RT) for repeated and novel displays across four uncued epochs and four subsequent cued epochs. Validly-cued repeated and invalidly-cued RT overlap for novel trials. (b) Number of fixations across four uncued and four cued epochs. (c) RT time duration between fixation on the target and keyboard response, plotted separately for repeated and novel displays. The repeated and novel conditions overlap in all epochs and all cueing conditions. Inset in C are the same data with a smaller scale. All error bars are within-participant SEM (From Fig. 2 of “Contextual cueing improves attentional guidance, even when guidance is supposedly optimal” by A. M. Harris and R. W. Remington, 2017, Journal of Experimental Psychology: Human Perception and Performance, 43(5), p. 816. Copyright by the American Psychological Association)

General discussion

Summary

We have considered evidence that repeated context may facilitate search early, late, or both. Table 1 provides a summary of results.

Table 1 Summary of evidence for contextual cueing having an early or late locus

We have reviewed a large number of studies using behavioral, electrophysiological, and eye-movement measures, all of which provide strong evidence that repeated search context affects processing at an early time point, prior to that associated with response selection or motor preparation. Specifically, EEG studies find effects of contextual cueing on the N210 and N2pc components; eye-tracking studies find fewer pre-target fixations for repeated compared to novel displays; and behavioral studies find a smaller magnitude of contextual cueing effects in feature search, consistent with an early locus. Although search slope is not reliably shallower in repeated than novel displays (Kunar et al., 2007), this may reflect the local spatial constraints of contextual cueing (Brady & Chun, 2007). Both EEG and eye-tracking data warrant further replication, but current findings support early-locus accounts and contradict alternative views that place the locus of contextual cueing exclusively at late stages, after the target is found.

Is there, however, also evidence that, in addition to an early locus, repeated context may facilitate search late, after the target has been found? Evidence for the late effect is mixed. The lack of a consistent search slope effect, the presence of small but significant contextual cueing in feature search, as well as possible interactions between response congruency and contextual cueing, all point to an additional, late locus of contextual cueing. Yet alternative explanations for these behavioral findings exist. EEG did not record consistent effects of contextual cueing on the late lateral readiness potential, so further replication is necessary. Some eye-tracking data show fewer fixations after the end of search, but these findings conflict with studies that more cleanly separated search from response preparation.

Together, these empirical findings show that contextual cueing reflects mechanisms that influence processes early in search, but underlying mechanisms may also include contributions from later, response-related operations. The evidence for late-locus accounts is not yet conclusive. Moving forward, ERP studies that explore the lateral readiness potential with high-powered study designs and sufficient sample sizes will be highly informative. Should researchers find reliable LRP effects, this would provide a strong indication of response facilitation. In addition, future behavioral data showing a significant interaction between context repetition and congruency will be informative of the role of response processes in contextual cueing.

Modeling the locus of contextual cueing

Some researchers have taken a step toward modeling contextual cueing in a way that assesses the relative influences of early attentional guidance and late response selection. In fact, studies using the diffusion model of Ratcliff and McKoon (2008) to fit RT distributions for repeated and novel displays have recently challenged early-locus accounts (Sewell, Colagiuri, & Livesey, 2018; Weigard & Huang-Pollock, 2014). These studies have identified the best fitting models as those that assume that repeated and novel displays differ mainly in response threshold, not facilitated search. However, these models treat the entire search process as the “non-decision” component within the diffusion model, assuming that the shape of the search-time distribution reflects only the processes that occur after the participant finds the target. This is a questionable assumption and likely an incorrect application of the diffusion model to a highly inefficient configuration search task. Diffusion models gain power by analyzing the entire RT distribution, making the shape of the distribution – especially the tails – important. The application of this model by Sewell et al. (2018) assumes that if attention were more efficient in repeated displays, search times in repeated and novel displays would differ only in mean, not shape. This application is inconsistent with other analyses that suggest that the Ratcliff and McKoon (2008) diffusion model is most appropriate for brief RTs in decision-making tasks, not extended tasks such as visual search that comprise multiple decisions.
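The distinction between a shift in mean and a change in shape can be illustrated with a simplified diffusion simulation. This sketch is not the Ratcliff and McKoon (2008) model as fitted in the cited studies: it assumes symmetric boundaries at plus and minus the threshold, an unbiased starting point, unit noise, a constant non-decision time, and arbitrary parameter values, and all function names are hypothetical.

```python
import random
import statistics


def simulate_rt(drift, threshold, non_decision_s, rng, dt=0.001):
    """One trial: accumulate noisy evidence until it reaches +/- threshold."""
    evidence, decision_s = 0.0, 0.0
    step_sd = dt ** 0.5  # unit diffusion noise (an assumption of this sketch)
    while abs(evidence) < threshold:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        decision_s += dt
    return non_decision_s + decision_s


def simulate_many(n_trials, drift, threshold, non_decision_s, seed=0):
    """Simulate trials; each trial reuses the same noise path across conditions."""
    return [
        simulate_rt(drift, threshold, non_decision_s, random.Random(seed + trial))
        for trial in range(n_trials)
    ]


# Arbitrary illustrative parameters, not estimates from any data set.
baseline = simulate_many(100, drift=1.0, threshold=1.0, non_decision_s=0.3)
shorter_non_decision = simulate_many(100, drift=1.0, threshold=1.0, non_decision_s=0.2)
lower_threshold = simulate_many(100, drift=1.0, threshold=0.7, non_decision_s=0.3)

for label, rts in (
    ("baseline", baseline),
    ("shorter non-decision time", shorter_non_decision),
    ("lower threshold", lower_threshold),
):
    print(label, round(statistics.mean(rts), 3), round(statistics.pstdev(rts), 3))
```

In this sketch, shortening the non-decision time moves every simulated RT by the same amount and leaves the spread untouched, whereas lowering the threshold changes both the mean and the spread. Treating search as a constant non-decision component therefore builds in the assumption that search efficiency can only shift the distribution.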

In conjunction search, the entire RT distribution includes not just the processing of the target, but also the series of previous fixations and rejections of non-targets. Each of the non-target decisions will have its own distribution. Thus, the entire RT distribution is a convolution of each of the preceding non-target distributions. Indeed, convolutions of either gamma or ex-Gaussian functions fit RT distributions well (Wolfe, Palmer, & Horowitz, 2010). Given that repeated displays have fewer fixations (Harris & Remington, 2017; Peterson & Kramer, 2001), the extra fixations in the novel condition represent added terms in the convolution that will differentiate the shapes of the repeated and novel search distributions and propagate this difference to the overall search RT. Moreover, Ratcliff and McKoon (2008) emphasize that diffusion modeling is valid only if the pre-decision processes are relatively short, perhaps no more than 30% of the entire RT. Configuration search tasks do not satisfy this assumption, as search for the target comprises most of RT (Ratcliff, personal communication, February, 2019).
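The convolution argument can be illustrated with a simple case in which every fixation contributes an independent, identically distributed gamma duration. This is a deliberately simplified sketch: the shape and scale values, the choice of five versus seven fixations, and the function names are hypothetical, and real fixation durations need not be identically distributed.

```python
import math
import random


def summed_gamma_moments(n_fixations, shape, scale_ms):
    """Mean, SD, and skewness of the sum of n i.i.d. gamma durations.

    The sum of n gamma(shape, scale) variables is gamma(n * shape, scale).
    """
    total_shape = n_fixations * shape
    return {
        "mean_ms": total_shape * scale_ms,
        "sd_ms": math.sqrt(total_shape) * scale_ms,
        "skewness": 2.0 / math.sqrt(total_shape),
    }


def simulate_search_times(n_trials, n_fixations, shape, scale_ms, seed=0):
    """Monte Carlo check: add up one gamma draw per fixation on each trial."""
    rng = random.Random(seed)
    return [
        sum(rng.gammavariate(shape, scale_ms) for _ in range(n_fixations))
        for _ in range(n_trials)
    ]


# Hypothetical per-fixation distribution with a mean of 250 ms.
repeated = summed_gamma_moments(n_fixations=5, shape=4.0, scale_ms=62.5)
novel = summed_gamma_moments(n_fixations=7, shape=4.0, scale_ms=62.5)
simulated_repeated = simulate_search_times(5000, 5, 4.0, 62.5)
print(repeated, novel, sum(simulated_repeated) / len(simulated_repeated))
```

Under these assumptions, the two extra fixations in the novel condition raise the mean and the standard deviation and also reduce the skewness, so the repeated and novel distributions differ in shape and not only in location.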

For this reason, RT in a T-among-L search task cannot be modeled by assuming that decision occurs only at the end of a long (“pre-decision”) search process. By extension, we should not assume that either the attentional guidance or response facilitation account predicts that the distributions of repeated search times are simply shifted without changes in shape. Thus, existing diffusion modeling studies have yet to fully differentiate attentional guidance from post-search response processes. A better approach would be to model the decisions to reject non-targets, in addition to target response selection. It may be that the quality of information provided by non-target fixations supports improved guidance for repeated displays.

Return to theories of contextual cueing

So far we have considered the locus of contextual cueing, without detailing how context exerts early or late effects. Here we primarily consider two implementations of these effects: attentional guidance and response facilitation. We trace their roots and examine potential alternatives.

Traditional definitions of early- and late-locus accounts

Before delving into the definition of attentional guidance, we must first address another, seldom considered early-locus account of contextual cueing: the perceptual learning account. This account holds that the consistent configuration of the distractor array allows participants to capture more perceptual information about the target with each repetition. This account is supported in part by the very early MEG effects (50–100 ms) observed in Chaumon et al. (2008). However, it cannot account for the entire contextual cueing effect, as it would predict contextual cueing effects under conditions in which the target position changes across repetitions of the distractor array, yet empirical data contradicted this prediction (see section on Contextual cueing).

A more powerful explanation of search behavior comes from a more prominent early-locus account, the attentional guidance account. In the original conceptualization of this account, repeated encounters with a target-predictive context lead to formation of a “context map” that specifies the probability of finding a target in various locations within the context. Just as the “saliency map” guides attention to locations with high saliency, the context map guides attention to locations that are likely to contain the target, given the context (Chun & Jiang, 1998). This context map version has been successfully implemented in the connectionist model of Brady and Chun (2007), which describes how context learning changes the priority of different locations within a context, where the highest priority location will be searched first.

By invoking the concept of a “context map” that affects attentional priority, the original attentional guidance account implies strong similarity between context-guided search and saliency- or goal-guided search. Even though learned context is not the same source of attentional control as are physical salience or task goals, it may exert similar effects on attention. Awh et al. (2012) further developed the concept of an “integrated priority map,” which combines effects of selection history, physical salience, and current goals.

A prominent late-locus account, the response facilitation account, originates from dissatisfaction with the implied similarity between context-guided search and saliency- or goal-guided search. Stemming from the lack of a consistent search slope effect, Kunar et al. (2007) and subsequent researchers questioned whether contextual cueing can be considered a form of attentional guidance. In this light, the response facilitation account can be considered to offer two related but independent propositions. The first proposition is that the RT effects observed in contextual cueing are driven by faster decision-making processes when verifying a target’s identity and preparing a response in a familiar context. This review suggests that this proposition, as the dominant late-locus account, does not yet have conclusive empirical support, though this area of research is gaining momentum and may find conclusive support in the future. The second proposition of the response facilitation account is that effects of implicit learning on attention are distinct from attentional guidance by salience or task goals. Though less frequently discussed in the literature on contextual cueing, this proposition is of crucial theoretical significance.

Kunar et al. (2007) suggests that guidance of attention by task goals and perceptual salience differs qualitatively from guidance by selection history. As such, Kunar et al. (2007) presents the first serious challenge to the mainstream view about selection history effects. By suggesting that context learning does not affect selection in the same way as other forms of attentional guidance, proponents of the response facilitation account raise the possibility that there is not an integrated priority map. In our view, this is the most important theoretical contribution of the response facilitation account.

Goal-driven versus experience-guided attention

Strong evidence for a dissociation between selection history effects and goal-driven attention comes from a related, but distinct paradigm involving experience-based RT facilitation – location probability learning (Geng & Behrmann, 2002; Jiang, Swallow, Rosenbaum, & Herzig, 2013b; Miller, 1988). This paradigm is relevant because it exemplifies the mechanistic distinction between goal-directed attention, in which certain task-relevant locations or objects may be prioritized, and spatial selection history effects, in which patterns of attentional shifts are preferred and repeatedly engaged in a “habit-like” manner. Much like contextual cueing studies, in location-probability learning, participants search for a T among Ls and report the orientation of the T. Unbeknownst to them, the T more often appears in one region than in other regions. Unlike contextual cueing, this high-probability region is maintained in all trials, regardless of the specific search context. This manipulation yields learning of the target location probability, manifested as faster search RT when the target appears in the high- rather than the low-probability region. When location probability learning is contrasted with goal-driven attention, several differences emerge. First, secondary working-memory load impairs goal-driven attention but does not interfere with location probability learning (Won & Jiang, 2015). Second, whereas goal-driven attention is less efficient in older adults than in young adults, location probability learning is insensitive to aging (Jiang, Koutstaal, & Twedell, 2016). Third, whereas an endogenous spatial cue induces a baseline shift of attention, location probability learning does not (Addleman, Schmidt, Remington, & Jiang, 2019). Fourth, when learned explicitly, frequently attended locations are coded in relation to the external environment. In contrast, implicit location probability learning yields a spatial bias that is viewpoint dependent (Jiang & Swallow, 2013, 2014). Finally, training with location probability learning may yield an attentional bias that is task-specific, rather than a change in the generic attentional priority map (Addleman, Tao, Remington, & Jiang, 2018; Sha, Remington, & Jiang, 2018).

Alternative definitions of early- and late-locus accounts

These findings suggest that it may be necessary to re-define the attentional guidance account in a way that integrates components of the original response selection account. This integration maintains that repeated contexts begin to affect search early (“early locus”). However, it also preserves an important component of the response selection account – that context-guided attention is distinct from saliency- or goal-guided attention. This integration may be described as a “habit-guided” account that emphasizes the mechanistic differences between the dynamic spatial selection history effects and the static effects of a “saliency map” built upon task goals and stimulus salience. Spatial attention involves either covert or overt shifts of attention in space. The active nature of spatial attention necessarily implicates a procedural component of attention. People may be driven to attend to a particular location either because that location is highly prioritized due to its salience or task relevance, or because shifts of attention in the direction of that location have been reinforced through experience to the point where they occur automatically, or “habitually.” Whereas task goals primarily affect the attentional priority assigned to various locations in space, selection history effects like contextual cueing are likely to also influence the procedural component of spatial attention (Jiang & Sisk, 2018), inducing a “search habit” that is either context specific (contextual cueing) or general (location probability learning). Recognition of repeated contexts may cue participants to engage the “search habit” associated with that context, shifting attention toward the target location.

The habit-guided account is also plausible given increasing evidence of a distinction between selection history effects and goal-driven attention. This distinction within the contextual cueing paradigm is most clearly shown when contrasting implicit, array-based contextual cueing (e.g., Chun & Jiang, 1998) with explicit, scene-based contextual cueing that likely relies on endogenous goal-driven attention (Brockmole & Henderson, 2006). Scene-based contextual cueing occurs more quickly, after just two to three repetitions, and produces an effect five to ten times greater than the effect produced by array-based contextual cueing. Whereas array-based contextual cueing has a strong local constraint, scene-based contextual cueing often depends on the larger, global scene context (Brockmole et al., 2006; Brooks et al., 2010; Castelhano et al., 2018). In addition, the two forms of contextual cueing differ in their time-course. Following explicit learning of the scene-target association, participants can shift attention in anticipation of the target when presented with the scene (Summerfield et al., 2006). In contrast, array-based contextual cueing does not begin to guide attention until after search is underway (Jiang et al., 2013; Peterson & Kramer, 2001).

We may also consider alternative interpretations of response facilitation. As noted above, the dominant late-locus account, the response facilitation account, is rarely given a more precise definition than the claim that the locus of contextual cueing lies in a response stage after the target is found. We believe that, given closer consideration, a version of context-driven response facilitation may actually be compatible with an early locus of contextual cueing. Rather than being limited to the moments after the target is found, response and decision factors should be seen as playing a role throughout search as each distractor is attended and rejected. Each non-target rejection represents an individual decision. If repeated displays lower thresholds for decisions or increase the rate of information accrual, then each non-target rejection would benefit from the same facilitation as the final target response – non-target rejections should be faster in a familiar context.

This version of the response facilitation account may be more powerful than the current version because it accounts for a greater proportion of the behavioral data. Interestingly, this account makes predictions that are similar to those of the attentional guidance account because response facilitation is no longer limited to late processing of the target. If participants more easily reject non-targets, there should be a search slope effect in contextual cueing, moderated by the local constraint. Whereas the attentional guidance account predicts fewer fixations during search, this revised response facilitation account predicts shorter fixation durations for the distractors near the target, which would shorten search up to the point that the target is fixated. Modeling each decision within the search process using individual diffusion models would elucidate the influence of response facilitation in the moments leading up to target fixation.
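The logic of this revised account can be made concrete with a small simulation. What follows is a minimal sketch, not a model taken from any study reviewed here. It assumes that each non-target rejection is a single-boundary diffusion decision, that rejections are serial and independent, and that a repeated context either lowers the decision threshold or raises the rate of evidence accrual. The function names and every parameter value are hypothetical and chosen only for illustration.

```python
import random


def rejection_time(drift, threshold, rng, dt=0.005, noise=1.0):
    """Simulate one single-boundary diffusion decision.

    Evidence starts at 0 and accumulates with mean rate `drift` plus
    Gaussian noise until it reaches `threshold`. Returns the elapsed
    decision time (arbitrary, seconds-like units).
    """
    evidence = 0.0
    elapsed = 0.0
    step_sd = noise * dt ** 0.5  # noise per step scales with sqrt(dt)
    while evidence < threshold:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
    return elapsed


def mean_time_to_target(n_nontargets, drift, threshold, n_trials=100, seed=0):
    """Mean summed rejection time for the non-targets inspected before the target."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        for _ in range(n_nontargets):
            total += rejection_time(drift, threshold, rng)
    return total / n_trials


def search_slope(drift, threshold, small=2, large=10, n_trials=100, seed=0):
    """Added search time per additional non-target rejected before the target."""
    t_small = mean_time_to_target(small, drift, threshold, n_trials, seed)
    t_large = mean_time_to_target(large, drift, threshold, n_trials, seed)
    return (t_large - t_small) / (large - small)


if __name__ == "__main__":
    # Illustrative assumptions only: (drift, threshold) pairs, not fitted estimates.
    conditions = {
        "novel": (2.0, 1.0),
        "repeated, lower threshold": (2.0, 0.8),
        "repeated, faster accrual": (2.5, 1.0),
    }
    for label, (drift, threshold) in conditions.items():
        print(label, round(search_slope(drift, threshold), 3))
```

Under these assumptions, either change shortens every rejection, so the time saved before the target is reached grows with the number of non-targets inspected. This is the search slope effect described above. The sketch deliberately omits the local constraint and any non-decision time per fixation, both of which a fitted model would need to include.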

The conclusive evidence for early-locus accounts and inconclusive evidence for late-locus accounts therefore leave many fertile lines of research open. Future research may search for more consistent evidence for a late locus of contextual cueing. Alternatively, future studies may explore the influence of context repetition on rejections of each non-target during search or the distinctions between spatial selection history effects and goal-directed attention.

Explicit or implicit?

An important research direction in contextual cueing will be the exploration of the differences between explicit and implicit learning in this task. Although the theoretical significance of contextual cueing does not depend on this distinction, it is crucial to our understanding of attentional guidance by selection history. If contextual cueing and other selection history effects are driven by explicit learning, they simply reflect goal-directed attention. Once regularities in the environment are explicitly known, as in scene-based contextual cueing, preferences toward frequent target locations are nothing more than preferences toward goal-relevant locations. However, when learning is implicit, selection history effects drive attention independently from task goals, potentially via “habit-like” learning mechanisms.

A great deal of evidence suggests that array-based contextual cueing typically reflects implicit, rather than explicit, learning. Post-experiment recognition tests reveal a low level of explicit awareness that is uncorrelated with the magnitude of contextual cueing (Chun & Jiang, 1998; Chun & Jiang, 2003; Colagiuri & Livesey, 2016; Vadillo, Konstantinidis, & Shanks, 2016). While a meta-analysis suggests that the level of explicit awareness exceeds chance and that the many studies that fail to observe significant levels of explicit awareness have insufficient power (Vadillo et al., 2016), the level of explicit awareness does not correlate with the size of contextual cueing even in a study with more than 700 participants (Colagiuri & Livesey, 2016). This suggests that implicit learning occurs in parallel with, and independently of, explicit awareness. Thus, contextual cueing is commonly considered an example of implicit, relational learning (Chun & Phelps, 1999).

Nonetheless, more recent studies using novel analyses have identified interesting connections between contextual cueing and measures of explicit awareness. One is a correlation between contextual cueing magnitude and both explicit awareness and fixational dwell times in the target quadrant during a subsequent target localization recognition task (Annac et al., 2019). Another is a correlation between the subjective experience of display clarity immediately following search and later localization accuracy (Schlagbauer, Rausch, Zehetleitner, Müller, & Geyer, 2018). The correlation between contextual cueing and explicit awareness in Annac et al. (2019) conflicts with Colagiuri and Livesey (2016), though the latter has greater power with a sample size of over 700. Both the correlation between contextual cueing and fixational dwell times (Annac et al., 2019) and the correlation between localization accuracy and subjective experience of the display (Schlagbauer et al., 2018) are compatible with the notion that contextual cueing represents implicit learning. Implicit memory for repeated contexts could increase familiarity and lead to a heightened subjective experience of display clarity and higher localization accuracy, as well as longer fixational dwell times in the target quadrant during localization. Though further research is needed, much evidence suggests that contextual cueing often reflects implicit learning, which further distinguishes its influence on attention from that of current task goals.

Conclusions

Two decades of research on contextual cueing has brought resolution to many contentious issues. Despite early doubts (Lleras & Von Mühlenen, 2004), contextual cueing has been firmly established in many studies and across the lifespan (Goujon et al., 2015; Jiang, Sisk, & Toh, in press). Some contexts, such as natural scenes, induce explicit contextual cueing. Others, such as the repeated spatial layout of search elements, are learned implicitly, largely independent of explicit awareness (Chun & Jiang, 2003; Colagiuri & Livesey, 2016; Vadillo et al., 2016). Because the attentional guidance vs. response facilitation debate, which is better represented as an early- versus late-locus debate, is central to the field, this review, while not exhaustive, has implications for many studies that could not be covered within the limited scope of a tutorial review.

The field now approaches a resolution to this central theoretical debate in contextual cueing. Empirical evidence indicates that the repeated context expedites search early, before a target is found. Although the evidence for late-locus accounts is not fully conclusive, this will remain an important line of research in the future, and the significance of the response facilitation account goes beyond identifying the locus of contextual cueing.

As the field of attention and selection history effects developed, the early- versus late-locus debate evolved into the impetus for an important new research direction – the exploration of similarities and differences between selection history effects (exemplified by contextual cueing) and other drivers of selective attention. Kunar et al. (2007) raised the possibility that although spatial statistical learning can cue spatial attention, it may do so through qualitatively different mechanisms than goal-driven attention or stimulus salience. Experience-guided attention may exhibit unique characteristics not shown in endogenous attention. It may be constrained in ways not anticipated by traditional attention or vision research. An example is the recent observation that, when real-world objects are used as search stimuli, the identity of the objects is an integral component of the learned context. Repeating the spatial layout but varying the placeholder objects prevents contextual cueing from occurring (Makovski, 2016). This and other findings on the nature of learned context and the limitations of this learning (Assumpção, Shi, Zang, Müller, & Geyer, 2018; Chua & Chun, 2003; Conci, Müller, & von Mühlenen, 2013; Conci & von Mühlenen, 2009; Feldmann-Wüstefeld & Schubö, 2014) may bring us to new fronts in understanding the power and limitations of experience-guided attention. For instance, a promising emerging research area explores the interaction between contextual cueing and emotions and stress (Kunar, Watson, Cole, & Cox, 2014b; Meyer, Quaedflieg, Bisby, & Smeets, 2019), further uncovering the nuances of spatial statistical learning and its relationship to conscious task goals, stimulus salience, and other implicit factors.

This review raises an outstanding question about how the repetition of a search context facilitates guidance. Does contextual cueing modulate the weights assigned in an attentional priority map, or does it affect the dynamic component of search, such as the vector and trajectories of covert attentional shifts? More broadly, do selection history effects induce a “search habit” (Jiang & Sisk, 2018), supported mainly by subcortical regions of the brain (e.g., the striatum), or do they change the attentional priority map, represented mainly by cortical regions (e.g., the parietal cortex)? Understanding how search contexts affect attention may facilitate the development of a comprehensive taxonomy of attention (Chun, Golomb, & Turk-Browne, 2011). Future research should also heed methodological issues, such as sample size and statistical power, that will increase the reliability and reproducibility of the ongoing work toward conceptualizing the way experience shapes visual search (Jiang & Sisk, 2019).