Observers change their target template based on expected context
Previous studies have shown that when observers search repeatedly for a target in a particular context, they may develop a target template that is biased for that context. Because the same target may appear in multiple contexts, we wondered whether observers are able to develop multiple templates for the same target, with each template biased for a particular context. In a series of behavioral experiments, we show that observers can learn multiple target templates for a single target and that they can voluntarily switch among these templates depending on the context they expect to see. Our results suggest that these biased templates may coexist with an unbiased representation of the target, provided they are learned first.
KeywordsVisual search Object Recognition
When we search for an object, our eye movements are not random; instead, we tend to fixate things in the environment that resemble the target. The mental representation that guides these eye movements is called the target template, or alternatively, the search template (Williams, 1967; Rao, Zelinsky, Hayhoe & Ballard, 2002; Eckstein, Beutter, Pham, Shimozaki & Stone, 2007; Hwang, Higgins & Pomplun, 2009). When our eyes land on a potential target, we must verify that it is the object we are seeking. The target template also may serve as the reference for this verification decision (Desimone & Duncan, 1995; Miller, Erickson & Desimone, 1996). Given the pivotal role the target template plays in visual search, it is important to characterize this representation and to determine how it is influenced by the search task.
In some search tasks, observers search for a target among distractors that vary within and across displays. In these tasks, there is typically no single feature or set of features that distinguishes the target from the distractors, and the target template is assumed to be an unbiased representation that includes all of the target's features (Rao, Zelinsky, Hayhoe & Ballard, 2002; Zelinsky, 2008; Hwang, Higgins & Pomplun, 2009).
In other search tasks, the target appears among distractors that do not vary within or across displays. The target template for these tasks may consist only of the feature that best distinguishes the target from the distractors (Wolfe, 1994; Abbey & Eckstein, 2002; Navalpakkam & Itti, 2007; Becker, 2010). Sometimes, the feature that best distinguishes the target from the distractors may be only weakly associated with the target. In these cases, the template may be a highly biased representation of the target. For example, when observers search for a line oriented at 55 degrees among lines oriented at 50 degrees, they appear to use a target template for a line oriented at 60 degrees (Navalpakkam & Itti, 2007). Similarly, when observers search for an orange target among yellow distractors, they appear to use a target template for red (Becker, 2010). These studies suggest that the target template is analogous to the decision template used for noisy discrimination tasks (Li, Levi & Klein, 2004; Kauai, Kourtzi & Levi, 2013). That is, rather than being a faithful, unbiased representation of the target, the target template is a biased representation that reflects the information necessary to perform the search task.
Given that target templates may be biased by context, and given that we may encounter the same target in different contexts, it seems possible that we may have multiple templates for the same target. To express this in concrete terms: if we frequently search for our garden spade in our garage and in our garden, we may use different templates for the spade depending on whether we are searching for this target amid other tools or amid plants. The current study tests whether observers searching for a target object can use a different target template depending on the expected context.
Experiment 1: context cues
Visual search studies showing that the distractor context can shape the target template have typically used simple stimuli that vary along a single dimension, such as orientation, hue, or spatial frequency. To test whether practice searching for a target in multiple contexts can produce multiple templates, it is necessary to use stimuli that vary along multiple dimensions. For our stimuli, we chose images of complex, real objects. Our search items were images of four very similar wristwatches, one of which served as the target. In addition to the target watch, our search arrays contained multiple copies of a second watch. Because the differences between the target and distractor watches were subtle, we expected search performance to improve as observers learned the features that distinguished the watches. Based on previous research (Abbey & Eckstein, 2002; Navalpakkam & Itti, 2007; Becker, 2010), we further assumed that these distinguishing features would become the basis for the target template.
The purpose of this experiment was to extend previous work on biased templates by testing whether observers can develop multiple biased templates for the same target and select among these templates depending on the context (the distractors) they expect to see. To encourage observers to develop multiple templates, we gave them blocked practice on the three contexts (three different distractor watches). Each context was associated with a symbolic cue. We expected that during training observers would learn to associate each cue with a context and they would learn which features of the target were distinctive in that context. In other words, we expected observers to learn to associate each cue with a different target template.
After giving the observers an opportunity to learn three target templates, we tested whether they could switch among these templates depending on the context they expected to see. During this test, the three contexts were randomly interleaved. Half of the trials were preceded by a number cue that indicated the context of the up-coming trial. The other half of the trials were preceded by an uninformative cue that provided no context information. Comparing performance on context cue trials with performance on uninformative cue trials allowed us to determine the effect of context cues on search. Although these context cues provided no information about the target (the target never varied), we expected they would facilitate search by allowing the observer to select the target template appropriate for the context.
To compare our results with those of more traditional search experiments, we recruited a second set of observers and repeated the experiment with fixed distractors, a variable target, and target cues. Because the cues indicated the identity of the target, it seems reasonable to expect that observers would use the cue to select the appropriate target template. Except for switching the target and distractors, this comparison experiment was identical to the main experiment. As we discuss later, the existence of search asymmetries prevents this experiment from functioning as a strict control, but we thought it would still be a useful reference.
Twenty participants were recruited from the Introduction to Psychology subject pool at Rutgers-Camden. All participants reported having normal or corrected to normal acuity.
The watches were scaled to an area of 5,000 pixels before they were added to a mid-level gray background, 1,280 pixels wide and 960 pixels tall. Each display contained one randomly rotated target that was positioned entirely in either the left or right side of the background. Each display also contained 3, 6, or 9 randomly rotated copies of one of the distractor watches (Fig. 1, bottom). The distractor watches were randomly positioned in the display, subject to the constraint that they not overlap one another. Each type of distractor was associated with a single digit (1, 2, or 3). These cues were written in black, 72-point font. The stimuli were displayed on a 27-inch iMac computer, which observers viewed unrestrained from a distance of approximately half a meter.
Observers participated in a training session and a testing session separated by 1–3 days. During the training session, the observers practiced searching for the target in the three contexts (the three distractors). The purpose of the training was to give the observers an opportunity to develop three distinct templates for the target, with each template tailored to a particular distractor context. The practice trials were blocked by distractor, and blocks with different distractors were interleaved. During training, observers ran 12 blocks of 30 trials each; the trials within a block were evenly divided among the three sets sizes (3, 6, or 9 distractors). Each trial began with a number cue that corresponded to the distractor type. Because the distractor type was constant within a block, the cue was constant within a block. A constant cue is easy to ignore, and so we asked the observers to say the numbers to themselves each time a cue appeared.
All trials consisted of a 1,000 msec cue, followed by a 600-msec blank interval, followed by the search array. The search array was displayed until the observer responded by pressing either the “F” or the “J” key to indicate whether the target was on the left or right side of the display. Auditory feedback was provided after an incorrect response, and the next trial began after a 500-msec delay. The experiments were controlled by a MatLab program that called on PsychToolbox routines (Brainard, 1997; Pelli, 1997).
To check that observers were learning to associate the cues and the contexts, we interrupted training after the 9th and 12th blocks and asked the observers to write a brief description of how they found the target on trials with the “1,” “2,” and “3” cues. All observers reported using different strategies for different cues. For example, one observer reported that when the cue was “1” he looked for a dark face, when the cue was “2” he looked for a circle on a watch face, and when the cue was “3” he looked for a leather band. For each cue, most observers listed the same features, although some observers listed several features while others listed only one.
One to three days after training, observers returned for a testing session. The testing session was identical to the training session with three exceptions: trials with different distractor types were randomly intermixed, half of the trials in each condition were preceded by an uninformative cue, and there were 8 blocks of 54 trials each.
At the end of the testing session, we asked the observers to pick the target from a line-up of 14 similar watches. The line-up consisted of the target, the distractors, and the first 10 watches in Fig. 4. The watches were displayed on the computer screen, and observers simply pointed to the target. All participants were able to identify the target watch, but we did notice that while some observers selected the target immediately, others appeared to deliberate before making their decision.
Experiment 1b: target cues
As a check on our methods, we recruited 20 additional observers from the subject pool and repeated the experiment with the target and distractors sets switched. In this second version of the experiment, there was one type of distractor (one context) and three types of targets. During training, the observers learned to associate a number cue with each target; during testing, the number cues indicated which target would appear in the upcoming trial. Except for switching the target and the distractors, Experiment 1b was identical to Experiment 1a.
We included this comparison study as a check on our methods. It seems self-evident that observers would use different target templates when searching for different targets, so if we did not find an effect of the target cue, this would indicate a likely problem with our stimuli. If the target and distractors were too different from one another, for example, then the target might “pop-out” making the cues irrelevant. Or, if the three targets were too similar to one another, they might not require distinct templates. If the later were true, it would mean that distractors in experiment 1a were too similar to produce appreciably different contexts. Since we used the same target-distractor pairs in the two experiments, we thought that finding an effect of the target cues would provide some validation of our method for studying context cues.
Results & discussion
We also ran 20 different observers on a version of the experiment in which the context was fixed and the cue predicted the target. Figure 2, right, shows search times as a function of set size for trials with an informative target cue (filled circles) and for trials with an uninformative cue (open circles). An ANOVA showed an effect of cue type (F(1,19) = 7.32, p < 0.05), with target cues reducing search times by 62 msec, on average. The ANOVA also showed an effect of set size (F(2,38) = 126.0, p < 0.0001), but no interaction between cue type and set size (F(2, 38) = 1.5, n.s.). The error rates did not exceed 2 % for any condition.
Typically, cues to the target's identity facilitate guidance (target acquisition) and target verification (Malcolm & Henderson, 2009; Vickery, King & Jiang, 2005; Wolfe, Horowitz, Kenner, Hyle & Vasan, 2004). Facilitation of guidance is associated with reduced interference by distractors and a decrease in slope of the search function (i.e., a set-size × cue-type interaction). Facilitation of verification is associated with a decrease in search times that is independent of set size. Our finding of a main effect of cue type but no interaction between cue type and set size suggests that both the target cues and the context cues facilitated target verification but not guidance. The failure of the cues to improve guidance may be related to the subtle differences between the watches: if the distinctive features were not visible in the periphery they would have limited usefulness for guiding attention. It is worth noting that the absence of an interaction between cue type and set size suggests that the context cues were not facilitating distractor rejection. We will return to this finding later when we consider the possibility of distractor suppression.
Experiments 1a showed that when the target is fixed, cueing the distractors can facilitate search. Experiment 1b showed that when the distractors are fixed, cueing the target can facilitate search. Because the two experiments used the same search items and yielded similar effect sizes, it might be tempting to assume that the targets and distractors played essentially equivalent roles, much as they would in a discrimination task. This is probably not the case. A number of studies have shown that search times may change dramatically when the target and distractors are switched (Treisman & Souther, 1985; Wolfe, 2001). For example, finding a Q among Os is easier than finding an O among Qs. These search asymmetries have been attributed to the relative ease of finding the presence of a feature compared with finding the absence of a feature (Treisman & Souther, 1985).
In addition to this well-documented search asymmetry, there may be a second asymmetry in our experiments. Because distractors dominate search displays, any uncertainty about distractor identity is quickly resolved once the display appears. Thus, the benefit of a context cue is likely limited to the beginning of the search process. In contrast, uncertainty about the target's identity can only be resolved once the target is located. Thus, target cues may benefit performance throughout the search process.
Regardless of the interpretation of the magnitude of the context cue effect, the existence of the effect requires an explanation. There are two obvious explanations, and they are predicated on different assumptions about the nature of the target template. If the target template consists of the features that are distinctive to the target, then it will be shaped by both the target and the distractors. By this account, context cues would benefit search because knowledge of both the target and the distractors is necessary to select the optimal target template. Alternatively, if the target template is an unbiased representation of the target, then its formation will be independent of the distractors. For this second account to predict an effect of context cues it would require an additional top-down process, such as the top-down suppression of the distractors. To test whether our observers were using the context cues to suppress the distractors, we repeated the experiment with the same three distractors and a randomly varying target.
Experiment 2: distractor suppression?
Recent studies have reported distractor suppression based on salient features (Arita, Carlisle & Woodman, 2012; Moher, Lakshmanan, Egeth & Ewen, 2014). To test whether our observers were using the context cues to inhibit the processing of the distractors, we repeated the experiment with the same three distractors and ten randomly varying targets. If the observers in the first experiment used the context cues to select the appropriate target template, then the cues should be ineffective when the target varies unpredictably. Alternatively, if the observers used the context cues to suppress the distractors, then the cues should continue to facilitate search. As with the previous experiment, we also ran a comparison condition that involved three targets and ten randomly varying distractors.
Twenty new observers were recruited from the subject pool to participate in Experiment 2a. The training and testing methods were identical to the previous experiment except that the target on each trial was randomly selected from a set of ten watches. As before, the context cues informed the observer of the distractors in the upcoming trial.
For comparison, we also ran 20 different observers on the reverse experiment with three targets and ten randomly varying distractors. In this case, the cues informed the observer of the target on the upcoming trial. We included this condition as a check on our methods, and we will refer to it as Experiment 2b.
Results and discussion
We also ran the reverse experiment in which the cues predicted the target on the upcoming trial and the distractors varied randomly. If observers use the target cue to select an unbiased target template, then these cues should facilitate search. Figure 3, right, shows the response times for trials with informative target cues (filled circles) and for trials with uninformative cues (open circles). An ANOVA of the reaction time data showed a significant effect of the cue condition (F (1, 19) = 5.15, p < 0.05). On average, the cue reduced search times by 86 msec across conditions. The ANOVA also showed an effect of set size (F(2, 38) = 38.7, p < 0.0001), but no cue × set size interaction (F(1, 19) = 0.358, n.s.). The average error rates ranged from 6 % to 7 % across conditions.
Combining the results of Experiments 1 and 2, the following pattern emerges: Target cues facilitate search whether or not the distractors vary, but context cues only facilitate search when the target is fixed. The finding that target cues reliably facilitate search fits with the existing literature (Williams, 1967; Findlay, 1997; Eckstein et al., 2007; Malcolm & Henderson, 2009), findings about context cues are new and require an explanation.
One explanation assumes that the context cues allowed observers to suppress the distractors. Given this assumption, our results suggest that the distractor template (the representation of the features that should be suppressed) is limited to a small number of features. Only a small number of features is needed to distinguish a distractor from a particular target, whereas a large set of features would be needed to distinguish the distractor from many targets. The failure of the context cues to facilitate search with variable targets suggests that the distractor template lacks the specificity necessary to distinguish the distractor from a variety of targets.
An alternative explanation for the finding that distractor cues facilitate search only when the target is fixed assumes that the context cues allowed observers to call up a biased target template that had been honed through practice for that context. If the context cues specify a biased target template, then clearly they would only be helpful for that particular target.
Our results cannot discriminate between these two accounts. But based on the previous literature demonstrating the existence of biased templates, and based on our finding that the context cues appear to facilitate target verification rather than distractor rejection, we favor the second account. Our third experiment assumes this second account and is designed to examine a different question. Nonetheless, the results of this experiment also provide evidence in support of the account involving biased target templates.
Experiment 3: multiple templates for familiar targets?
Experiment 1a showed that observers were faster to find the target when they could anticipate the context. We proposed that the observers were using a different target template for each of the contexts. This proposal might be less interesting if the observers were under the mistaken impression that they were performing three separate tasks, each with a different target. However, our observers were told from the outset that they would be searching for the same target throughout their training and testing sessions. Moreover, when the experiment was over, our observers were able to pick this target from a line-up of 14 similar objects. Nonetheless, we do not know if our observers ever developed a single unified representation of the target. Our intention in this final pair of experiments was to provide training that would induce observers to develop an unbiased (i.e., complete) representation of the target before they were trained to find the target in specific context. (We recognize that all representations are biased, if only to reflect the statistics of natural images. By unbiased, we really mean minimally biased.) As with the previous experiments, we also included a comparison condition, which in this case involved reversing the order of the training sessions.
We ran two versions of this experiment. Both versions involved two training sessions followed by a testing session, all within a one-week period. In Experiment 3a, the discrimination training preceded the search training. In Experiment 3b, the search training preceded the discrimination training. Thus the experiments differed only in which training came first: the training intended to produce an unbiased target representation or the training intended to produce biased target templates.
Results and discussion
In Experiment 3b, the order of the training was reversed: observers first practiced the search task and then they practiced the discrimination task. Otherwise this experiment was identical to Experiment 3a. The results for this experiment are shown in Fig. 5, right. An ANOVA of the response time showed a significant effect of the cue type F(1, 19) = 4.39, p < 0.05. On average, the cue reduced the search times by 31 msec. There was also a significant effect of set size, F(2, 38) = 96.45, p < 0.0001, but no cue x set size interaction F(1, 19) = 0.159. The error rates averaged 2 % across conditions. The existence of a context cueing effect suggests that once observers learn the biased templates, they are able to maintain these templates even when they learn a detailed representation for the target.
Taken together, the results of Experiments 3a and 3b suggest that the order of training may determine whether observers will acquire both a biased template and an unbiased template. This order effect is not entirely surprising: When observers first encounter the unfamiliar target in the search task; they must learn how to distinguish it from the distractors. They may pick-up on the most salient feature that allows them to perform this task, and they may focus only on this feature. When they encounter the target in a new search task, this salient feature may no longer be useful, and so they may pick-up on a second feature that is sufficient for the new task. In this way, the observers may learn three separate templates for performing the three search tasks. Now consider the observers who have had extensive practice discriminating the target from decoys. When presented with the search task, these observers would not need to learn how to discriminate the target from the distractors. Because they had already developed a detailed representation of the target, they would be able to recognize it in a wide range of contexts.
Understanding the way objects are represented in memory is a fundamental problem in vision science. Despite decades of research, we still do not have a firm grasp on the nature of object representations (Peissig & Tarr, 2007). This paper examines the target template, the object representation used for visual search. Parsimony suggests that the same representations would be used for search and for object recognition, although this has not been established. If they are the same, visual search may provide increased access to these representations because it provides a method for controlling for the confounding effects of priming. Before discussing the implications of this experiment, we will briefly discuss the potential advantages of using visual search to study object representations.
People are faster to recognize or find a stimulus they have recognized or found before, and this facilitation is due in part to priming. Priming may occur at many levels of processing, not just the highest-level where object identity is represented. Consequently, response times in both search and recognition tasks may reflect the effects of low-level priming as well as high-level recognition. For example, observers may be faster to recognize an object from a familiar view rather than an unfamiliar view, not because (or not only because) their representation is viewpoint dependent, but simply because the familiar view has been primed (Bar, 2001).
Visual search times may reflect several forms of priming including target priming, distractor priming, and response priming (Huang, Holcombe & Pashler, 2004; Kristjánsson & Driver, 2008). All of these effects occur automatically, independent of the observer's expectations about the target. In contrast, the target template embodies the observer’s expectations. Experimenters can manipulate these expectations with cues, and if the cues are symbolic, they should cause minimal priming. When symbolic cues are used, the difference between cued and uncued search trials can be attributed to the effects of the target template. This method allows the researcher to examine the observer’s representation of the target object free of the contaminating effects of priming (Bravo & Farid, 2009).
We used this cueing strategy to examine the extent to which the target template is shaped by the distractors in the display. If the target template is related to the representation used for object recognition, then our study uses a new method to examine an old question: Do object representations emphasize diagnostic features? (Sigala & Logothetis, 2002; Schyns, 1998). An object's diagnostic features are the features that distinguish the object from alternative objects; if the set of alternatives changes, then the object's diagnostic features may change. Our experiment extends this work by asking whether observers can have multiple representations of one object, with each representation emphasizing different diagnostic features.
In our experiments, observers searched for a target among three types of distractors, and they found the target faster when they were given a cue to the distractors that would appear on the upcoming trial. An obvious interpretation of this result is that observers were using the cues to suppress the distractors. While we cannot definitely rule out this interpretation, we think this is unlikely for several reasons. First, if the cues led to distractor suppression, one might expect the cues to reduce the effect of the distractors, resulting in an interaction between cue type and set size. In Experiment 1a, we found a clear effect of the context cue but no interaction. Second, if the cues led to distractor suppression, one might expect the cues to be effective regardless of the target’s identity. In Experiment 2a, we found that varying the target eliminated the context cue. Third, if the cues led to distractor suppression, one would not expect that increasing the familiarity of the target would eliminate the cueing effect. In Experiment 3a, we found that discrimination training with the target prior to search task eliminated the benefit of the context cue on search. Although distractor suppression may exist in other situations (Arita et al. 2012; Moher et al., 2014), the preponderance of evidence in this study suggests that the context cues were not used for distractor suppression.
Instead, we interpret our results as showing that observers can use a context cue to select a target template adapted for the anticipated context. These experiments also suggest that observers may retain these context-dependent, biased target templates even after they have developed a more detailed representation of the object.
We also tested whether observers could develop biased representations after they had acquired an unbiased representation of the target. We found no evidence for this, but our results do not reveal whether our observers were unable or un-incentivized to develop context-dependent templates. The search times for Experiment 3a (which showed no context cue effect) were not significantly different from those for Experiment 3b (which showed a small but significant context cue effect). Thus, we cannot be certain that the biased templates would have conferred an advantage over generic templates.
An interesting suggestion raised by our results is that the conditions of an observer’s initial exposure to an object may have a prolonged effect on their representation of that object. If observers first encounter the object while performing a highly specific task, they may represent the object in terms of the task. This biased representation may remain distinct even when they encounter the object in other tasks. That is, our task-specific representations are not necessarily absorbed into a unbiased representation over time. Alternatively, if we first encounter the object in the context of a varied discrimination task, or no task at all, we may develop a more generic, unbiased representation. If the object becomes easily recognizable, then this rapid recognition may preclude the development of task-specific representations. As such, this study adds to previous physiological research (Wong, Folstein & Gauthier, 2012; Op de Beeck, Baker, DiCarlo & Kanwisher, 2006) and behavioral research (Suzuki & Cavanagh, 1995; Malinowski & Hubner, 2001; Shen & Reingold, 2001; Bravo & Farid, 2014), suggesting that there is a bidirectional effect between object representations and the perceptual task. The task involved in object learning constrains object representations, and conversely preexisting object representations constrain task performance.
- Bravo, M., & Farid, H. (2009). The specificity of the search template. Journal Of Vision, 9(1), doi: 10.1167/9.1.34
- Hwang, A., Higgins, E., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal Of Vision, 9(5), doi: 10.1167/9.5.25
- Suzuki, S. S., & Cavanagh, P. P. (1995). Facial organization blocks access to low-level features: An object inferiority effect. Journal of Experimental Psychology: Human Perception and Performance, 21(4), 901–913.Google Scholar