Introduction

When we look at a scene or an image, certain things draw our gaze more readily than others. If a scene is composed mostly of identical objects, but contains one object that is different from the others, then we are more likely to direct our gaze to the “odd one out” (Donk & van Zoest, 2008; van Zoest, Donk, & Theeuwes, 2004; van Zoest, Donk, & Van der Stigchel, 2012). An object that “pops out” and grabs our attention in this way is said to be salient. Salience is greater when the differences between an object and its surroundings are greater. For example, in a group of lines that are all oriented at a certain angle, a line that is oriented very differently from the others is more salient than a line that is oriented only slightly differently from the others (Nothdurft, 1991, 1992, 1993, 2000; van Zoest et al., 2012).

Many different things contribute to an object’s salience. It may stand out from other objects because it is oriented differently, is more luminous (Jonikaitis & Belopolsky, 2014; Schütz, Trommershäuser, & Gegenfurtner, 2012), is moving in a different direction (Dick, Ullman, & Sagi, 1987; Nothdurft, 1993), is a different color (D’Zmura, 1991; Nagy & Sanchez, 1990; Nothdurft, 1993), moves, flickers, or rotates, or appears suddenly (van Zoest, Heimler, & Pavani, 2017). With so many sources of information guiding our gaze, it is necessary either to integrate them or to select among them in some way to produce a single decision about where to move our eyes next. One way in which this might be achieved is if the visual system contains a central “salience map” that represents the salience of locations in our field of view (Koch & Ullman, 1985; see Treue, 2003, for a review; see also Belopolsky, 2015, for the similar but distinct notion of a “priority map” that, in addition to salience, represents past selection history, reward value, and emotional significance). This map receives inputs from other systems that process the different visual features or current contextual constraints that contribute to overall salience. When we make an eye movement to shift our gaze (a saccade), that saccade will go to the current location of highest salience in the salience map.

In working out the details of the salience map framework, one question that arises is how inputs from different sources are combined into one map. For example, if an object is of contrasting orientation to its neighbors and is more luminous, does it draw our gaze more strongly than if it exhibited only one of these salient features? If so, is its overall salience simply the sum of the salience of the two separate features? These questions may be addressed by measuring the salience of individual features separately, then measuring the salience of an object that exhibits both of those features together. For example, to investigate the integration of orientation and luminance salience, one could first measure the extent to which a line of high orientation contrast draws people’s gaze, then perform the same measurement for a line of high luminance, and then finally for a line that is of high orientation contrast and of high luminance (Fig. 1).

Fig. 1: Example stimulus displays from Experiment 1

The target circle is shown above the fixation point, and the distractor to the right of the target. The display was based on that used in van Zoest et al. (2012)

In the present study, we investigated such cases of salience from multiple features and tested three competing hypotheses. The additivity hypothesis asserts that salience from separate features is additive, which is to say that the overall salience of an object is the sum of its salience for each separate feature. Additivity is consistent with a model of the salience system in which processing of the different features is carried out independently, and the effects of one feature do not depend on the status of any other feature.

We consider two types of alternative to additivity. The underadditivity hypothesis asserts that the overall salience of an object is less than the sum of its salience for each feature, whereas the overadditivity hypothesis asserts that overall salience is greater than the sum of saliences for each feature. Underadditivity may be the result of inhibitory communication between the systems responsible for processing each feature; high salience detected by one system causes other systems to become less sensitive to salience, with the result that the total salience that they detect together is less than what they would detect had they worked in isolation. Alternatively, underadditivity may be the result of a general mechanism of saturation; when a high enough level of overall salience is reached because of one salient feature, additional salience from another feature cannot further increase overall salience past this level of saturation. In the case of overadditivity, communication between feature systems is excitatory; high salience detected by one system causes other systems to become more sensitive, with the result that the total salience they detect together is more than when they work in isolation.

For an example of what these three hypotheses predict for simple images, consider Fig. 1. In panel A, one line stands out from the others, because the orientation is slightly contrasting. In panel B, the line is also of a higher luminance. The difference in the line’s salience from display A to display B is the salience effect of the increase in luminance, given that the line also shows a slight orientation salience. Panels C and D have the same increase in luminance, but now the line has greater orientation salience. According to the additivity hypothesis, the effect of luminance on salience is independent of the level of orientation salience, so the difference between C and D should be the same as the difference between A and B. According to the underadditivity hypothesis, the increase in salience as a result of added luminance should diminish if orientation salience is already high, so the difference between C and D should be less than between A and B. Finally, according to the overadditivity hypothesis, the effect of added luminance should increase if orientation salience is already high, so the difference between C and D should be greater than between A and B.
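The three predictions can be stated as a difference of differences. The following minimal sketch assumes four salience values, one per panel of Fig. 1; the function name, the tolerance, and the example numbers are hypothetical and serve only to illustrate the logic.

```python
def classify_combination(sal_a, sal_b, sal_c, sal_d, tol=1e-9):
    """Classify a 2 x 2 pattern of salience values (panels A-D of Fig. 1).

    A: low orientation salience, no added luminance
    B: low orientation salience, added luminance
    C: high orientation salience, no added luminance
    D: high orientation salience, added luminance
    """
    luminance_effect_low = sal_b - sal_a    # effect of luminance at low orientation salience
    luminance_effect_high = sal_d - sal_c   # effect of luminance at high orientation salience
    interaction = luminance_effect_high - luminance_effect_low
    if abs(interaction) <= tol:
        return "additive"
    return "overadditive" if interaction > 0 else "underadditive"


# Hypothetical salience values in arbitrary units.
print(classify_combination(1.0, 2.0, 2.0, 3.0))   # equal luminance effects
print(classify_combination(1.0, 2.0, 2.0, 2.5))   # smaller effect at high orientation salience
print(classify_combination(1.0, 2.0, 2.0, 3.5))   # larger effect at high orientation salience
```

With real data the interaction would never be exactly zero, so the tolerance here stands in for a proper statistical comparison.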

It does not necessarily have to be the case that one of these hypotheses is true for the entire salience system. Instead, it may be the case that some combinations of features are additive, whereas others are underadditive or overadditive. Indeed, there is some evidence that the extent to which salient features are additive depends on the particular combination of features being considered. Nothdurft (2000) used arrays of lines (similar to those shown in Fig. 1) to measure the salience of combinations of contrasts in orientation, luminance, movement, and color, by asking people to judge which of two lines appeared more salient to them. Combinations of orientation and luminance, and of movement and luminance, were close to additive, whereas combinations of orientation and color, and of orientation and movement, were clearly underadditive.

To our knowledge, no clear evidence exists of overadditivity for feature combinations. Kerzel et al. (2011) asked people to judge the brightness of an item and found that items that were salient by virtue of their orientation or movement direction were judged to be brighter than items of equal brightness that were not salient by virtue of orientation or movement. The authors argue that the overall salience of the item is “fed back” into a separate map that represents only luminance salience and enhances salience in this map. This mechanism might constitute the first step in an overadditive salience system. However, for the system as a whole to be overadditive, it would be necessary for the enhanced luminance to be fed back again into the overall salience map, thus closing a positive feedback loop and producing an overall level of salience greater than the sum of its separate feature inputs.

Both Nothdurft (2000) and Kerzel et al. (2011) measured salience using subjective judgments; that is, they asked people how things looked to them. This approach is admirable, because it remains true to the phenomenology of vision; one of the central concerns of vision research is to explain why things appear the way they do in our conscious perception. However, subjective judgments pose some problems. Some of these problems are purely practical. The task of judging the salience or contrast of two sometimes very similar items can be very difficult. This means that participants must be carefully instructed in the task and may develop idiosyncratic strategies. In attempting to replicate Nothdurft’s (2000) study, Koene and Zhaoping (2007) found that more than half of the participants seemed to give random answers or developed a strategy of ignoring one of the two features and basing their judgments only on the other. This may mean that Nothdurft’s (2000) results are based on an unrepresentative sample of highly trained participants, and possibly explains the small number of participants (n = 5) and large differences in performance between some of them.

Another problem with the subjective judgment method is not a practical one but concerns the interpretation of the results. An important purported function of a salience map is to guide the gaze to the next item of interest (Parkhurst et al., 2002; Underwood et al., 2006). Whereas saccades to a new item can be made very quickly following the item’s appearance, as quickly as 150 ms (Fischer & Ramsperger, 1984), a subjective judgment of an item’s salience comes after a much longer period of deliberation. Therefore, it is possible that the subjective judgment method does not measure salience at the same stage of processing that is used to guide eye movements. Indeed, it is not clear that the notion of subjective “salience” that, for example, Nothdurft’s (2000) participants were asked to judge is conceptually the same thing as the salience that leads to an item being selected for visual attention and ultimately eye movement. The two could in principle be processed independently of one another, or additional interactions among features could take place after the processing of salience for visual attention but before the generation of a subjective impression of salience.

Therefore, if we wish to investigate salience as an input to attentional selection, it would be best to use a measure of salience that is directly based on eye movements. Van Zoest et al. (2012) recorded eye movements while participants viewed arrays of vertical lines similar to those shown in Fig. 1. The participants were given a very simple task. Each time the display appeared, they had to make a saccade to a small circle (the target), which appeared somewhere in the display. Sometimes, one of the lines (the distractor) had a contrasting orientation to the others, as explained in the example above. In such “target-distractor” tasks, saccades to the target tend to deviate towards the distractor, i.e., the gaze initially moves to a position that is very close to the target but is located somewhat towards the position of the distractor (see Van der Stigchel et al., 2006; Walker & McSorley, 2008; Van der Stigchel, 2010, for reviews). The extent of this deviation towards the distractor may be taken as an indicator of the distractor’s salience. In agreement with subjective judgment studies, van Zoest et al. (2012) found that distractors of a strongly contrasting orientation were more salient than those of only a slightly contrasting orientation. The saccade deviation method has also been used successfully to measure the salience of moving distractors (van Zoest et al., 2017).

The “saccade deviation” method of quantifying salience has two advantages over the subjective judgment method. First, saccade deviation is easily interpretable within the salience map framework. Neurophysiological studies from monkeys show that the direction of saccades is determined by activity in a map of the visual field, located in the superior colliculus (McPeek & Keller, 2001; McPeek et al., 2003; Port & Wurtz, 2003). In the target-distractor task, there is activity in this map at the locations of both the target and the distractor, and the saccade is therefore directed towards a position intermediate between the two. Salient distractors elicit greater deviation towards themselves, probably because they elicit more activity in the salience map that guides eye movements.

Second, the target-distractor task is much simpler for participants than subjective judgment tasks. Because the target is clearly visible and clearly discriminable from the distractor, there is arguably little or no conscious deliberation required to locate it. The result is that participants do not need to make any idiosyncratic strategic decisions about their salience judgments, for example by deciding which features of the distractor to pay attention to and which to ignore. Indeed, participants do not need to pay attention to the distractor at all; the extent to which the distractor influences the salience measurements is instead determined by automatic and unavoidable processes.

In the present study, we used saccade deviation to quantify salience from combinations of different features. We investigated three of the combinations used in Nothdurft’s (2000) study—orientation with luminance, orientation with movement, and luminance with movement—in each case comparing the support the findings offered for additivity, underadditivity, or overadditivity. The goal was to test the earlier conclusions but using a method that more directly reflects the concept of the salience map as a mechanism for generating shifts of gaze. If it is the case that later, more deliberative, subjective judgments are based on the same salience system as eye movements, then the pattern of results should confirm that of Nothdurft (2000), namely that the combination of orientation and movement is clearly underadditive, whereas luminance is close to additive with the other two features.

Experiment 1: Orientation + luminance

Methods

Participants

Twenty-two participants, aged 19 to 36 (mean age 27) years, 17 females and 5 males, were recruited from the participant pool at the Institute of Psychology at the Humboldt-Universität in Berlin. Participants were informed about the details of the experiment in advance and gave their written consent to participate. As compensation, they received participation credit towards the completion of their undergraduate degree in psychology.

Apparatus

The task display was programmed using MATLAB with the Psychophysics Toolbox and shown on a Samsung SyncMaster 2233 monitor with a refresh rate of 60 Hz using the default manufacturer settings for brightness and contrast. Eye movements were recorded from the left eye only, using an SMI iView X Hi-Speed system with a sampling rate of 1250 Hz. The experiment was conducted in a darkened room with no light source other than the display monitor. The participant was seated at a desk facing the display monitor at a distance of approximately 70 cm, with chin resting on the eye tracking system’s built-in chin rest.

Stimuli

The stimulus display was based on that reported in van Zoest et al. (2012). Participants completed a target-distractor saccade task. In each trial of the task, participants first fixated a central point, marked by a small circle approximately 0.44 degrees of visual angle (dva) in diameter. After a variable period of time, ranging randomly from 800 to 1,300 ms, the stimulus display appeared.

To introduce variation into the latencies of saccades, a stimulus onset asynchrony (SOA) was introduced between the disappearance of the fixation point and the appearance of the stimulus display (Saslow, 1967; van Zoest et al., 2012). On some trials, the fixation point disappeared before the stimulus display appeared (SOA −150 or −50 ms). On some trials, the two events occurred simultaneously (SOA 0 ms), and on some trials, the fixation point disappeared after the stimulus display appeared (SOA 50 or 150 ms).

The stimulus display consisted of a 17 × 17 grid of vertical lines. The entire grid was approximately 17.5 × 17.5 dva in size. Each line was approximately 0.76 dva in length and 0.15 dva in width. Participants were instructed to make a saccade to a target shape as soon as they could. The target shape was a small circle identical to the fixation point and could appear six grid positions away from the central fixation point (so approximately 7.6 dva), either vertically above or below it. The background of the display was set at minimum black (approximately 0.13 cd/m² in luminance). The fixation point, target, and grid lines were an intermediate dark gray color (approximately 24.46 cd/m² in luminance).

In 8 of every 9 trials, an irrelevant distractor appeared near the target: two grid positions below it and two grid positions either to its left or to its right (see Fig. 1 for examples). The distractor was a line identical to the other lines in the grid but oriented differently in either a clockwise or counter-clockwise direction. This orientation difference was either small (22.5 degrees, low orientation salience; Fig. 1a) or large (67.5 degrees, high orientation salience; Fig. 1c). In half of the trials containing a distractor, the distractor also was of increased luminance relative to the rest of the display (maximum white, approximately 110.30 cd/m²; Fig. 1b and d).

The stimulus display remained on screen for 1,500 ms, after which time it was removed and replaced by the fixation point for the beginning of the next trial.

Procedure

Each participant completed 900 trials of the task. Distractor trials were of 8 types, composed of the orthogonal combination of 2 levels of orientation salience (low and high), 2 levels of luminance salience (absent and present), and 2 orientation directions (clockwise or counter-clockwise). To these was added a ninth trial type without a distractor. The 9 trial types were combined orthogonally with 2 target positions (above or below fixation), 2 distractor positions (left or right of the target), and 5 SOAs (−150, −50, 0, 50, or 150 ms), repeated 5 times to give the full set of 900 trials. The order of trial types was randomized for each participant.
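The factorial structure described above can be checked by enumerating it. This is a minimal sketch rather than the original experiment code (which was written in MATLAB); the function name, the labels, and the use of `None` for the no-distractor trial type are assumptions made for illustration.

```python
import random
from itertools import product


def build_trials(seed=None):
    """Enumerate the trial list described for Experiment 1 and shuffle it."""
    orientation = ["low", "high"]
    luminance = ["absent", "present"]
    direction = ["clockwise", "counter-clockwise"]

    # 8 distractor types plus a ninth type without a distractor (None).
    trial_types = list(product(orientation, luminance, direction)) + [None]

    target_positions = ["above", "below"]
    distractor_sides = ["left", "right"]
    soas_ms = [-150, -50, 0, 50, 150]
    repetitions = 5

    trials = [
        combo
        for combo in product(trial_types, target_positions, distractor_sides, soas_ms)
        for _ in range(repetitions)
    ]
    random.Random(seed).shuffle(trials)  # new random order per participant
    return trials


trials = build_trials(seed=1)
print(len(trials))                                   # 9 * 2 * 2 * 5 * 5
print(sum(1 for t in trials if t[0] is not None))    # trials containing a distractor
```

The count confirms that 800 of the 900 trials (8 of every 9) contain a distractor.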

The experiment began with 10 trials of practice. Participants then completed the 900 experimental trials in 10 blocks of 90 trials each. Each block began with a 13-point calibration. The calibration was then validated and repeated if validation failed. To speed up data collection and obtain enough trials for analysis, no drift correction was performed during the block (but drift was corrected post hoc; see Data Analysis). After each block, the participant was given time to rest before continuing.

Data analysis

All gaze samples falling outside the dimensions of the stimulus monitor were discarded. Gaze samples that did fall within the dimensions of the monitor were smoothed to average out small-scale sampling noise. This was achieved by replacing the x and y coordinates of each sample with the mean of coordinates from all samples within 2.5 ms of the current sample (i.e., smoothing with a “rectangular sliding window”).
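The smoothing step can be sketched as follows. This is an illustrative implementation, not the authors' code; it assumes sorted timestamps in milliseconds and treats the 2.5 ms limit as inclusive, which makes no difference at a sampling interval of 0.8 ms (1250 Hz), where the window covers the current sample and three neighbors on each side.

```python
def smooth_gaze(times_ms, xs, ys, half_width_ms=2.5):
    """Rectangular sliding-window smoothing of gaze coordinates.

    Each sample is replaced by the mean of all samples whose timestamp
    lies within half_width_ms of it. times_ms must be sorted ascending.
    """
    n = len(times_ms)
    smoothed_x, smoothed_y = [], []
    lo, hi = 0, 0
    for i in range(n):
        # Advance the lower edge until it is within the window.
        while times_ms[i] - times_ms[lo] > half_width_ms:
            lo += 1
        # Advance the upper edge while the next sample is still within the window.
        while hi + 1 < n and times_ms[hi + 1] - times_ms[i] <= half_width_ms:
            hi += 1
        count = hi - lo + 1
        smoothed_x.append(sum(xs[lo:hi + 1]) / count)
        smoothed_y.append(sum(ys[lo:hi + 1]) / count)
    return smoothed_x, smoothed_y
```

Near the start and end of a recording the window is truncated, so those samples are averaged over fewer neighbors.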

To extract the first saccade from the processed samples, we used a “velocity peak method” (Smeets & Hooge, 2003). This method avoids erroneously categorizing small fluctuations in gaze velocity as saccades, which may occur with a fixed saccade velocity criterion (Nyström & Holmqvist, 2010). The first velocity peak was identified as the first set of contiguous samples with a velocity greater than 100 deg/s. The start and end points of the saccade were identified by searching from this peak backwards and forwards in time respectively until finding a sample with a velocity below 35 deg/s and an acceleration below 0 deg/s².
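One literal reading of this procedure is sketched below. It is an assumption-laden illustration, not the published implementation: acceleration is approximated here by a backward difference of velocity, and the same edge criterion (velocity below 35 deg/s and negative acceleration) is applied in both search directions, exactly as the text states.

```python
def first_saccade(velocity, dt_s, peak_thresh=100.0, edge_thresh=35.0):
    """Return (start_index, end_index) of the first saccade, or None.

    velocity: gaze speed per sample in deg/s; dt_s: sample interval in seconds.
    """
    # Backward-difference acceleration in deg/s^2 (an assumed approximation).
    accel = [0.0] + [
        (velocity[i] - velocity[i - 1]) / dt_s for i in range(1, len(velocity))
    ]

    # First contiguous run of samples above the peak threshold.
    peak_start = next((i for i, v in enumerate(velocity) if v > peak_thresh), None)
    if peak_start is None:
        return None
    peak_end = peak_start
    while peak_end + 1 < len(velocity) and velocity[peak_end + 1] > peak_thresh:
        peak_end += 1

    def is_edge(i):
        return velocity[i] < edge_thresh and accel[i] < 0

    # Search backwards for the start and forwards for the end.
    start = next((i for i in range(peak_start, -1, -1) if is_edge(i)), None)
    end = next((i for i in range(peak_end, len(velocity)) if is_edge(i)), None)
    if start is None or end is None:
        return None
    return start, end
```

The function returns `None` when no peak or no qualifying edge sample exists, which a full pipeline would treat as a trial without a detectable saccade.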

Saccade trajectory deviation was quantified using the measure known as overall direction (OD; Van der Stigchel et al., 2006). OD is the size of the angle between a straight line from saccade start to saccade endpoint and a straight line from saccade start to target position. As such, it measures the extent to which the saccade deviates from its target. Values of OD were signed according to the position of the distractor relative to the target, such that larger values indicate greater deviation towards the distractor. OD was chosen rather than any of the other commonly used measures of saccade trajectory deviation. It has been shown to be sensitive to the presence of a distractor when the distractor is located close to the target (McSorley, Cruikshank, & Inman, 2009), as was the case in the present study, and has been shown to be more sensitive to the salience of the distractor than other measures (Tudge, McSorley, Brandt, & Schubert, 2017). Saccade latency was calculated as the time between onset of the stimulus display and the start of the first saccade.
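The OD computation can be sketched as below. The function name and argument layout are hypothetical; positions are assumed to be (x, y) pairs in any consistent Cartesian unit, and the sign is assigned according to whether the saccade endpoint falls on the same side of the start-to-target line as the distractor.

```python
import math


def overall_direction(start, end, target, distractor):
    """Signed overall direction (OD) in degrees; positive = towards the distractor."""
    def signed_angle(ax, ay, bx, by):
        # Signed angle from vector a to vector b, in degrees.
        return math.degrees(math.atan2(ax * by - ay * bx, ax * bx + ay * by))

    sx, sy = start
    tx, ty = target[0] - sx, target[1] - sy          # start -> target
    ex, ey = end[0] - sx, end[1] - sy                # start -> saccade endpoint
    dx, dy = distractor[0] - sx, distractor[1] - sy  # start -> distractor

    saccade_angle = signed_angle(tx, ty, ex, ey)
    distractor_angle = signed_angle(tx, ty, dx, dy)
    same_side = saccade_angle * distractor_angle >= 0
    return abs(saccade_angle) if same_side else -abs(saccade_angle)


# Hypothetical coordinates: target straight above fixation, distractor to its lower right.
print(overall_direction((0, 0), (1, 10), (0, 10), (2, 8)))    # lands towards the distractor
print(overall_direction((0, 0), (-1, 10), (0, 10), (2, 8)))   # lands away from the distractor
```

A distractor lying exactly on the start-to-target line would make the sign undefined; that case does not arise in the displays described here.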

Trials were excluded from further analysis if saccade latency was less than 100 ms (suggesting an anticipatory saccade) or greater than 600 ms (suggesting a saccade that was not an immediate reaction to the onset of the stimuli), if OD was more than 18 degrees either side of the target (suggesting a saccade not directed towards the target), or if the participant was not fixating the screen within 2 degrees of visual angle of the fixation point at the time the saccade was initiated.
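These exclusion rules amount to a simple per-trial filter. The sketch below assumes hypothetical argument names and treats the stated limits as inclusive, which the text does not specify.

```python
def keep_trial(latency_ms, od_deg, fixation_error_dva):
    """Return True if a trial passes all exclusion criteria."""
    latency_ok = 100 <= latency_ms <= 600    # not anticipatory, not delayed
    direction_ok = abs(od_deg) <= 18         # saccade directed towards the target
    fixation_ok = fixation_error_dva <= 2    # gaze near fixation at saccade onset
    return latency_ok and direction_ok and fixation_ok


print(keep_trial(250, 3.2, 0.5))    # typical trial
print(keep_trial(80, 3.2, 0.5))     # anticipatory saccade
print(keep_trial(250, -25.0, 0.5))  # saccade not directed towards the target
```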

All statistical analyses were carried out using the R software (R Core Team, 2015). OD was analyzed using a linear mixed effects model, using the lme4 package (Bates et al., 2015). The model included saccade latency, distractor orientation, and distractor luminance as fixed effects, random intercepts for each participant (i.e., a random effect of participant), and a random latency slope for each participant, using the following model formula:

$$ \mathrm{OD} \sim \mathrm{latency} \ast \mathrm{orientation} \ast \mathrm{luminance} + \left(\mathrm{latency} \mid \mathrm{participant}\right) $$

For each of the coefficients in the model, the null hypothesis that the population coefficient is zero was tested using a t test based on the Satterthwaite approximation to the degrees of freedom (Satterthwaite, 1946), implemented in the lmerTest package (Kuznetsova et al., 2016). The coefficient of interest was the coefficient for the interaction between distractor orientation and distractor luminance. A negative value for this coefficient indicates underadditivity of the two sources of salience, and a positive value indicates overadditivity.

In addition to the possible under- and overadditivity, it was necessary to assess the additivity hypothesis. In the linear model applied, additivity would manifest as a lack of interaction between orientation and luminance or equivalently a coefficient of zero for this interaction. Therefore, two nested models were compared: an “additive” model without a term for the interaction of orientation and luminance, and an “interactive” model that differs from the additive only by inclusion of the interaction term. The fixed effects of the two model formulas differed as follows:

  • M_additive:

$$ \mathrm{latency} \ast \mathrm{orientation} + \mathrm{latency} \ast \mathrm{luminance} $$

  • M_interactive:

$$ \mathrm{latency} \ast \mathrm{orientation} + \mathrm{latency} \ast \mathrm{luminance} + \mathrm{orientation} \ast \mathrm{luminance} $$

It is important to note that the additive model acts as a “null model” here, because it contains one fewer free parameter than the interactive model. If we wish to quantify the support that the data offer for this model, a null hypothesis testing procedure is not appropriate, because it does not distinguish between data that support the null hypothesis and data that are merely inconclusive. Therefore, the support the data offered for the two models was assessed using two well-known model selection criteria: Akaike’s Information Criterion (AIC; Akaike, 1974) and the Bayesian Information Criterion (BIC; Schwarz, 1978). These criteria can be used to compare a null and an alternative model, and they take into account the difference in numbers of free parameters. For both criteria, lower values indicate greater support. The raw values of either criterion are of little interest; only the difference in values between the two models matters. Therefore, values of ΔAIC and ΔBIC were calculated by subtracting the value for the additive model from the value for the interactive model. Negative values indicate greater support for an interaction between orientation and luminance, and positive values indicate greater support for additivity.

The two criteria differ slightly in their interpretation. The AIC is intended to estimate how well competing models approximate the unknown true process generating the data and will tend to favor more complex models. When two models are nested and differ by only one free parameter, as is the case for the additive and interactive models, the ΔAIC can favor the simpler model by a maximum value of 2, whereas there is no such limit on how much it may favor the more complex model (Akaike, 1974). The additive model is the simpler model in this context. Therefore, values of ΔAIC close to 2 indicate close to maximal support for additivity.

The BIC is intended to estimate the relative posterior probabilities of competing models and tends to favor simplicity. Therefore, in contrast to the AIC, the BIC will tend to favor the additive model. Because of its strong preference for simplicity, it has been suggested that even slight support from the BIC for the more complex model is good grounds for concluding in favor of the more complex model (Raftery, 1999).
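The behavior of the two criteria for nested models that differ by one parameter can be illustrated with their standard definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The sketch below is not taken from the analysis scripts; the log-likelihoods, the parameter count, and the number of observations are hypothetical, and maximum likelihood fits are assumed, so that the interactive model's log-likelihood can never be lower than the additive model's.

```python
import math


def aic(log_lik, k):
    """Akaike's Information Criterion for a model with k free parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion for k free parameters and n observations."""
    return k * math.log(n) - 2 * log_lik


def delta_criteria(log_lik_additive, log_lik_interactive, k_additive, n):
    """Return (delta AIC, delta BIC), interactive minus additive."""
    k_interactive = k_additive + 1   # one extra interaction term
    d_aic = aic(log_lik_interactive, k_interactive) - aic(log_lik_additive, k_additive)
    d_bic = bic(log_lik_interactive, k_interactive, n) - bic(log_lik_additive, k_additive, n)
    return d_aic, d_bic


# Hypothetical case 1: the interaction term does not improve the fit at all.
print(delta_criteria(-1000.0, -1000.0, k_additive=10, n=14000))
# Hypothetical case 2: the interaction term improves the log-likelihood by 5 units.
print(delta_criteria(-1000.0, -995.0, k_additive=10, n=14000))
```

In the first case ΔAIC takes its maximum of 2 and ΔBIC equals ln n, which shows why the BIC penalizes the extra parameter more heavily than the AIC at large sample sizes.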

Both ΔAIC and ΔBIC are reported to provide some assessment of the robustness of conclusions to different model selection methods. Where the ΔAIC and ΔBIC disagree on the favored model, we conclude that the model selection problem is hard and the data are inconclusive. Where the two criteria agree, we may be more confident in the conclusion.

Results

See Table 1 for all results from the linear mixed effects model. OD declined significantly with increasing saccade latency (t(26) = 3.51, p = 0.002), confirming the results of previous studies (McSorley, Haggard, & Walker, 2006; Tudge & Schubert, 2016). Distractors of high orientation salience produced significantly greater OD than those of low orientation salience (+1.02 degrees; t(14314) = 8.37, p < 0.001), and this effect declined significantly with increasing latency (t(14320) = 6.77, p < 0.001), again confirming the results of previous studies (van Zoest et al., 2012). Distractors with luminance salience also produced significantly greater OD than those without luminance salience (+1.01 degrees; t(14314) = 8.28, p < 0.001), and this effect also declined significantly with increasing latency (t(14323) = 5.81, p < 0.001). See Fig. 2 for an illustration of the latency-dependent effects of orientation and luminance salience on OD.

Table 1: Results from linear mixed effects model for orientation and luminance salience

Dependent measure is OD. Coefficients and standard errors (SE) are given in degrees of OD. For orientation and luminance, the reference level is the lower level of salience, so coefficients reflect the effect of increased salience. Latency was scaled for computational convenience, so the intercept reflects the value of OD at mean latency, and 1 unit of latency indicates 1 standard deviation. Degrees of freedom (df) are based on the Satterthwaite approximation. Note that the degrees of freedom for fixed effects that were also estimated as per-participant random effects (the intercept and effect of latency) are therefore based on the number of participants, whereas the degrees of freedom for the other fixed effects are based on the total number of observations.

Fig. 2: OD as a function of latency, distractor orientation, and distractor luminance

Data are grouped into latency quartiles, with each point showing mean latency and mean OD for each latency quartile, averaged over participants. Error bars show ±1 standard error of the mean (SEM)

The combination of high orientation salience and luminance salience was slightly overadditive. Specifically, the increase in OD produced by luminance when the distractor was of high orientation salience was 1.03 times what it was when the distractor was of low orientation salience. However, this interaction was not significant (t(14311) = 0.18, p = 0.858), nor did it interact with latency (t(14318) = 0.59, p = 0.553). The comparison of additive and interactive models for orientation and luminance favored the additive model according to both model selection criteria, and support for additivity according to the AIC was close to maximal (ΔAIC = 1.97, ΔBIC = 9.54). Figure 3 illustrates the lack of interaction between distractor orientation and luminance in their effects on OD.

Fig. 3: Mean OD for all four combinations of orientation and luminance salience

Error bars show ±1 SEM

Experiment 2: Orientation + movement

Methods

Participants

Twenty-two participants, aged 19 to 31 (mean 24) years, 17 females and 5 males, were recruited from the participant pool at the Institute of Psychology at the Humboldt-Universität in Berlin. Two of the participants had already taken part in Experiment 1. Participants were informed about the details of the experiment in advance and gave their written consent to participate. As compensation, they received participation credit towards the completion of their undergraduate degree in psychology.

Stimuli

The technical set-up of the experiment, the apparatus used, the stimulus display, and the procedure were all the same as described for Experiment 1, with one alteration to the stimuli. The manipulation of distractor luminance was removed, so that all distractors were of the same luminance, equal to that of the target and the other lines in the display. Instead, some of the distractors were made to move. This was achieved by jittering the distractor randomly around its correct location at a rate of 60 Hz. Jittered positions were generated by sampling x and y positions according to a normal distribution centered on the correct location and with a standard deviation of approximately 0.05 dva. We chose these parameters for the manipulation of movement, because during a pilot test they produced effects on OD approximately comparable to those produced by the luminance manipulation in Experiment 1.
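The movement manipulation can be sketched as one position sample per display frame. This is an illustration in Python rather than the original MATLAB display code; the function name is hypothetical, and the x and y offsets are assumed to be drawn independently.

```python
import random


def jittered_positions(center_x, center_y, n_frames, sd_dva=0.05, seed=None):
    """Sample one jittered (x, y) position per frame around the distractor's location.

    Positions are in degrees of visual angle (dva).
    """
    rng = random.Random(seed)
    return [
        (rng.gauss(center_x, sd_dva), rng.gauss(center_y, sd_dva))
        for _ in range(n_frames)
    ]


# A 1,500 ms display at 60 Hz spans 90 frames.
frames = jittered_positions(2.0, 5.5, n_frames=90, seed=0)
print(len(frames))
```

Because each frame is sampled around the correct location rather than around the previous position, the distractor jitters in place and does not drift.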

Data analysis

The data analysis was the same as described for Experiment 1, except that the linear mixed-effects model included a term for movement in place of the term for luminance. The model comparison procedure for assessing the support the data offer for additivity versus interactivity was performed in the same way, such that negative values of ΔAIC and ΔBIC indicate greater support for an interaction between orientation and movement, and positive values indicate greater support for additivity.

Results

Table 2 shows the results from the linear mixed-effects model. Several findings confirmed those of Experiment 1. OD again declined significantly with increasing saccade latency (t(26) = 3.92, p < 0.001). Distractors of high orientation salience produced significantly greater OD than those of low orientation salience (+1.39 degrees; t(16790) = 11.26, p < 0.001), and this effect declined significantly with increasing latency (t(16807) = 7.15, p < 0.001). Distractors with movement salience also produced significantly greater OD than those without movement salience (+1.38 degrees; t(16786) = 11.12, p < 0.001), and this effect also declined significantly with increasing latency (t(16794) = 3.41, p < 0.001). Figure 4 is an illustration of the latency-dependent effects of orientation and movement salience on OD.

Table 2 Results from linear mixed effects model for orientation and movement salience

Coefficients and standard errors (SE) are given in degrees of OD. For orientation and movement, the reference level is the lower level of salience. Degrees of freedom (df) are based on the Satterthwaite approximation.

Fig. 4: OD as a function of latency, distractor orientation, and distractor movement. Data are grouped into latency quartiles. Error bars show ±1 SEM

The combination of high orientation salience and movement salience was significantly underadditive. Specifically, the increase in OD produced by movement when the distractor was of high orientation salience was 0.59 times what it was when the distractor was of low orientation salience (t(16787) = 3.24, p = 0.001). This underadditive interaction did not interact with latency (t(16795) = 0.94, p = 0.350). The comparison of additive and interactive models for orientation and movement favored the interactive model according to both model selection criteria (ΔAIC = −8.47, ΔBIC = −0.74), consistent with the hypothesis test for the interaction in the full model. Figure 5 illustrates the underadditivity of distractor orientation and movement in their effects on OD.
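
One way to read the ratio of 0.59 is in terms of model coefficients. The symbols below are introduced here for illustration and are not the authors' notation: β_mov stands for the effect of movement at the low level of orientation salience (assumed positive), and β_ori:mov for the orientation-by-movement interaction coefficient.

$$ r=\frac{\beta_{\mathrm{mov}}+\beta_{\mathrm{ori}:\mathrm{mov}}}{\beta_{\mathrm{mov}}} $$

Under this reading, r = 1 corresponds to additivity, r < 1 to underadditivity, and r > 1 to overadditivity. If the reported movement effect of +1.38 degrees and the ratio of 0.59 both refer to the same model at the same reference latency (an assumption), the interaction coefficient would be approximately 1.38 × (0.59 − 1) ≈ −0.57 degrees.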

Fig. 5: Mean OD for all four combinations of orientation and movement salience. Error bars show ±1 SEM

Experiment 3: Luminance + movement

Methods

Participants

Twenty-two participants, aged 18 to 37 (mean 26) years, 11 females and 11 males, were recruited from the participant pool at the Institute of Psychology at the Humboldt-Universität in Berlin. Two of the participants had already taken part in Experiment 2. None had taken part in Experiment 1. Participants were informed about the details of the experiment in advance and gave their written consent to participate. As compensation they received participation credit towards the completion of their undergraduate degree in psychology.

Stimuli

The technical set-up of the experiment and the apparatus used were the same as described for Experiments 1 and 2, except for the stimuli. This time, no differently oriented lines appeared in the display. Instead, in 8 of every 9 trials, the distractor line was made salient by displaying it at a higher luminance than the other lines in the array, as described for Experiment 1, or by making it move, as described for Experiment 2, or both. In half of the distractor trials, the distractor was of increased salience for only one feature, either luminance or movement, and in the other half of the trials, the distractor was of increased salience for both features. Trials were therefore of four types: baseline trials (i.e., all lines in the array were identical, so no distractor), luminance-only trials, movement-only trials, and luminance-movement trials.

Procedure

The procedure was the same as for Experiments 1 and 2, with one small change. The number of trials was increased from 900 to 1,080, by repeating the combinations of possible trial types 12 times instead of 10. Participants completed the experiment in 12 blocks of 90 trials each, instead of in 10 blocks.

Data analysis

In Experiments 1 and 2, the low orientation salience trials served as a baseline measure of OD for a weakly salient distractor. OD for the other distractor types could be compared against this baseline. In Experiment 3, there was no such baseline available. Because both luminance salience and movement salience could be either present or absent, the “distractor” in the trials where both were absent was not visually distinguishable from the other lines in the array (it neither moved nor was brighter than the other lines, nor was it of a different orientation). To work around this structural difference in the trial types and to provide a baseline measure of OD, a “dummy” distractor location (either left or right of the target) was chosen at random for each trial at the time the experiment was run (Al-Aidroos & Pratt, 2010). OD was then calculated in these trials as if there had been a distractor at the randomly chosen location. Because no distractor was actually present, mean OD for these trials should be zero, but the variance in OD should reflect the normal variance in OD seen in eye movement trajectories. For the same reason, the main effect of saccade latency on this baseline should also be zero. As a check of the robustness of the results to the element of randomness introduced by this baselining procedure, all statistical analyses described below were repeated with the positions of the dummy distractors reversed. This did not change the pattern of results.
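
The baselining procedure and its robustness check can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: it assumes that OD is signed so that positive values indicate deviation toward the distractor side, and the trajectory deviations are simulated noise rather than real data. All function and variable names are hypothetical.

```python
import random


def assign_dummy_sides(n_trials, rng=None):
    """Pick a dummy distractor side per baseline trial: -1 = left, +1 = right."""
    rng = rng or random.Random()
    return [rng.choice((-1, 1)) for _ in range(n_trials)]


def signed_od(raw_deviation_deg, side):
    """Sign a deviation so that positive values point toward the distractor side.

    raw_deviation_deg is assumed to be positive for rightward deviation.
    """
    return raw_deviation_deg * side


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
# Simulated deviations for trials in which no distractor was actually present.
raw = [rng.gauss(0.0, 3.0) for _ in range(1000)]
sides = assign_dummy_sides(len(raw), rng)

od = [signed_od(d, s) for d, s in zip(raw, sides)]
# Robustness check: reverse every dummy position and recompute OD.
od_reversed = [signed_od(d, -s) for d, s in zip(raw, sides)]
```

Reversing the dummy positions flips the sign of every baseline OD value, so any apparent baseline effect that survived the reversal with the same sign could not be attributed to the random assignment.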

The data analysis was otherwise the same as described for Experiments 1 and 2, except that the linear mixed effects model included no term for orientation but terms for luminance and for movement instead. The model comparison procedure for assessing the support the data offer for additivity versus interactivity was performed in the same way, such that negative values of ΔAIC and ΔBIC indicate greater support for an interaction between luminance and movement, and positive values indicate greater support for additivity.

Results

The results from the linear mixed effects model are shown in Table 3. As expected given the way in which baseline OD was calculated in this experiment, mean OD at baseline did not differ significantly from zero (0.25 degrees, t(33) = 0.75, p = 0.461), nor did it vary significantly with latency (t(49) = 0.40, p = 0.694). Distractors with added luminance produced significantly greater OD than at baseline (+2.72 degrees; t(14284) = 14.83, p < 0.001), and this effect declined significantly with increasing latency (t(14289) = 14.28, p < 0.001). Distractors with movement salience also produced significantly greater OD than at baseline, although rather less than they had produced in Experiment 2 (+0.45 degrees; t(14269) = 2.48, p = 0.013). This effect also declined with increasing latency, but not significantly so (t(14283) = 0.35, p = 0.726). See Fig. 6 for an illustration of the latency-dependent effects of luminance and movement salience on OD.

Table 3 Results from linear mixed effects model for luminance and movement salience

Coefficients and standard errors (SE) are given in degrees of OD. For luminance and movement, the reference level is the baseline level of OD when no distractor was present, as described in Data analysis above. Degrees of freedom (df) are based on the Satterthwaite approximation.

Fig. 6: OD as a function of latency, distractor luminance, and distractor movement. Data are grouped into latency quartiles. Error bars show ±1 SEM

The combination of luminance salience and movement salience was slightly underadditive. Specifically, the increase in OD produced by movement when the distractor was also of increased luminance was 0.84 times what it was when the distractor was not also of increased luminance. However, this interaction was not significant (t(14270) = 0.32, p = 0.748), nor did it interact with latency (t(14282) = 0.32, p = 0.746). The comparison of additive and interactive models for luminance and movement favored the additive model according to both model selection criteria, and support for additivity according to the AIC was close to maximal (ΔAIC = 1.92, ΔBIC = 9.48). Figure 7 illustrates the lack of interaction between luminance and movement in their effects on OD.
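
Why a ΔAIC of 1.92 counts as close to maximal follows from the definitions of the two criteria. The sketch below assumes that the two models are nested, are fitted by maximum likelihood to the same n observations, and differ by Δk parameters. The symbols ℓ_add and ℓ_int for the maximized log-likelihoods are introduced here for illustration.

$$ \Delta \mathrm{AIC}=2\Delta k-2\left({\ell}_{\mathrm{int}}-{\ell}_{\mathrm{add}}\right)\le 2\Delta k,\qquad \Delta \mathrm{BIC}=\Delta k\ln n-2\left({\ell}_{\mathrm{int}}-{\ell}_{\mathrm{add}}\right)\le \Delta k\ln n $$

The inequalities hold because the larger of two nested models cannot fit worse than the smaller one. If the interactive model adds a single parameter (Δk = 1, an assumption), ΔAIC cannot exceed 2, and with n on the order of 14,000 observations ΔBIC cannot exceed roughly 9.5 to 9.6.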

Fig. 7: Mean OD for all four combinations of luminance and movement salience. Error bars show ±1 SEM

Comparison of salience effects across experiments

In addition to the separate analyses described for each experiment above, we also performed a comparison of the effects of the orientation, luminance, and movement manipulations on OD between the three experiments. For each feature, we restricted the analysis to “baseline” trials, i.e., those trials in which the other two features were absent (or in the case of orientation contrast not fully absent but at its lower level of ±22.5 degrees). With these trials, we then fitted a linear mixed-effects model of the sort described above, in which the fixed effects were: saccade latency; the single feature in question; the experiment; and the interactions of these three factors. For example, in the analysis of orientation, the model had the form:

$$ \mathrm{OD}\sim \mathrm{latency}\ast \mathrm{orientation}\ast \mathrm{experiment}+\left(\mathrm{latency}\mid \mathrm{participant}\right) $$

To assess differences in the magnitude of the effect of a feature manipulation between experiments, the coefficient of interest is the interaction term between the feature and experiment (e.g., orientation:experiment). We tested this coefficient against zero, using the testing procedure with the lmerTest package described above for the previous analyses.
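
In R model formula notation, crossing factors with `*` stands for all main effects plus all of their interactions, with `:` marking an interaction term. The short sketch below only mimics that expansion to show where the coefficient of interest sits among the fixed effects. It does not fit a model, and the helper name `expand_crossing` is hypothetical.

```python
from itertools import combinations


def expand_crossing(factors):
    """Expand a full crossing (a * b * c) into main effects and interactions."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


terms = expand_crossing(["latency", "orientation", "experiment"])
# The feature-by-experiment term tested against zero in the text:
term_of_interest = "orientation:experiment"
```

For three crossed factors the expansion yields three main effects, three two-way interactions, and one three-way interaction. The number of estimated coefficients per term depends on how many levels each factor has.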

These tests showed a small but significant increase in the effect of the higher orientation contrast from Experiment 1 to Experiment 2 (+0.57 degrees, t(15622) = 3.34, p < 0.001; compare Figs. 3 and 5); a much larger and significant increase in the effect of luminance from Experiment 1 to Experiment 3 (+2.10 degrees, t(11874) = 9.90, p < 0.001; compare Figs. 3 and 7); and a moderate but significant decrease in the effect of movement from Experiment 2 to Experiment 3 (−0.88 degrees, t(13056) = 4.29, p < 0.001; compare Figs. 5 and 7).

General discussion

Using a method that measures salience directly from eye movements, at an early stage of processing, we found evidence that the visual system combines different sources of salience in different ways. Specifically, orientation and movement salience combined underadditively, whereas luminance salience combined additively with both orientation and movement. This pattern is broadly similar to that found using subjective reports of salience; Nothdurft (2000) found that most combinations of salient features are underadditive, except for those involving luminance, which are additive or nearly so. Other studies have provided independent confirmation of the additivity of luminance salience with salience from other features (Poirier, Gosselin, & Arguin, 2008; Engmann et al., 2009).

However, our comparison of salience effects between the three experiments adds some subtleties and caveats to this general conclusion. First, the increase in the effect of luminance on OD from Experiment 1 to Experiment 3 suggests that there may yet be some degree of underadditivity at work in the combination of orientation and luminance salience. In Experiment 1, distractors always had some orientation contrast, and we merely varied the magnitude of this contrast (van Zoest et al., 2012). The baseline measure of the effect of luminance in Experiment 1 is therefore a measure of its effect when paired with the low orientation contrast of 22.5 degrees. In Experiment 3, the baseline manipulation of luminance occurred when the distractor was not salient by virtue of any other feature. A comparison of the effect of luminance between these experiments is therefore also implicitly a comparison of its effects at zero orientation contrast (Experiment 3) and low orientation contrast (Experiment 1). Interpreted in this way, the observed difference in the effect of luminance conforms to the pattern predicted for underadditivity; a reduction in the effect of luminance when orientation contrast is also increased. It is therefore possible that the additivity that we found for combinations of orientation and luminance in Experiment 1 is particular to the range of orientation contrasts used in this experiment (from 22.5 to 67.5 degrees) and that there may be underadditive interactions at lower magnitudes of orientation contrast.

Second, a similar interpretation of the difference in the effect of movement between Experiments 2 and 3 is possible. In Experiment 2, the baseline effect of movement was measured at the low level of orientation contrast, whereas in Experiment 3 it was measured at zero orientation contrast. In this case, however, the observed difference conforms to the pattern predicted for overadditivity; the effect of movement is greater when an orientation contrast is also present. Together with the analysis within Experiment 2, this conclusion suggests an overall pattern in which orientation and movement combine overadditively only at low levels of orientation contrast, but underadditively at higher levels of orientation contrast. This might occur, for example, because at low levels of overall salience a combination of salient features brings the overall salience signal from the distractor over a nonlinear “threshold” after which it is selected for further processing.

We focus now on the pattern of findings from within each of the three experiments: the underadditivity of orientation and movement, and their additivity with luminance, bearing in mind the caveat that these may be particular to the range of feature contrasts that we investigated. There are a few different possible explanations for this pattern. First, it is important to note that underadditivity need not necessarily be the result of interaction or communication between feature channels. Instead, it may be the case that orientation and movement are processed independently but that in some instances only one of them wins the competition to determine selection of the next saccade target, leading to an influence of orientation on some trials and an influence of movement on others, in turn resulting in an average influence of their combination that is less than the sum of their individual influences. However, this account predicts an effect of the combination of orientation and movement that is a weighted average of their two individual effects, and therefore cannot be greater than either one of these individual effects. This is not the pattern that we find. Instead, the combined effect of orientation and movement salience is greater than either individual effect, although less than their sum. This suggests that the two signals are indeed combined and are not independent of one another. There is also neurophysiological evidence to suggest that orientation and movement are not processed in independent channels. Some cells in V1 respond to both orientation and movement contrasts (Kastner et al., 1999). It therefore may be the case that the system “ignores” some orientation information when it is already processing movement, and vice versa.
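
The bound invoked against the independent-competition account can be written out explicitly. As a sketch, suppose that orientation alone determines the trajectory on a proportion w of the combined trials and movement alone determines it on the remainder. The symbols w, E_ori, and E_mov (the individual effects on OD) are introduced here for illustration.

$$ {E}_{\mathrm{comb}}=w{E}_{\mathrm{ori}}+\left(1-w\right){E}_{\mathrm{mov}},\quad 0\le w\le 1\quad \Rightarrow \quad \min \left({E}_{\mathrm{ori}},{E}_{\mathrm{mov}}\right)\le {E}_{\mathrm{comb}}\le \max \left({E}_{\mathrm{ori}},{E}_{\mathrm{mov}}\right) $$

A combined effect that exceeds both individual effects, as observed here, falls outside this range for every value of w.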

That the same does not occur for the combination of orientation or movement with luminance may reflect the fact that cells on the retina and in early visual processing areas, such as the lateral geniculate nucleus (LGN), are already sensitive to contrasts in luminance, whereas contrasts in orientation and movement are processed later in area V1 of the cortex. Luminance may have less opportunity to influence the processing of orientation or movement. Alternatively, it may be the case that the additivity of luminance with the other two sources of salience is a consequence of the primacy of luminance contrasts in the processing of the other two features. Orientation is a feature of edges, and detecting a difference in luminance that defines an edge will assist in the discrimination of orientation. Likewise, movement is a feature of objects, and detecting a difference in luminance between an object and its background will assist in detection of the object and therefore in discrimination of its movement. Effects of orientation or movement on overall salience can only occur if luminance has already been processed, with the result that, for example, the orientation channel of processing depends on the luminance channel for its inputs, and therefore is structurally unable to “discount” the processing of luminance, precluding an underadditive interaction between the two (see Footnote 2).

Whatever its physiological basis, the underadditivity of orientation and movement has important implications for how we think about and investigate these two features in visual perception. Although it is clear that orientation and movement represent dissociable physical features of the natural world, it does not necessarily follow that the visual system always dissociates them and encodes them as separate features. The ubiquity of gestalt phenomena vividly demonstrates the extent to which the visual system encodes certain combinations of features as new, irreducible percepts (Pomerantz & Portillo, 2011). Our results, together with earlier neurophysiological findings (Kastner et al., 1999; Nothdurft, 2000), suggest that to a certain albeit weaker extent, the visual system may do the same with combinations of orientation and movement, treating them as a single feature: “orientation-in-motion.” It is important to take into account the possibility of such composite features and to be wary of the assumption that every physically dissociable aspect of the world is treated as such in the visual system (Gozli & Deng, 2017), for example when studying the capacity of working memory in terms of numbers of features.

It is important to acknowledge potential criticisms of our conclusions. First, it may be argued that underadditivity does not reflect anything special to the particular combination of features for which it is observed but is rather produced by a simple general mechanism of saturation. The central salience map presumably has a maximum level of salience that it may represent at any one location. If one source of salience, for example orientation contrast, brings the overall salience map to this saturation level or close to it, then additional inputs will not be able to bring overall salience much higher. In this case, underadditivity would result but would not be special to the particular features being combined, merely a result of the fact that the particular levels of contrast chosen for one of those features happened to bring overall salience close to saturation. However, this is not a very plausible explanation of our results. If underadditivity were driven by saturation, then we would expect to see the most pronounced underadditivity in the combination of luminance and movement, because this combination produced the largest values of OD (approximately 3 degrees on average, and 7.5 degrees for the fastest saccades). Instead, we found underadditivity only for the combination of orientation and movement, which produced smaller values of OD (approximately 2.5 degrees on average, and 6 degrees for the fastest saccades) and therefore was further away from saturation than the combination of luminance and movement.
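
The logic of the saturation argument can be illustrated with a toy model. The sketch below assumes a hard ceiling on the combined salience signal, which is a deliberate simplification, and the ceiling and input values are arbitrary numbers chosen for illustration rather than estimates from the data. The function names are hypothetical.

```python
def saturating_sum(a, b, ceiling):
    """Combine two salience inputs additively, clipped at a ceiling."""
    return min(a + b, ceiling)


def additivity_ratio(a, b, ceiling):
    """Combined response relative to the sum of the two individual responses.

    A ratio of 1.0 indicates additivity; values below 1.0 indicate
    underadditivity caused purely by the ceiling.
    """
    separate = min(a, ceiling) + min(b, ceiling)
    return saturating_sum(a, b, ceiling) / separate


CEILING = 4.0  # arbitrary saturation level for illustration

weak_pair = additivity_ratio(1.0, 1.0, CEILING)    # sum stays below the ceiling
strong_pair = additivity_ratio(3.0, 2.0, CEILING)  # sum exceeds the ceiling
```

In this toy model only the pair whose summed inputs exceed the ceiling becomes underadditive, which is why a saturation account predicts the strongest underadditivity for the feature combination that produces the largest overall response.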

Second, whereas the “saturation” objection described above posits a ceiling effect to account for the findings, the presence of floor effects might also complicate the interpretation. In Experiment 3, we found much lower effects of movement than of luminance, and the effect of movement did not decline with increasing saccade latency as in the other experiments. It might be argued that this smaller overall effect of movement leads to smaller predicted differences in OD for the additive and underadditive models, therefore making it difficult to distinguish the two. Again, although this objection is plausible in its own right, it does not account for the particular pattern of results that we observed. Two aspects of the results indicate that Experiment 3 did not lack power to detect underadditivity. First, although the effect of movement was smaller here than elsewhere, it was still clearly detectable, as shown by the significant hypothesis test for the effect of movement in the model. More importantly, the weight of evidence in favor of the additive model in Experiment 3 was very close to that obtained in Experiment 1, where the main effects of both kinds of salience were greater (ΔAIC = 1.97 and 1.92, ΔBIC = 9.54 and 9.48 in Experiments 1 and 3, respectively). Were the failure to find underadditivity due to a lack of power to distinguish it from additivity, we would expect the weight of evidence to be more equivocal, i.e., to see differences in information criteria closer to zero.

A third criticism concerns the generalizability of our conclusions regarding movement. We investigated only one form of movement, whereas movement may take many different forms. For example, a moving object may jitter randomly, as in the present experiment, but also may rotate or move smoothly in one direction. However, there is some evidence to suggest that the effects of different kinds of movement on the salience system guiding eye movements are similar. Van Zoest et al. (2017) have shown that apparent rotation and continuous movement elicit similar amounts of saccade trajectory deviation, while differing from static and flickering distractors. Also supporting the generalizability of our conclusions regarding movement is the fact that Nothdurft (2000) came to the same conclusion as we did regarding the underadditivity of movement and orientation, despite manipulating movement contrast (i.e., a difference in direction of movement between distractor and background) rather than comparing moving distractors to stationary ones, as we did.

In addition to a neurophysiological explanation, it is reasonable to seek an evolutionary explanation for our pattern of results. The salience system may have evolved to combine information from different features efficiently. Underadditivity may reflect the redundancy of two features that are correlated in natural environments. For example, if it is the case that fruit consistently changes both its color and its texture when ripe, then there is no need to pay attention to both the color and the texture in order to assess the ripeness of a piece of fruit; either ripe color or ripe texture alone is sufficient to make the fruit interesting. If the properties of the visual system match the statistical properties of natural scenes in this way (Ruderman, 1994), then the present results would imply that orientation and movement contrasts tend to be positively correlated in these scenes, whereas luminance does not tend to be correlated with either orientation or movement. Some work has been done in comparing independence of features in the visual system to their independence in natural images (Mante et al., 2005), but to our knowledge this has not yet been done for the combination of orientation and movement.

Like Nothdurft (2000), we also found no evidence of overadditivity. At least for the features we investigated, it does not seem to be the case that different sources of salience enhance one another. Why does overadditivity not occur? Or where might we expect it still to occur? If, as suggested above, underadditivity of two features reflects the fact that they are positively correlated in the natural environment, then overadditivity might be expected for combinations of contrasts that are negatively correlated. If two features are positively correlated, the presence of one makes the presence of the other unsurprising and therefore not worthy of attention, whereas if they are negatively correlated, the occurrence of one makes the occurrence of the other more surprising and therefore ought to make it more attention-grabbing than otherwise. Perhaps the lack of overadditivity found in our results reflects the fact that none of the feature combinations we investigated are negatively correlated in natural scenes. Indeed, it seems reasonable to expect that most feature contrasts are positively correlated. For example, moving objects tend to be people or animals, and people and animals also often stand out from the background because of their shape or color. It is unclear whether there are any combinations of feature contrasts that we would expect to be negatively correlated in natural scenes.

In conclusion, our measurements of salience using saccade trajectory deviations add to a growing body of converging evidence from different methods that suggests the visual system combines salient features in a way that varies according to which features are being combined. In particular, this pattern suggests that the visual system does not treat orientation and movement as independent sources of information. Further investigations should determine whether this pattern is general to a broad range of intensities of the feature contrasts or whether the combination of feature contrasts changes qualitatively depending on the intensity of the contrasts.