Introduction

In our daily lives, we continuously scan the visual world around us and focus our visual attention on important visual objects. From searching for a familiar face to looking for informative road signs, from seeking a ripe fruit on a tree or in the market to looking for an app icon on a cluttered smartphone or desktop screen, all these visual-search tasks demand considerable visual attention. Yet this visual behavior is not unique to humans: most non-human species also need to guide their visual attention during their daily lives for mating (Aquiloni & Gherardi, 2010; Sztatecsny, Strondl, Baierl, Ries, & Hödl, 2010; Zhang et al., 2018), hunting (Honkavaara, Koivula, Korpimäki, Siitari, & Viitala, 2002; Sjöström, 1985; Surmacki, Ożarowska-Nowicka, & Rosin, 2013), navigating (Collett, 1987; Dacke, Baird, Byrne, Scholtz, & Warrant, 2013; Garm & Nilsson, 2014; Philippides, Baddeley, Cheng, & Graham, 2011; Schwarz, Mangan, Zeil, Webb, & Wystrach, 2017; Srinivasan, Zhang, Lehrer, & Collett, 1996a), and avoiding predators (Allen et al., 2010; Amo et al., 2004; Isbell, 2009; Murali, 2018; Smolka, Zeil, & Hemmi, 2011). Hence, the study of how visual attention is guided in non-human species is also crucially important and can teach us a great deal about universal behavioral principles and information-processing strategies across the animal kingdom.

Since work in this field has been dedicated almost entirely to humans, in this paper we focus on the visual search abilities of animals across the animal kingdom. We review the work on visual search in non-human species with special emphasis on the experimental technique and the similarities or differences to human results. We also concentrate on one particular species, the archerfish, which differs markedly from humans in both its brain structure and its living environment. We do so by first summarizing previous results obtained by our group and others, followed by new and comparative experimental data on both the archerfish and humans when confronted with virtually identical experimental tasks. Finally, we discuss how non-human visual search results help us understand the basic principles that govern visual attention among species in nature.

Feature Integration Theory (in humans)

One of the most fundamental and influential theories that accounts for the ways in which visual attention is driven in the early visual processing system is Feature Integration Theory, which was introduced by Treisman and Gelade in the early 1980s (Treisman & Gelade, 1980). In this theory, processing a visual stimulus takes place in two separate stages. In the first stage, the features that compose a visual stimulus are processed in parallel by the visual system to yield a representation that encodes not only the location but also the basic visual properties of the stimulus such as color, orientation, motion, etc. This stage is called pre-attentive since it does not elicit attention (as would a selection process). It occurs automatically as part of early visual processing and before the organism becomes conscious of what it is looking at. The second stage is termed the attentive stage, where the organism focuses its attention on a certain object or location in the visual field. This occurs within a master map of stimulus locations where features have been detected. Once attention selects a certain location to focus on, the individual features at that location are integrated to facilitate the perception of the whole object.

One popular way to test how biological mechanisms rely on Feature Integration Theory is through controlled visual-search experiments. In a typical visual-search experiment, subjects search for the presence or absence of a target in a display as quickly and as accurately as they can, where the target is characterized by a certain combination of features that distinguish it from distractors around it. Using this methodology, and under the classic framework of Feature Integration Theory, Treisman and her associates characterized two distinct visual search modes, which they dubbed parallel and serial (Treisman & Gelade, 1980; Treisman & Gormican, 1988).

The parallel search mode occurs in the pre-attentive stage, where certain visual features are processed immediately throughout the entire visual field and in a massively parallel fashion. If any of these features uniquely characterizes the target, the target pops out and the search is complete. This automatic process is reflected in a reaction time that is relatively unaffected by the number of items (a.k.a. the set size), such that the slope of the reaction time as a function of the number of distractors is around zero (Treisman & Gelade, 1980; Wolfe, 1998a). This is illustrated in Fig. 1A, where the gray fish pops out from the white fish background due to the difference in their color/luminance feature. The gray fish would continue to pop out regardless of the number of distracting white fish, as indicated by the blue line in Fig. 1B. The serial search mode, on the other hand, occurs in the attentive stage, where attention can only be focused on a subset of the visual field, thereby forcing the observer to process each display item separately and sequentially in time. Hence, in this search mode, reaction time grows with the set size and the slope of the reaction time graph is positive (as indicated by the red line in Fig. 1B). An example of this search mode is illustrated in Fig. 1C where the target (gray and right-looking fish) does not pop out, forcing the observer to scan the displayed visual items one by one to find the target. Naturally, in such a scenario, the time to find the target will grow with the number of distracting items.

Fig. 1

Parallel and serial search slopes. (A) When a target is defined by a unique color/luminance (the gray fish), it pops out, making the search highly efficient. (B) A cartoon summary of parallel and serial search slopes. In parallel search (blue line), the reaction time does not change as a function of the set size and therefore the slope is zero. In contrast, in serial search (red line), the reaction time increases as a function of the set size and therefore the slope is positive. (C) A conjunction of color/luminance and direction makes the target difficult to find and forces the observer to scan each item separately until the target is detected (the target is the gray fish that is swimming to the right).
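The slope criterion described above can be made concrete with a small calculation. The following is a minimal sketch, not a method taken from any of the cited studies: it fits a least-squares line to reaction time as a function of set size and labels the search by its slope. The function names, the synthetic reaction times, and the 10 ms/item cutoff are all illustrative assumptions; in practice the boundary between efficient and inefficient search is a matter of degree rather than a fixed threshold.

```python
from statistics import mean


def search_slope(set_sizes, reaction_times_ms):
    """Least-squares slope (ms/item) and intercept (ms) of RT vs. set size."""
    n_mean = mean(set_sizes)
    rt_mean = mean(reaction_times_ms)
    sxx = sum((n - n_mean) ** 2 for n in set_sizes)
    sxy = sum((n - n_mean) * (rt - rt_mean)
              for n, rt in zip(set_sizes, reaction_times_ms))
    slope = sxy / sxx
    intercept = rt_mean - slope * n_mean
    return slope, intercept


def classify_search(slope_ms_per_item, threshold_ms_per_item=10.0):
    """Label a search by its slope.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    if slope_ms_per_item < threshold_ms_per_item:
        return "parallel-like"
    return "serial-like"


# Synthetic (made-up) mean reaction times, for illustration only.
set_sizes = [4, 8, 16, 32]
popout_rts = [500, 502, 499, 503]         # nearly flat, as in Fig. 1B (blue)
conjunction_rts = [600, 700, 900, 1300]   # grows with set size (red)

for label, rts in [("feature", popout_rts), ("conjunction", conjunction_rts)]:
    slope, intercept = search_slope(set_sizes, rts)
    print(label, round(slope, 2), "ms/item ->", classify_search(slope))
```

The same fit can be applied to other dependent measures used in animal studies, such as the number of saccades or the error rate as a function of set size.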

As a result, the visual search mode exhibited on each search task is determined by the features of the target and the distractors, as well as their inter-relation (Duncan & Humphreys, 1989; Eckstein, 1998; Nakayama & Silverman, 1986; Treisman & Souther, 1985). For instance, a considerable amount of evidence indicates that features such as color, size, orientation, and motion make the target "pop out" of the display (Wolfe & Horowitz, 2004). According to Feature Integration Theory, this is because such features are easily accessible to the human visual system during the pre-attentive stage. In other visual-search tasks, where targets are defined by less accessible properties or non-unique combinations of properties, the target does not always pop out, forcing the observer to serially search for the target (Wolfe, 2001a; Wolfe & Horowitz, 2004). Hence, which features drive visual attention in visual search scenarios is one of the fundamental questions addressed when conducting visual-search experiments. A significant body of research and a variety of theoretical models have addressed this and other aspects of visual search in humans for more than two decades now (Duncan & Humphreys, 1989; Hochstein & Ahissar, 2002; Itti, Koch, & Niebur, 1998; Treisman & Gelade, 1980; Treisman & Gormican, 1988; Wolfe, 1994, 1998b), and new ways of running countless visual search trials for additional insights have been emerging in recent years (Abeele, Tierens, De Schutter, De-Wit, & Geurts, 2015; Mitroff et al., 2015). Because of our explicit focus on non-humans, this literature is not reviewed here, but we refer the reader to recent surveys on this subject (e.g., Eckstein, 2011; Wolfe & Horowitz, 2017).

Feature Integration Theory beyond humans

Indeed, although the research effort discussed above followed the ideas of Feature Integration Theory, it was restricted almost entirely to humans, leaving interesting questions open. Are Feature Integration Theory and its contemporary variations applicable beyond humans? If so, what features facilitate “pop out” in different animals? If a pre-attentive stage occurs in non-humans, do different animals maintain different feature maps? And perhaps first and foremost, since visual search in humans is considered a cortical function, could animals that lack a fully developed cortex be able to perform efficient visual search, and if so, how? These questions are especially interesting given the many other non-human species that depend on vision to locate naturally important objects embedded in their complex visual environment.

To help bridge this gap, we first turn to review visual search and visual saliency in non-human species and present the similarities and differences to humans. Second, to try to address more closely some of the questions above, we present our own work on a specific visual species, the archerfish, which we investigated in an extensive set of visual-search tasks, comparing it to human performance in virtually identical experiments and scrutinizing the results in light of the topic of this special issue, namely Feature Integration Theory.

Vision of non-human vertebrates

Before exploring visual-search behavior in animals, it is important to mention that an extensive body of research has been devoted to investigating the visual behavior and visual system of non-human species and their relationship to their habitat (Allan, Day, & Edman, 1987; Darmaillacq, Mezrai, O'Brien, & Dickel, 2017; Douglas & Djamgoz, 2012; Döring & Chittka, 2007; Hart, 2001; Leonhardt, 2017; Marshall, Carleton, & Cronin, 2015; Sandström, 1999; Somanathan, Borges, Warrant, & Kelber, 2008; Somanathan, Kelber, Borges, Wallén, & Warrant, 2009; Zeil & Hemmi, 2006). In addition, a great deal is known about which visual features drive the vision of certain species and how they exploit them in their daily lives. For instance, color is known to enable various fish species to recognize other fish (Cheney, 2010). Spiders process color to avoid predators through disguise (Théry & Casas, 2002; Théry, Debut, Gomez, & Casas, 2004) or hunt honeybees by reflecting UV light that attracts them to flowers (Heiling, Cheng, Chittka, Goeth, & Herberstein, 2005). Color is also used as a signal by male damselflies to reduce intrasexual harassment (Schultz, Anderson, & Symes, 2008). Orientation, a different visual feature, was shown to trigger saliency in the archerfish, causing it to shoot an incongruently orientated target even without training (Mokeichev, Segev, & Ben-Shahar, 2010). Finally, motion is known to serve a plethora of species, for example helping spiders capture prey (Bartos & Minias, 2016).

However, while these and many other studies help us to understand how different animals use different visual features to their advantage, they have not explicitly addressed what features guide attention or enable pop out, and they have not quantified the efficiency of features in visual behavioral tasks. Thus, many questions relating to Feature Integration Theory in non-human species remain unanswered.

Visual search in non-human vertebrates

That said, several studies have examined visual search in selected non-human species, in particular certain species of fish, birds, insects, and terrestrial mammals. In these studies, the efficiency of visual features is typically determined by the target-selection rate (i.e., how often the animal actually finds and selects the designated target), but sometimes also by the reaction time and occasionally by both (Botly & De Rosa, 2011; Cook, Cavoto, & Cavoto, 1996). Interestingly, many of these studies have reported a great deal of similarity in the visual-search behavior of humans and non-humans in both feature search and conjunction search tasks, but certain differences in various shape searches.

Before reviewing some of this work, it is worth mentioning the obvious: visual-search experiments are far more complicated to perform with animals than with humans. There are two main reasons for this difference. First, it is overwhelmingly easier to communicate the task to human subjects. In animal studies, the animal must be trained to understand what constitutes the target, so it can then search, find, and select it when presented experimentally. This training can last up to several months (e.g., Orlowski, Ben-Shahar, & Wagner, 2018) and should be done with caution, as concluded from human studies (Buračas & Albright, 1999; Nothdurft, Pigarev, & Kastner, 2009). Furthermore, since new training is required for every new visual-search experiment, testing the same animal on multiple types of visual-search tasks demands multiple training periods, which is a time-consuming procedure. The second complication stems from the fact that typical training and experimentation is based on food rewards on each trial. The possible number of trials per day is thus limited by the animal's food-intake capacity. Humans, on the other hand, are rewarded differently (financially or with course credits, etc.), allowing a relatively extensive set of trials per experiment (e.g., Dickinson, Haley, Bowden, & Badcock, 2018; Rich et al., 2008; Sunder & Arun, 2016); the most typical constraint in human studies is cognitive capacity (boredom, fatigue, focus, motivation, etc.). Comparatively, an hour's worth of experimentation with human participants might require weeks or months to run on certain animals. Overall, visual-search experiments with non-humans are thus challenging and require considerable creativity to extract quality behavioral data while preserving the wellbeing of the subjects.

In the aquatic world, one of the most extensively studied species for its visual-search behavior is the archerfish. This fish is especially well known for its extraordinary ability to shoot down terrestrial insects found on foliage and low-lying branches by squirting a strong accurate jet of water at them. Because it is possible to train the archerfish to shoot at artificial objects presented on a monitor, this behavior provides a window of overt observation on the fish's visual decisions and thus enables controlled behavioral experiments in the lab (Ben-Simon, Ben-Shahar, Vasserman, Ben-Tov, & Segev, 2012a; Gabay, Leibovich, Ben-Simon, Henik, & Segev, 2013; Newport, Wallis, Temple, & Siebeck, 2013; Schuster, Rossel, Schmidtmann, Jäger, & Poralla, 2004a).

Recent work shows that the archerfish exhibits both parallel and serial search modes (Ben-Tov, Donchin, Ben-Shahar, & Segev, 2015). Since the archerfish is highly responsive to movement, the stimulus was composed of moving bars. The findings indicated that the speed and direction of the moving bars, but not their size, cause the target to pop out. A different study demonstrated that searching for an image of prey embedded in a set of different stationary distractors elicited serial search in the archerfish in a way similar to that observed in humans (Rischawy & Schuster, 2013). Working with other (non-spitting) species of fish clearly requires other means of extracting their visual decisions, for example by training them to approach the target (Fuss, Bleckmann, & Schluessel, 2014; Simpson, Marshall, & Cheney, 2016; Wyzisk & Neumeyer, 2007). By using this method, it was possible to demonstrate that color features also pop out during visual search by zebrafish (Proulx, Parker, Tahir, & Brennan, 2014).

In the avian world, pigeons were the first to be widely investigated, from the 1970s to the 1990s, since they can perform visual-search tasks by pecking at their selected target on a monitor. Data indicated that both geometric shapes (Blough, 1977) and letters (Blough, 1984) were inefficient features that led to longer reaction times and drops in target-selection rates as the set size increased. The familiar effect of line terminator asymmetry in humans (i.e., Q vs. Os and O vs. Qs) was shown not to exist in pigeons (Allan & Blough, 1989). Specifically, a line terminator with a circle, square, or triangle led to pop out in humans, but not in pigeons. Last, by comparing the target-selection rates of the tasks, it was shown that searches for color, size, orientation, or shape features are more efficient than conjunction searches in pigeons (Cook et al., 1996).

In addition, the visual-search behavior of blue jays was investigated using similar techniques of pecking on a monitor in an operant chamber. In one experiment, the blue jay’s task was to detect an image of a cryptic moth (namely the target) embedded in different backgrounds with different levels of crypticity. As the complexity level of the background increased, the reaction time increased and the accuracy decreased, a classic characteristic of serial search mode (Bond & Kamil, 1999). Notably, instead of increasing the number of distracting objects as in classic visual-search experiments, the researchers increased the difficulty level of discriminating between the target and the background. Also, rather than testing how the target’s and distractor’s features affect visual search per se, the research aimed to test how different strategies such as cuing and sequential priming, which are used to obtain initial focal attention, affect the visual-search behavior (Bond & Kamil, 1999; Goto, Bond, Burks, & Kamil, 2014).

In the last decade, barn owls have become a major animal model in saliency and visual search research (Harmening, Orlowski, Ben-Shahar, & Wagner, 2011; Orlowski et al., 2015; Orlowski et al., 2018). Unlike most species, barn owls lack practically all eye movement, which in fact aids in the observation of their visual behavior through head tracking. In one study, a tailor-made forward-looking wireless video camera (Fig. 2A) was mounted on a barn owl's skull (the OwlCam) (Fig. 2B) enabling a frontal view from the owl's point of view. By using image processing and automatic computer vision analyses it was possible to identify the items it fixated on during free behavior (including flying) and in what order (Fig. 2C and 2D). In a series of works (Harmening et al., 2011; Orlowski et al., 2015; Orlowski et al., 2018) that employed this methodology, and with stimuli either projected or physically organized as real objects on the floor in front of animal subjects, barn owls were shown to exhibit both parallel and serial search modes. Specifically, orientation and luminance-contrast search tasks elicited the parallel search mode (Harmening et al., 2011; Orlowski et al., 2015). In these tasks, the number of saccades (e.g., Fig. 2C and 2E, dashed lines) and search time (Fig. 2E, dashed lines) did not change significantly with set size. On the other hand, low-contrast feature tasks and certain conjunction tasks (involving both high- and low-contrast orientation) elicited serial search (Orlowski et al., 2018). These tasks were characterized by a linear increase in both the number of saccades and the search time as the set size increased (Fig. 2E, solid lines) and by a relatively large number of saccades until finding the target. A typical scan path until finding a conjunction target is shown in Fig. 2D.

Fig. 2

Barn owl's performance in visual-search tasks. (A) The OwlCam is a wireless video camera that weighs 5.5 g in total and is designed for fixed mounting on the owl’s skull. (B) A barn owl with a mounted OwlCam. The camera enables a frontal view from the owl's point of view, which, with processing, can identify the items the owl fixated on during search or other visual tasks. (C) An example of panoramic scene reconstructions of OwlCam videos with scan paths (dashed lines) and fixation locations (circles) up to the first fixation on the target (blue disk) during an orientation task. The numbers near the circles indicate the fixation number. The stimulus arrays contain (from left to right) 16, 36, and 64 items. Note that the number of saccades stayed relatively constant despite the change in set size, indicating that the orientation feature elicited pop out in the barn owl. (D) Serial searches are characterized by a relatively large number of saccades until the target is found. This example depicts the scan path during a conjunction search task for a target with a particular contrast and orientation (shown in inset e2). In this case, the target was detected at fixation number 10. (E) A conjunction task combining high contrast and orientation elicited serial search and was characterized by a linear increase in both the number of saccades and the search time (until target detection) as a function of set size (the two solid lines, representing the two barn owls' performance). The dots are the average over all trials per condition and the error bars denote the standard error of the mean. For comparison, the dashed lines represent high luminance-contrast feature task results that elicited pop out in the same owls. A and B images are adapted from Harmening et al. (2011), C from Orlowski et al. (2015), and D and E from Orlowski et al. (2018).

Among insects, bees are one of the most intensively investigated groups, including their performance in visual-search tasks involving color features. Bees are first trained during their foraging bout to visit a sucrose solution feeder placed inside an experimental arena near the hive (Fig. 3A). In a typical experimental trial, the bee enters the arena through a small entrance (Fig. 3A) and then faces a visual-search task composed of various visual items (e.g., colored disks) placed on the opposite wall (Fig. 3B). Each such item is equipped with a reward feeder attached to its back behind a small opening in its center (Fig. 3A and 3B). On each trial, the bee is thus required to make a visual decision to obtain the reward. This decision and the time taken to make it are recorded for further analysis.

Fig. 3

Visual search performance in bees is species dependent. (A) The experimental bee arena. (B) An example of a visual-search task composed of colored disks mounted around the entrance to the feeders. The feeder centered at the target disk contained the sucrose reward. To determine which colored disk was selected by the bee, a wire frame was placed 5 cm before the disks, forming a decision line (dashed line in A and B). (C) Error rate as a function of set size did not change for bumblebees (dashed and solid black lines, \( \chi^2_3 = 7.549,\ p = 0.056 \)) but increased for honeybees (dashed and solid gray lines, \( \chi^2_3 = 26.804,\ p < 0.001 \)). Values are means ± standard error of the mean. (D) The decision time as a function of set size of bumblebees (dashed and solid black lines) was approximately 30% longer than for honeybees (dashed and solid gray lines). Values are means ± standard error of the mean. A and B images are adapted from Spaethe et al. (2006) and C and D from Morawetz and Spaethe (2012).

Morawetz and Spaethe (2012) showed that visual-search performance in bees is species dependent. Specifically, error rates increased as a function of set size but only in honeybees and not in bumblebees (Fig. 3C). At the same time, bumblebees took approximately 30% longer than honeybees to make their decisions (Fig. 3D), although their performance was not affected by set size in a significant way. These findings suggest that the speed-accuracy trade-off to detect the target is species dependent, which the authors attributed to ecological pressure. Given the decision time and error rate, visual search of color features elicits serial search in honeybees and a parallel-like search in bumblebees.

Perhaps surprisingly, non-human terrestrial mammals have not been widely studied in the context of visual search. Monkeys are by far the most studied animals in this group, and a variety of methodologies have been employed with them. For example, in one classic feature and conjunction visual-search experiment (Bichot & Schall, 1999), the researchers measured the monkey's saccade latency slope and error rate as a function of the set size and demonstrated that color and simple shape searches are more efficient than their conjunction counterparts (Fig. 4A). In related forced-choice target detection experiments, reaction-time measurements suggested that targets defined by color pop out whereas targets defined by shape, or a conjunction of shape and color, are characterized by reaction times that increase with set size (Wardak, Ben Hamed, Olivier, & Duhamel, 2012) (Fig. 4B). Methodologically, trials (after training) in these experiments begin with the monkey's hand in contact with a lever while it focuses on a central fixation point. Shortly after, the visual search display appears and the monkey's task is to press the lever only when the stimulus contains a target (i.e., target-absence trials require no response). This approach was also used to demonstrate that macaques, like humans, exhibit parallel search in color or motion feature tasks but serial search in conjunction tasks of color and motion (Buračas & Albright, 1999). Other visual-search experiments teach us about monkeys' sensitivity to complex visual features. For example, experiments with facial features indicate the importance of these features for monkeys. This insight was obtained by measuring faster reaction times on visual-search experiments of a monkey image versus a car or a house image, frontal view image versus a profile view image, upright view image versus inverted view image, or internal facial parts image (i.e., eyes, nose, mouth) versus external facial parts image (Tomonaga, 2007; Tomonaga & Imura, 2015). Other complex visual features, such as shading, were also shown to be processed efficiently by monkeys (Tomonaga, 1998).

Fig. 4

Monkeys are used in a variety of visual-search experiments. (A) Saccade latency as a function of set size in monkeys. Triangle and square markers represent the average latencies of each monkey and solid circles represent the average latencies of the two monkeys. Feature-task results are represented by dashed lines and conjunction-task results are represented by solid lines. The average slopes make it clear that color feature and simple shape searches are more efficient than their conjunction counterparts. (B) Left panel: Three visual-search tasks conducted on monkeys. The target in all examples is the pink diamond. The tasks include (from top to bottom) a color feature task (blue box), a difficult shape task (green box), and a conjunction task (red box). Right panel: The mean reaction-time slope as a function of set size in the three search tasks for each monkey. The three search tasks are coded with the same colors as the stimulus presented in the left panel. (C) Two monkeys in this experiment were tested for a speed-accuracy tradeoff in visual search. They were trained to respond either as quickly or as accurately as possible to a visual-search task containing shapes (the experimental flow is shown in the left panel). This experiment produced a classic speed-accuracy tradeoff result, with faster and more error-prone responses in the “time-condition” compared to the “accuracy-condition” (right panel). (D) The average latency of humans (left) and three monkeys (right). Both species responded faster to pictures of snakes (i.e., a threat) among pictures of flowers (i.e., a neutral distractor) than the other way around, despite having no experience with snakes. Image A is adapted from Bichot and Schall (1999), B from Wardak et al. (2012), and C from Heitz and Schall (2013). The left panel of image D is reproduced based on LoBue and DeLoache (2008) and the right panel of image D is adapted from Shibasaki and Kawai (2009).

In addition to the relation between different visual features and the search mode they elicit, monkeys also demonstrate the classic speed-accuracy tradeoff behavior (Heitz & Schall, 2013). Typically, monkeys are trained to make a single saccade to a target (T or L shape) immediately after a stimulus appears and to maintain their gaze for 750 ms (Fig. 4C, left panel). Monkeys are trained to respond either as quickly or as accurately as they can after the initial cue. The results showed that in the “time condition” the trial responses were faster but less accurate (i.e., involved more erroneous responses) than in the “accuracy condition” (Fig. 4C, right panel).

Besides the effect of visual features, the effect of biological threat has also been tested in the visual search framework. Humans are known to have an inherent sensitivity to threatening stimuli, a mechanism that is assumed to have evolved to improve survival (Isbell, 2006). As a result, humans detect threats (e.g., snakes) faster than neutral objects (e.g., flowers) when embedded in a field of distractors of the other type (LoBue & DeLoache, 2008; Öhman, Flykt, & Esteves, 2001a). Shibasaki and Kawai (2009) suggested that this protective mechanism is inherited from a common ancestor of humans and monkeys. This intrinsic property was well demonstrated by the observation that monkeys that were born and reared in a laboratory, with no experience of snakes whatsoever, responded similarly to humans on the same kind of tasks (Fig. 4D), i.e., both species detect a target image of a snake among distracting images of flowers more quickly than the other way around.

Unlike monkeys, rats exhibit a somewhat mixed performance, with essentially fixed reaction times for luminance, shape, and certain conjunction search tasks (Fig. 5A), but increasing error rates with set size in the latter (Fig. 5B), indicating relative inefficiency compared to feature tasks (Botly & De Rosa, 2011). These experiments are commonly conducted in an operant conditioning chamber equipped with a touch screen for the presentation of the visual-search task (Fig. 5C) (Horner et al., 2013). Not unlike other species, the rats underwent a long training procedure to acquire the ability to approach the screen and make their choice by touching the object with their nose or front paws. If they responded correctly, a light above the water well was activated to indicate the availability of a reward.

Fig. 5

Rats exhibit parallel and serial visual search. (A) Median reaction time did not change with set size in a feature task (dark gray) or a conjunction task (light gray). Error bars denote the standard error of the mean. (B) Accuracy, on the other hand, did decrease as the set size increased, and more significantly so for the conjunction task (light gray) than for the feature task (dark gray) (Task Type: F(1, 15) = 179.63, p < 0.001, η² = 0.92), indicating relative inefficiency compared to the feature tasks. (C) The visual search stimuli for the feature and conjunction tasks with three different set sizes (three, five, and seven distractors). The target was always the white square. There were two different types of feature search trials: homogeneous and heterogeneous. On homogeneous trials, all distractors were identical, whereas on heterogeneous trials, the distractors were not identical. Specifically, one type of heterogeneous feature search trial required discrimination of the target from the distractors based on luminance (dark or bright) and the other type required discrimination based on shape (square or triangle). Images are reproduced based on Botly and De Rosa (2011).

Finally, very few papers deal with the visual search ability of dogs and cats. It was demonstrated that dogs can also perform visual-search tasks, though previous work has focused mostly on age-related effects rather than the fundamental visual features that guide performance (Snigdha et al., 2012). Similarly, based on their good ability to differentiate texture patterns, indirect evidence shows that cats can exhibit parallel search in certain circumstances (Wilkinson, 1986).

Neuronal mechanisms of visual search in non-humans

One unique aspect of visual-search research in non-humans is the ability to perform electrophysiological measurements and lesion studies to explore the brain regions and neuronal mechanisms that take part in visual search. For example, lesions that reduced cholinergic afferentation of the neocortex in rats significantly disrupted their performance during a conjunction search task of color and shape but not during a feature task (Botly & De Rosa, 2011). This implies that cholinergic afferentation contributes to visuospatial attention and is important for feature binding, at least in rats. A different lesion study showed that the rat parietal cortex is important for learning the one- and two-feature components of a visual-search task and that it plays a part in feature binding (Kesner, 2012).

To explain the neuronal mechanism of pop out, visual neurons were recorded in electrophysiological experiments during visual-search tasks. One common finding was that contextually modulated neurons play a part in visual search, having a stronger response when the stimulus feature inside their classical receptive field is in contrast to the stimulus feature outside of it (e.g., a vertical bar against a background of horizontal bars elicits a greater response than a stimulus composed of bars in the same orientation). Specifically, such neurons were found to be sensitive to orientation in the primary visual cortex of the macaque (Knierim & van Essen, 1992; Nothdurft, Gallant, & Van Essen, 1999; Sillito et al., 1995) and velocity in the area MT of owl monkeys (Allman, Miezin, & McGuinness, 1985), to orientation and motion in the cat's striate cortex (Kastner, Nothdurft, & Pigarev, 1997; Kastner, Nothdurft, & Pigarev, 1999; Xu, Wang, Song, & Li, 2013), and to motion in the archerfish (Ben-Tov et al., 2015). Since these neurons compare an object’s feature (in their receptive field) to the other objects that surround it, it was hypothesized that they are part of the neuronal substrate of the pop-out mechanism.

Neural mechanisms aside, in this work we want to focus on the behavioral visual search capacity of animals, with a focus on our own animal model, the archerfish. Toward this end, we here review earlier studies on this species and then describe how we studied it.

The special case of the archerfish

The accumulating initial evidence mentioned above is beginning to shed light on visual search in non-human species. Nevertheless, more data are needed to fully reveal visual-search strategies in different species, what aspects of Feature Integration Theory are applicable across species, which other species have evolved pop-out sensitivity, and how this sensitivity is implemented cognitively.

For this reason, we further explore an animal model that is significantly remote from humans, both anatomically and behaviorally. Specifically, we elected to extensively study visual search capacities in the archerfish, and equally importantly, to compare fish to humans in virtually identical tasks. The archerfish is a small aquatic vertebrate that lacks a fully developed cortex (Karoubi, Segev, & Wullimann, 2016) and lives primarily in mangrove habitats of the South Pacific and Indian Oceans (Allen, 1978; Ringach, 2002; Simon & Mazlan, 2010). It has the incredible hunting skill of shooting down terrestrial insects by squirting a water jet at them. This is attributed to its complex visual behavior abilities (Ben-Tov, Ben-Shahar, & Segev, 2018; Rossel, Corlija, & Schuster, 2002; Schlegel & Schuster, 2008; Schuster, Rossel, Schmidtmann, Jäger, & Poralla, 2004b; Tsvilling, Donchin, Shamir, & Segev, 2012) and good eyesight (Ben-Simon, Ben-Shahar, Vasserman, Ben-Tov, & Segev, 2012b; Temple, Hart, Marshall, & Collin, 2010; Vasserman, Shamir, Simon, & Segev, 2010).

Based on earlier initial findings in the context of visual search (Ben-Tov et al., 2015; Ben-Tov et al., 2018; Mokeichev et al., 2010; Rischawy & Schuster, 2013), we demonstrated in our recent study (Reichenthal, Ben-Tov, Ben-Shahar, & Segev, 2019) that the archerfish not only exhibits pop-out behavior, but the latter is facilitated by four common visual features known to elicit the same behavior in humans, namely color, size, orientation, and motion. We also found that shape tasks elicit serial search, with reaction time that grows, and target-selection rate that decreases, with the number of distractors. Lastly, we showed that conjunction search tasks of size and color are asymmetrical. Parallel search was elicited in the archerfish when a large blue target was embedded in a field of small blue and large black distractors. In contrast, serial search was elicited for a small blue target amidst small black and large blue distractors.

In this paper, we extend this set of visual-search experiments in the archerfish by first presenting a new shape task and exploring the type of visual-search behavior it elicits. Equipped with a rather comprehensive set of visual-search results in the fish, we then discuss how they follow the predictions of Treisman’s Feature Integration Theory. Importantly, we also present an extensive comparison between fish and human visual search by running virtually identical search tasks in both species, a collection of tasks inspired by the original Feature Integration Theory work (Treisman & Gelade, 1980).

Methods

Subjects

Archerfish

Seventeen archerfish (Toxotes chatareus), 6–14 cm in length, were used in this study. The archerfish were caught in the wild and purchased from a local animal vendor. Each fish was housed in a separate water tank, 32 × 50 × 28 cm in size, filled with brackish water (2–2.5 g of Red Sea salt mix per 1 L of water). The water was filtered, oxygenated, and kept at a temperature of 25–28 °C. The room was illuminated with artificial light on a 12:12-h day-night cycle. Each fish participated in approximately three experiments each week, during which it received dry food pellets as a reward that also constituted its ration of food.

All experiments were approved by the Ben-Gurion University of the Negev Institutional Animal Care and Use Committee and were in accordance with government regulations of the State of Israel. All experiments also adhered to the ARVO Statement for the Use of Animals in Ophthalmic and Vision Research.

Humans

Eleven students (nine females and two males, mean age = 26.7 years, range: 20–35) at Ben-Gurion University of the Negev took part in this experiment. All subjects read and signed a consent form and were paid 40 NIS (~US$11) per hour. All subjects reported having normal or corrected-to-normal color vision and eyesight and were naïve as to the purpose of the experiment. Every subject performed all the experiments. All experiments were approved by the Ben-Gurion University of the Negev ethics committee.

Apparatus

Archerfish

Stimuli were generated and presented using PowerPoint (Microsoft Inc., USA). The experiments were recorded using an HD camera (Handycam, HDR-CX240, Sony, Japan) at 25 frames per second and stored offline for further analysis. The stimulus was presented on an LCD screen (VW2245-T, 21.5-in., BenQ, Taiwan) placed on top of a transparent tempered-glass plate 32 cm above water level (Fig. 6A). To prevent items in the room from distracting the fish during the experiment, coroplast boards were placed on the walls of the aquarium before the experiment began. For the same reason, only the experimenter was allowed in the room during the experiment.

Fig. 6

The experimental setup and visual-search tasks. (A) The fish were trained to shoot at an artificial target presented on an LCD screen. The screen was placed on top of a transparent tempered glass plate 32 cm above the water level. Experiments were recorded using an HD video camera and stored offline for further analysis. (B–E) Group 1 Feature-search experiments: In these experiments, we examined four features that turned out to be facilitators of efficient visual search: Color, Size, Orientation, and Motion. (F–G) Group 2 Shape experiments: (F) The target was a solid disk and the distractors were Pac-Men. (G) The target was a solid "Q" and the distractors were solid "Os". (H–I) Group 3 Conjunction of size and color experiments: The target was characterized by size and color, either large-and-blue or small-and-blue

Humans

The stimulus slides were created via Powerpoint (Microsoft Inc., USA) and stored as images. Stimulus presentation and data collection were controlled by MATLAB and the Psychophysics Toolbox (Brainard & Vision, 1997). Subjects were seated 70 cm from a 22-in. LCD screen (SyncMaster 2243BW, Samsung) with a refresh frequency of 60 Hz and a spatial resolution of 1,680 × 1,050 pixels. A chin rest was used to keep the head position constant throughout the experiment. To ensure focus during the experiment, the subjects were seated alone in a dimly-lit room and were given short breaks between experiments (see Design and Procedure for details).

Stimuli

The visual-search experiments on the archerfish were based on traditional visual-search tasks human subjects are asked to perform. Since we wanted to compare the two species, we eliminated the possibility of stimulus-dependent results by testing the human subjects on the very same stimuli used with the archerfish. However, to adjust the stimuli for complexity, the fish completed tasks with up to 12 distractors whereas the humans confronted stimuli with up to 38 distractors. Three groups of experiments were conducted, as described below.

Experiment Group 1 – Feature search experiments

Experiment 1A: Color experiment

The target was a colored solid disk and the distractors were solid disks of a different color (Fig. 6B). For the archerfish the colors were selected randomly and were red ([1 0 0] RGB, 52 lux), black ([0 0 0] RGB, 1.1 lux), or blue ([0 0 1] RGB, 11 lux) (red vs. blue in three experiments, blue vs. red in two experiments, black vs. red in two experiments, red vs. black in two experiments). For humans the target was black ([0 0 0] RGB, 1.1 lux) and the distractors were red ([1 0 0] RGB, 31.5 lux).

Experiment 1B: Size experiment

The target and distractors were all black solid disks but differed in size (Fig. 6C). In one set of experiments the target diameter was twice the size of the distractors (2 cm vs. 1 cm), and in the second set of experiments the target diameter was half the size of the distractors (1 cm vs. 2 cm).

Experiment 1C: Orientation experiment

The target and distractors were static Gabor patches, that is, patches formed by multiplying a Gaussian (exponential) envelope by a cosine wave. The target and distractors were orthogonal in orientation to each other (Fig. 6D). Their orientations were constant throughout the experiment and aligned with the cardinal axes of the aquarium and the screen.

Experiment 1D: Motion experiment

The target and distractors were Gabor patches that had moving phases in opposite directions (Fig. 6E, where arrows indicate the phase direction). The speed of all the patches was 1.5 cm/s.
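The Gabor stimuli of Experiments 1C and 1D can be illustrated with a minimal Python sketch. It is not the procedure used to build the actual slides (those were made in PowerPoint); the function names, patch size, wavelength, and envelope width are all hypothetical values chosen for illustration. Only the orthogonal orientations and the 1.5 cm/s drift speed come from the text.

```python
import math

def gabor_patch(size=65, wavelength=16.0, sigma=10.0, theta=0.0, phase=0.0):
    """Return a size x size grid of values in [-1, 1].

    A Gaussian envelope multiplied by a cosine grating (a Gabor patch).
    theta is the grating orientation and phase the grating phase, both in
    radians. All default values are illustrative assumptions.
    """
    half = size // 2
    patch = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Coordinate along the axis on which the grating varies
            u = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * u / wavelength + phase)
            line.append(envelope * carrier)
        patch.append(line)
    return patch

def drifting_phase(t_seconds, speed, wavelength, direction=1):
    """Phase offset (radians) after t_seconds for a grating drifting at
    `speed`, given in the same length unit per second as `wavelength`.
    direction = +1 or -1 yields opposite drift directions."""
    return direction * 2.0 * math.pi * speed * t_seconds / wavelength

# Orthogonal target and distractor, as in the orientation experiment
target = gabor_patch(theta=0.0)
distractor = gabor_patch(theta=math.pi / 2)

# Opposite phase drift at 1.5 cm/s, as in the motion experiment;
# the 0.5-cm wavelength and the 0.1-s time point are assumptions.
target_phase = drifting_phase(0.1, speed=1.5, wavelength=0.5, direction=+1)
distractor_phase = drifting_phase(0.1, speed=1.5, wavelength=0.5, direction=-1)
```

Rendering successive frames with the phase returned by `drifting_phase` would give a patch whose envelope stays in place while the grating inside it moves, which is the sense in which only the phase moves in Experiment 1D.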

Experiment Group 2 – Shape experiments

Experiment 2A: Solid disk versus Pac-Men experiment

The target was a red solid disk and the distractors were red Pac-Men (three-quarter disks, Fig. 6F). To keep the average intensity of the target and distractors equal, the target diameter was 1.7 cm and the distractors were slightly larger and measured 1.95 cm in diameter.
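The size adjustment can be checked with a short calculation. The sketch below assumes that both shapes have the same luminance per unit area, so that total intensity is proportional to area; this assumption and the helper name `disk_area` are ours and are not stated in the Methods.

```python
import math

def disk_area(diameter, fraction=1.0):
    """Area of a sector covering `fraction` of a disk of the given diameter."""
    return fraction * math.pi * (diameter / 2.0) ** 2

target_area = disk_area(1.7)                   # full disk, 1.7 cm
pacman_area = disk_area(1.95, fraction=0.75)   # three-quarter disk, 1.95 cm
ratio = pacman_area / target_area

# Diameter at which a three-quarter disk matches the full disk exactly
exact_pacman_diameter = 1.7 / math.sqrt(0.75)

print(f"target area:  {target_area:.3f} cm^2")
print(f"Pac-Man area: {pacman_area:.3f} cm^2")
print(f"area ratio:   {ratio:.3f}")
print(f"exact match at a Pac-Man diameter of {exact_pacman_diameter:.3f} cm")
```

Under this assumption the two areas agree to within about 1.5%, and an exact match would require a Pac-Man diameter of roughly 1.96 cm, so the reported 1.95 cm is a close approximation.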

Experiment 2B: Q versus Os experiment

The target was a red solid disk with a line terminator (as in "Q") and the distractors were red solid disks (as in "Os", Fig. 6G). Both the target and the distractor diameters were 1.5 cm.

Experiment Group 3 – Conjunction of size and color search experiments

Conjunction search tasks are naturally more difficult than feature-search tasks, and in humans usually elicit serial search (Wolfe, 1998a). However, it is possible that a certain basic feature will guide the attention and enable parallel search (Eckstein, 1998; Nakayama & Silverman, 1986; Treisman & Souther, 1985; Treisman & Sato, 1990). In this study, the target was defined by a combination of size and color features. To examine the possibility that, as in humans, conjunction search can elicit an efficient visual search in the archerfish, we conducted two different conjunction experiments.

Experiment 3A: Large-and-blue experiment

The target was a large (2 cm in diameter) blue solid disk. Half of the distractors were small (1 cm in diameter) blue solid disks. The second half were large black solid disks (Fig. 6H).

Experiment 3B: Small-and-blue experiment

The target was a small blue solid disk. Half of the distractors were large blue solid disks. The second half were small black solid disks (Fig. 6I).

To verify that a serial search in the conjunction experiment (Experiment 3) was not the result of one of the single features alone (i.e., size or color), each archerfish and human subject also completed the color (Experiment 1A) and size (Experiment 1B) feature tasks, thus making a total of three experiments.

Design and procedure

Archerfish

A trial began with a blinking square cue located in the middle of the screen, signaling to the archerfish that the stimulus was about to appear. The archerfish prepared to shoot by gazing up at the center of the screen. When the cue disappeared, the visual search stimulus appeared. Immediately after the water jet hit the screen, the stimulus was replaced with a white blank image. If the archerfish successfully shot at the target, it was rewarded with a food pellet, and if not, it was penalized by not being fed. To keep the archerfish focused during the experiment, each trial was limited to 5 s maximum. After 5 s, the stimulus (target and distractors) disappeared automatically. The frequency of timeout trials varied across archerfish subjects, but these trials were always excluded from the analysis.

In the feature search and shape experiments, the archerfish were presented with displays that contained a target and three, six, nine, or 12 distractors. In this way, the independent parameter (i.e., number of distractors) was distributed uniformly in parameter space. In the conjunction experiments, the archerfish were presented with displays that contained a target and four, six, ten, or 12 distractors. In this case, the number of distractors in each trial was kept even so that both distractor types appeared in equal numbers. Since the archerfish could only be rewarded with a limited amount of food per day, each experiment was run over consecutive days, with each archerfish presented with ten trials per condition for a total of 40 trials per day. Over the full experiment, each archerfish thus completed 50 trials per condition and 200 trials in total. The position of the target and the distractors on the screen varied randomly between trials to avoid inducing a bias toward a specific location. Trials of the same condition were presented in blocks to avoid confusion.
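As an illustration of how a conjunction display with balanced distractor types and random item positions might be composed, consider the following sketch. The grid of candidate positions, the function name, and the tuple layout are hypothetical; the Methods state only that positions varied randomly and that the two distractor types appeared in equal numbers.

```python
import random

def conjunction_display(n_distractors, grid_cells=36, rng=random):
    """Compose one large-and-blue style display.

    Returns a list of (cell, size, color) tuples: one target plus two
    equally sized groups of distractors, each item in a distinct,
    randomly chosen cell of a hypothetical grid of candidate positions.
    """
    if n_distractors % 2:
        raise ValueError("the number of distractors must be even")
    half = n_distractors // 2
    items = [("large", "blue")]               # the target
    items += [("small", "blue")] * half       # distractors sharing the color
    items += [("large", "black")] * half      # distractors sharing the size
    cells = rng.sample(range(grid_cells), len(items))
    return [(cell, size, color) for cell, (size, color) in zip(cells, items)]

# One display for each distractor count used with the archerfish
displays = {n: conjunction_display(n, rng=random.Random(n)) for n in (4, 6, 10, 12)}
```

Passing a seeded `random.Random` instance makes a display reproducible, which is convenient when the same slide has to be shown to several subjects.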

Humans

Before the experiment began, the subjects were told that they would see slides with visual-search task scenarios, and their task was to determine whether a target was present or not, as quickly and as accurately as possible, by pressing one of two specially assigned keys on the control panel ("p" for present and "a" for absent). Each trial began with a 500-ms fixation on a small black plus sign in the center of the screen. Immediately after the fixation period, the search stimulus appeared and remained on the screen until a response was made. A 5-s timeout was applied here as well, and the stimulus disappeared if a response was not made during that time interval. An incorrect response was signaled by a beep from the computer, and the next trial was delayed by 1,000 ms to help the subject detect the mistake and regain focus.

Each subject completed all experiments in the following order: Experiment 1A–D, Experiment 2A, Experiment 3A–B, and finally Experiment 2B. Each experiment was composed of 96 trials, with 16 trials per condition. Specifically, they were composed of stimuli with three, six, nine, 12, 26, and 38 distractors in the feature search and shape experiments, and stimuli with four, six, ten, 12, 26, and 38 distractors in the conjunction experiments. For each condition, the target was present in one half of the trials and absent in the other half. All conditions per experiment were arranged in a random sequence with the restriction that three target-present or target-absent trials could not appear more than twice in a row. Once set, the random order of stimulus slides was the same across subjects.

Each task was preceded by 16 practice trials containing slides with three, six, nine, and 26 distractors. For each condition, there were four slides, in half of which the target was present and in the other half absent. All conditions were arranged in a random sequence. The subject had to complete all 16 practice trials successfully to proceed to the experiment itself. Subjects who failed repeated the training process until they succeeded.

Data analysis

For fish, reaction times and target-selection rates were extracted from the recorded videos, where the reaction time was defined as the time between stimulus onset and the moment the shot was initiated. To determine search slopes, we first calculated the reaction time median for each condition and then fit a line to these medians using standard linear regression.
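The slope computation described above (median reaction time per condition, then a least-squares line through the medians) can be sketched as follows. The function name and the example reaction times are hypothetical and serve only to illustrate the procedure.

```python
from statistics import median

def search_slope(trials):
    """Estimate a search slope from single-trial data.

    trials: iterable of (n_distractors, reaction_time_ms) pairs.
    Returns (slope_ms_per_item, intercept_ms) of the ordinary
    least-squares line fitted to the per-condition median reaction times.
    """
    by_condition = {}
    for n, rt in trials:
        by_condition.setdefault(n, []).append(rt)
    xs = sorted(by_condition)
    ys = [median(by_condition[n]) for n in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical reaction times (ms) for 3, 6, 9, and 12 distractors
example_trials = [
    (3, 900), (3, 1000), (3, 1300),
    (6, 1100), (6, 1240), (6, 1500),
    (9, 1400), (9, 1480), (9, 2000),
    (12, 1600), (12, 1720), (12, 1900),
]
slope, intercept = search_slope(example_trials)
print(f"slope = {slope:.1f} ms/item, intercept = {intercept:.1f} ms")
```

Fitting the medians rather than the single trials keeps a few very slow shots from dominating the slope estimate, which matters when each condition contains only a few dozen trials.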

Unlike with the human subjects, for the archerfish it was crucial to verify that they understood the nature of the task. To do so, the binomial cumulative distribution function was evaluated for the observed target-selection rates and compared to chance values (binomial test) to verify that the true probability of choosing the target was above chance (25%, 14.5%, 10%, and 7.5% chance values for the three-, six-, nine-, and 12-distractor conditions, respectively).
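A one-sided binomial test of this kind can be sketched in a few lines. The sketch assumes that chance performance means hitting any of the displayed items with equal probability, i.e., 1/(n + 1) for n distractors, which gives values close to the chance levels listed above (25%, about 14.3%, 10%, and about 7.7%). The function names and the hit count are hypothetical.

```python
from math import comb

def chance_level(n_distractors):
    """Chance of selecting the single target if all items are equally likely
    (an assumption made for this sketch)."""
    return 1.0 / (n_distractors + 1)

def binomial_p_above_chance(successes, trials, chance):
    """One-sided p-value P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance ** k * (1.0 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical outcome: 34 target hits out of 50 trials in every condition
for n in (3, 6, 9, 12):
    p = binomial_p_above_chance(34, 50, chance_level(n))
    print(f"{n:2d} distractors: chance = {chance_level(n):.3f}, p = {p:.2e}")
```

A small p-value indicates that the observed number of target hits would be very unlikely if the fish were shooting at items at random.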

Results

To compare archerfish and human performance in visual search we conducted a battery of visual-search tasks on both species involving feature search, shape search, and conjunction search (see Methods). Then, we analyzed the performance of each species to compare abilities in visual search.

Examples of the experimental results for the two species are presented in Fig. 7A–B. In these examples, we measured the reaction time in the visual-search experiment while varying the number of distracting objects in the display. Then for each experiment we calculated the slope of the reaction time as a function of the number of objects, which was our main analysis tool when comparing the performance of the two species in visual search.

Fig. 7

Histogram of the reaction-time slopes. To build a histogram of the reaction-time slopes we first calculated the slopes for each human and archerfish subject per task. To do so, we calculated the median reaction time (as well as the 25th and 75th percentiles) for each number of distractors (upper row). Then we fit a line to the medians by linear regression (dashed black line) to find the slope. To verify that the subject understood the task, we also calculated the target-selection rate (mean and 95% confidence interval, lower row). (A) Color experiment (Experiment 1A). (B) Conjunction of small-and-blue experiment (Experiment 3B). (C) After calculating all slopes for each species, we built the slope histogram for each species. This resulted in two distinct groups of experiments for the fish. Blue indicates reaction-time slopes obtained from the color, size, orientation, motion, and conjunction of large-and-blue experiments. Red indicates reaction-time slopes obtained from the shape and conjunction of small-and-blue experiments. The black dashed line serves as a threshold divider between the two groups, obtained naturally by Otsu's method and k-means (k=2). (D) For humans the histogram resulted in a continuum. Red indicates reaction-time slopes obtained from the conjunction of the small-and-blue experiment and blue indicates all the other slopes. The black dashed line served as a threshold divider between parallel and serial search modes, as also found in the literature (Wolfe, 1998a)

Overall, each human subject (N = 11) participated in all experiments, whereas individual archerfish subjects only participated in some of the experiments (see Table 1 for more details), which stems from the inherent difficulty of conducting visual-search experiments on the latter (see above), including an occasional premature death of the fish. Therefore, we indicate the number of archerfish subjects in each experiment. Note that in all experiments, the target-selection rates were significantly higher than chance (p < 0.001, binomial test, see Methods) in both fish and humans, indicating that the archerfish and humans could indeed perform all the tasks.

Table 1 Summary of fish participation

Setting the slope boundaries between parallel and serial search

Clearly, relative to humans, the total number of trials in the literature that have tested archerfish in visual search is still very small. Hence, the existence of a continuum of search slopes, as was found to exist in humans (Wolfe, 1998b), remains an open question for the archerfish. Given these scant data, and inspired by Feature Integration Theory, we categorized human and archerfish performance based on the classical bimodal division into parallel (easy/efficient) and serial (difficult/inefficient) searches (Treisman & Gelade, 1980). We first specify the performance threshold between parallel and serial search modes for archerfish and humans separately, and then compare their results in each of the three visual-search tasks groups.

To define the reaction-time slope threshold for the archerfish data, we used the histogram of all search slopes obtained in the experiments and applied Otsu's method (Otsu, 1979) as well as k-means (with k=2) clustering (MacQueen, 1967). The resultant threshold between efficient and inefficient searches (using either method) was found to be ~45 ms/item, as depicted by the black dashed line in Fig. 7C. Naturally, this threshold splits the slopes population of the different experiments into two. Below the threshold (in blue) are the slopes obtained from the feature search, Q versus Os shape searches, and the large-and-blue conjunction search experiments (Experiments 1A–D, 2B, and 3A). Above this threshold (in red) are the slopes obtained from the disk versus Pac-Men shape and the small-and-blue conjunction search experiments (Experiments 2A and 3B).
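The two thresholding approaches can be illustrated with a simplified sketch. Two caveats apply: the input below consists of the per-task mean fish slopes listed in the caption of Fig. 11, used here only as stand-ins for the per-subject slopes that entered the actual histogram, and the Otsu variant operates directly on the raw values rather than on a binned histogram as in the original method. The function names are hypothetical.

```python
def kmeans_two_clusters(values, iterations=100):
    """1-D k-means with k = 2, seeded at the minimum and maximum value.
    Returns the two centroids as (low, high)."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("need at least two distinct values")
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(group_low) / len(group_low)
        new_high = sum(group_high) / len(group_high)
        if (new_low, new_high) == (low, high):
            break  # assignments no longer change
        low, high = new_low, new_high
    return low, high

def otsu_threshold(values):
    """Otsu-style cut on raw values: try every split of the sorted data and
    keep the one that maximizes the between-class variance."""
    data = sorted(values)
    n = len(data)
    best_score, best_cut = -1.0, None
    for i in range(1, n):
        if data[i] == data[i - 1]:
            continue  # no cut possible between identical values
        left, right = data[:i], data[i:]
        w0, w1 = len(left) / n, len(right) / n
        m0, m1 = sum(left) / len(left), sum(right) / len(right)
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_score = score
            best_cut = (data[i - 1] + data[i]) / 2.0
    return best_cut

# Per-task mean fish slopes (ms/item) as listed in the caption of Fig. 11
fish_slopes = [-1.7, 11.8, 14.2, 22.9, 76.7, -3.1, 10.3, 81.3]

low, high = kmeans_two_clusters(fish_slopes)
kmeans_cut = (low + high) / 2.0   # boundary halfway between the centroids
otsu_cut = otsu_threshold(fish_slopes)
print(f"k-means boundary: {kmeans_cut:.1f} ms/item")
print(f"Otsu-style cut:   {otsu_cut:.1f} ms/item")
```

With these stand-in values both cuts fall inside the wide gap between 22.9 and 76.7 ms/item, which illustrates why the two methods agree on how the tasks are divided even when the exact threshold value depends on the data and on the method.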

In humans, the slope histogram formed a continuum of slopes (Fig. 7D), as expected from previous observations (Wolfe, 1998b). However, to be able to compare the results, we set a threshold of 5 ms/item based on the slope intervals suggested by Wolfe (1998a). To achieve a common terminology with the archerfish results, we used the threshold to characterize the human search as either parallel (efficient) or serial (inefficient) as well. Specifically, slopes below or equal to 5 ms/item were considered parallel search and slopes above 5 ms/item were considered serial search.

Feature-search tasks elicit pop out in archerfish and humans

We measured behavior in the four feature-search tasks (Experiment 1A–D, see Methods): color, size, orientation, and motion. In most cases (100% of the archerfish and 96% of the human subjects), the search was parallel for both species; i.e., search slopes were below 45 ms/item for the archerfish and below 5 ms/item for the humans (although 100% of the human slopes were 6 ms/item or lower).

Different shapes result in different efficiencies for different species

We next examined which visual search mode was elicited in archerfish and humans on the different shape tasks. In the first shape task (Experiment 2A), the target was a solid disk and the distractors were Pac-Men; i.e., three-quarter disks with adjusted diameters to equal the total intensity.

Interestingly, this experiment revealed different visual-search behaviors for archerfish as compared to humans (Fig. 8A). For the archerfish (N = 3) this task elicited a serial search (a mean slope of 76.7 ± 17.6 ms/item). In addition, though the target-selection rate values were higher than chance, they were relatively low compared to the target-selection rates observed in the feature search experiments (\( \overline{x}_{\mathrm{color}}=98.2\%,\ \overline{x}_{\mathrm{size}}=91.4\%,\ \overline{x}_{\mathrm{orientation}}=83.2\%,\ \overline{x}_{\mathrm{motion}}=75.6\%,\ \overline{x}_{\mathrm{shape}}=67.8\% \); for color, size, and orientation p < 0.005 and for motion p = 0.06, permutation test). The high reaction-time slope and low target-selection rate indicate this task was hard for the archerfish. In humans, this task was easy, with a mean reaction-time slope of 2.8 ± 0.8 ms/item and a high target-selection rate (\( \overline{x}_{\mathrm{shape}}=98.1\% \)).

Fig. 8

Search mode of a shape is species dependent. (A) The shape task involving a disk vs. Pac-Men elicited parallel search in humans (2.8 ± 0.8 ms/item) and serial search in archerfish (76.7 ± 17.6 ms/item (N = 3)). (B) The shape task of Qs vs. Os elicited parallel search in both species (for humans: 0.96 ± 0.28 ms/item and for fish -3.1 ± 14.9 ms/item (N = 3))

In the second shape task (Experiment 2B), the target was a solid disk with an oriented line (i.e., "Q") and the distractors were solid disks (i.e., "O"). For both archerfish and humans this task elicited parallel search (Fig. 8B). Specifically, the mean slope for archerfish was -3.1 ± 14.9 ms/item (N = 3) and for humans 0.96 ± 0.28 ms/item.

Asymmetry in conjunction search task

Next, we investigated which visual search mode was elicited in the two species during a conjunction search of size and color (Experiment 3). Each archerfish and human were also tested in feature-search tasks based on the same two features making up the conjunction task (i.e., size and color). This was done to verify that the effects in the conjunction search were not predominantly the result of a single feature.

In the large-and-blue conjunction search task (Fig. 6H), the archerfish (N = 4) exhibited parallel search to detect the target (a mean reaction-time slope of -10.3 ± 7.7 ms/item, Fig. 9A). In the second conjunction task (Fig. 6I), all archerfish (N = 3) exhibited the serial search mode (a mean reaction-time slope of 81.3 ± 6.9 ms/item, Fig. 9B). Unfortunately, one fish died during the experiments from natural causes and only two fish could be tested in both conjunction tasks.

Fig. 9

Conjunction visual search. We ran two conjunction tasks involving size and color. (A) When the target was large-and-blue it elicited parallel search in both species (for humans: 3.7 ± 0.5 ms/item and for fish 10.3 ± 7.7 ms/item (N = 4)) and (B) when the target was small-and-blue it elicited serial search in both species (for humans: 8.8 ± 1.6 ms/item and for fish 81.3 ± 6.9 ms/item (N = 3))

The human results were qualitatively similar to the archerfish results. In the conjunction search displaying a large blue target, humans exhibited parallel search (a mean reaction-time slope of 3.7 ± 0.5 ms/item, Fig. 9A), whereas on the second conjunction task they employed serial search (a mean reaction-time slope of 8.8 ± 1.6 ms/item, Fig. 9B).

Difficulty parameter in archerfish efficient search

For the archerfish, all four feature tasks involving color, size, orientation, and motion fell under the same category of parallel search. However, the fish results varied across tasks in terms of reaction time (Kruskal-Wallis, H(3, 96) = 15. , p < 0.01) and target-selection rate (Kruskal-Wallis, H(3, 96) = 56.5, p < 0.0001), which may imply different efficiency levels. Specifically, a negative correlation (R² = 0.79, F(1, 14) = 52.8, p < 0.0001) was found between the mean reaction time and the mean target-selection rate on each task (Fig. 10A). To better quantify this difference in the results, we defined “task difficulty” as the reaction time divided by the target-selection rate. This measure formalized the intuition that as a task becomes harder, the reaction time increases, or the target-selection rate decreases, or both. Because reaction time and target-selection rate were negatively correlated when considered separately, this new difficulty parameter was expected to order tasks consistently by their difficulty even when they belonged to the same initial category.
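The difficulty measure defined above is a simple ratio, as the following sketch shows. The target-selection rates are the feature-task means reported earlier in the Results; the reaction times are hypothetical placeholders, since mean reaction times per task are not listed in the text, and the function name is ours.

```python
def task_difficulty(reaction_time_ms, target_selection_rate):
    """Reaction time divided by target-selection rate.

    target_selection_rate is a fraction in (0, 1]; a slower or less
    accurate search both increase the returned value.
    """
    if not 0.0 < target_selection_rate <= 1.0:
        raise ValueError("target_selection_rate must be in (0, 1]")
    return reaction_time_ms / target_selection_rate

# (hypothetical mean reaction time in ms, reported mean target-selection rate)
feature_tasks = {
    "color": (600.0, 0.982),
    "size": (650.0, 0.914),
    "orientation": (700.0, 0.832),
    "motion": (750.0, 0.756),
}

difficulty = {task: task_difficulty(rt, rate)
              for task, (rt, rate) in feature_tasks.items()}
ranking = sorted(difficulty, key=difficulty.get)
for task in ranking:
    print(f"{task:12s} {difficulty[task]:7.1f}")
```

Because the measure combines both quantities, two tasks with the same flat search slope can still be separated if one of them is answered more slowly or less accurately.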

Fig. 10

Difficulty of a task. (A) Scatter plot of the mean target-selection rate vs. the mean reaction time. Marker shapes represent the task (diamond, circle, square, and star shapes represent color, size, orientation, and motion tasks, respectively) and the color of the markers indicates the number of distractors (blue, green, orange, and red for three, six, nine, and 12 distractors, respectively). Black shapes represent the mean of each group. The dashed ellipses are the two-dimensional standard deviational ellipse of each task (σ = 1) representing the spread of the group. The blue line was obtained via linear regression and indicates a negative correlation between the two parameters (R² = 0.79, F(1, 14) = 52.8, p < 0.0001). (B) The difficulty of a task as a function of the tasks. Each color is associated with the number of distractors. For convenience, the tasks are organized in ascending order from low to high mean task difficulty (black lines). The easiest task was color, followed by size, orientation, and finally motion (p < 0.05, permutation test)

Figure 10B presents the difficulty per condition per task, where for clarity of presentation the tasks are organized in ascending order of mean task difficulty (black lines). According to this analysis, the easiest task was color, followed by size, orientation, and finally motion (p < 0.05, permutation test). Again, despite belonging to the same general category of “parallel search,” the difficulty parameter indicates important differences and may serve as a precursor for the performance continuum considered characteristic of humans (Wolfe, 1998b).

Discussion

This study examined the applicability of the fundamental aspects of Feature Integration Theory to non-human species. To do so, we compared the visual-search performance of humans with an evolutionarily distant animal – the archerfish. We found that both species behave identically in feature-search tasks involving color, size, orientation, and motion, as well as in conjunction tasks involving size and color. In shape visual searches, on the other hand, there were discrepancies between humans and archerfish. Whereas a shape search task of a disk versus Pac-Men elicited parallel search in humans, it forced the archerfish into serial search. At the same time, a shape search task of Qs versus Os elicited parallel search in both species (Fig. 11).

Fig. 11

Mean reaction-time slopes per species per task. (A) Mean human reaction-time slopes for each task: color 0.9 ± 0.67 ms/item, size 1.1 ± 0.2 ms/item, orientation 1.3 ± 0.7 ms/item, motion 0.4 ± 0.7 ms/item, Disk vs. Pac-Men 2.8 ± 0.8 ms/item, Q vs. Os 0.96 ± 0.28 ms/item, large-and-blue 3.7 ± 0.5 ms/item, small-and-blue 8.8 ± 1.6 ms/item. Black dashed line at 5 ms/item splits the bars into two search mode groups. This threshold is based on the slope intervals suggested by Wolfe (1998a). (B) Mean fish reaction-time slopes per task: color -1.7 ± 5.4 ms/item (N = 9), size 11.8 ± 6.4 ms/item (N = 8), orientation 14.2 ± 11.6 ms/item (N = 4), and motion 22.9 ± 3.3 ms/item (N = 3), Disk vs. Pac-Men 76.7 ± 17.6 ms/item (N = 3), Q vs. Os -3.1 ± 14.9 ms/item (N = 3), large-and-blue 10.3 ± 7.7 ms/item (N = 4), small-and-blue 81.3 ± 6.9 ms/item (N = 3). Black dashed line at 45 ms/item splits the bars into two search mode groups. This threshold was obtained by applying Otsu's method (Otsu, 1979) as well as k-means (with k=2) clustering (MacQueen, 1967) on the histogram of all search slopes obtained in our experiments on fish. This threshold splits the slope population into efficient and inefficient searches

Overall, this study follows in the footsteps of other visual-search studies in the barn owl (Harmening et al., 2011; Orlowski et al., 2015; Orlowski et al., 2018), pigeons (Allan & Blough, 1989; Blough, 1984; Blough, 1977; Cook et al., 1996), monkeys (Bichot & Schall, 1999; Buračas & Albright, 1999; Matsuno & Tomonaga, 2006; Nothdurft, Pigarev, & Kastner, 2009), rats (Botly & De Rosa, 2011), and archerfish (Ben-Tov et al., 2015; Rischawy & Schuster, 2013). Together they suggest that Feature Integration Theory is pertinent to a variety of non-human species despite their very different brain anatomy, computational power, and habitats.

Do the search slopes of archerfish form a continuum?

In humans, reaction-time search slopes form a continuum ranging from highly efficient to very inefficient (Wolfe, 1998b). In fish, it is not clear whether this continuum exists. This might be due to the small number of experiments accumulated thus far in the literature and their rather limited variety. Here, a bimodal distribution of slopes appeared to represent the data well. However, while color, size, orientation, and motion results appeared to reflect parallel search, a finer resolution measure of performance that takes both accuracy and response time into consideration revealed a spectrum that more closely matched a continuum in task difficulty. This may imply that the efficiency of the search does vary between tasks for fish and suggests that future research in animal visual search should use a more fine-grained measure as a standard.

Shape task efficiency is species dependent

The shape task involving a solid disk versus Pac-Men (Experiment 2A) revealed differences in visual-search behaviors between the archerfish and the human participants (Fig. 8). With steeper reaction-time slopes and lower target-selection rates, this task was significantly harder for the archerfish than the feature-search tasks. In contrast, for humans this task was easy and performance did not differ from the other feature-search tasks.

The difference in performance between archerfish and humans could arise from a difference in each species' familiarity with the shapes. For example, searching for a familiar letter target among familiar letter distractors can elicit pop out (Malinowski & Hübner, 2001; Shen & Reingold, 2001). Hence, since solid disks and partially solid disks are common in human life but not in archerfish life, search efficiency may have been affected.

Alternatively, negative shape associations and fear attract more attention and thus may affect visual-search performance (Eastwood, Smilek, & Merikle, 2001; Hershler & Hochstein, 2005; Öhman, Lundqvist, & Esteves, 2001b). Although humans might at first sight find the distractors reminiscent of Pac-Men or pies, the archerfish might associate them with the open mouth of a large red predator fish. Such an interpretation could be threatening and could thus explain why the archerfish's attention to the distractors increased with set size.

The second shape experiment of Qs versus Os (Experiment 2B) was based on Treisman's classical experiment, which elicits parallel search in humans (Treisman & Souther, 1985). As we also found, a similar task elicits parallel search in archerfish. This result contrasts with results reported for pigeons, where line terminators in a circle did not elicit pop out (Allan & Blough, 1989). Overall, these findings may suggest that shape processing is species dependent and affects visual search in a corresponding fashion.

Conjunction search asymmetry among vertebrates

In this study, we also assessed the conjunction-search phenomenon in both species. A conjunction of size and color elicited serial search in both species when the target was small-and-blue and the background consisted of large-and-blue and small-and-black distractors (Experiment 3B). At the same time, the dual case of large-and-blue targets elicited parallel search. Since the main difference between the two experiments was the size of the target, it is likely that the parallel search was driven primarily by the species' ability to focus relatively easily on the large solid disks while virtually ignoring the small ones. Thus, the dominance of the large solid disks guided attention and effectively turned the conjunction task into a color feature task involving the large solid disks alone.

In total, these experiments hint at similarities between archerfish and humans in visual-search performance. However, one significant difference in performance was manifested in the relatively extreme mean reaction-time slope of the archerfish in the large-and-blue conjunction task. Since the conjunction task was relatively hard, only the more capable archerfish were able to complete it, which may have biased the population sample. Alternatively, in most cases, the archerfish that performed the conjunction search task were lab veterans that had performed many different visual-search tasks in the preceding months and, for some, for up to a year. In contrast, the human subjects were given only brief training (see Methods). If humans had been given massive training (or overtraining), their search may have become more efficient (Shiffrin & Schneider, 1977).

Our results also reflect the interesting phenomenon of search asymmetries (Treisman & Souther, 1985; Treisman & Gormican, 1988; Wolfe, 2001b). The data here constitute the first evidence for conjunction asymmetry in a non-human species. Note that a few cases of feature asymmetry in non-human species have been reported. Bees manifested longer reaction times and higher detection error rates when the target was blue among yellow distractors compared to the opposite case (Spaethe, Tautz, & Chittka, 2006). In barn owls, a bright object was reported to pop out from a background of dark distractors (when measured in terms of reaction time and number of saccades to the target), but the reverse case (dark target amidst bright distractors) elicited serial search (Orlowski et al., 2018).

Visual features that can only be attributed to non-humans

The literature on visual search in animals has (at least implicitly) supported the idea that anything that pops out for animals also pops out for humans, but not the other way around. A few experiments have indicated that while certain shape tasks pop out for humans, they do not do so for animals. This “one-way” difference in performance could be due to the fact that the human visual system is much more developed and enables people to process complex scenarios faster and better than animals, whereas many basic pop-out features are commonly used in nature by both humans and other species. This raises the question of whether there are any visual tasks on which animals outperform humans.

Clearly, visual features that are not observable to humans but can be sensed by animals will give animals an edge if and when they define the target. For example, if the target is defined solely by the polarization of the light it reflects (Calabrese, Brady, Gruev, & Cummings, 2014; Dacke, Nilsson, Scholtz, Byrne, & Warrant, 2003; Horváth, Horváth, Varju, & Horváth, 2004; Schwind, 1991) or by certain optic flow characteristics (Baird, Srinivasan, Zhang, Lamont, & Cowling, 2006; Egelhaaf, 2006; Si, Srinivasan, & Zhang, 2003; Srinivasan, Zhang, Lehrer, & Collett, 1996b), it may pop out for certain animals but not for humans. Naturally, our open question aims at fairer scenarios.

In the spirit of ecological vision (Gibson, 2014), since the natural environment affects an animal's sensitivity to visual features (Blakemore & Cooper, 1970), a basic feature that is present in a habitat but rare in human life could pop out for certain animals living in that environment.

In the same spirit, it is tempting to hypothesize that certain shape experiments would elicit efficient (parallel) search in a particular animal but serial search in humans. This type of experiment would need to revolve around a typical shape from the animal's world that exists rarely or not at all in human environments. For instance, for humans, a human face target pops out among assorted non-face objects. However, an animal face target does not pop out among the same assorted non-face objects despite the similarities between animal and human faces (Hershler & Hochstein, 2005). If such an effect holds true among different non-human species as well, a fish figure or fish face target might pop out for fish and not for humans. Our preliminary attempts in the lab have not yet been able to generate a suitable shape experiment of this type, and more exhaustive research on this question is left for future work.