The world surrounding us offers an endless variety of scenes and objects. Just look around. You might see a desk, chairs, books, computers; you instantly recognize them as belonging to a certain category (furniture, electronics, etc.), although your specific desk may be unique to you. The apparent ease of your categorical recognition belies the complexity of this feat: We effortlessly detect and categorize tens of thousands of objects from countless possible angles and distances. And we do so in the blink of an eye (Grill-Spector & Kanwisher, 2005). How is this possible? This article reviews psychological and neurobiological studies that try to answer this question.

Our review will focus on perceptual categorization in nonhuman animals. This research field started with pigeon experiments, and perceptual category learning in birds is still a highly prolific area of investigation. As successful as this research area is, it suffers from a curious neglect of the neurobiological foundations of categorization learning in birds. This is a real deficit, as incorporating neurobiology into research on bird learning and cognition could provide two unprecedented opportunities: First, it could deepen our understanding of the mechanisms of category learning in birds from a different angle. Second, it would establish an avian alternative to the successful neuroscientific inquiries on categorization that use rodent and primate models. Therefore, we will also review some of the new developments in neuroscience outside the avian realm and outline their possible impact on perceptual categorization learning research in pigeons. By restricting ourselves in this way, we unfortunately have to ignore a vast and highly interesting comparative literature on abstract concept learning. We apologize to all the investigators whose research we cannot cite appropriately owing to this restriction.

Categories and concepts

Adapting a definition by Keller and Schoenfeld (1950), Cohen and Lefebvre (2005) defined categorization as an organism’s ability to respond equivalently to members of the same class, to respond differently to members of a different class, and to transfer these responses to novel members of these classes. Categorization is a key component of our cognitive system since it drastically reduces information load (Wasserman, Kiedinger, & Bhatt, 1988). Owing to the evolutionary relevance of categorization learning, humans (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976; Hegdé, Bart, & Daniel, 2008) and many other animals (e.g., pigeons: Herrnstein & Loveland, 1964; Yamazaki, Aust, Huber, Hausmann, & Güntürkün, 2007; dogs: Range, Viranyi, & Huber, 2007; monkeys: Kromrey, Maestri, Hauffen, Bart, & Hegdé, 2010; Sigala & Logothetis, 2002; honey bees: Benard, Stach, & Giurfa, 2006) learn to respond similarly to different objects from the same category.

Usually, categories are distinguished from concepts, although this distinction may not be binary but rather a continuous transition. A category refers to a set of entities that are grouped by overlapping perceptual features. Take, for example, the perceptual category “book.” Books come in different sizes, bindings, and scripts, with hard or soft covers. But despite these variations, their overlapping features make it easy to retrieve the category book when seeing one. Importantly, different kinds of books can be treated equivalently, as if they were the same, as soon as they are grouped in the same category (Sidman, 1994). Indeed, once objects are classified as belonging to a joint category, memory for the individuating features of the diverse items starts to suffer (Lupyan, 2008).

But what about e-books? Are they part of the same category as printed books? And what about the books that were read in the ancient civilizations of Egypt or Rome? These were handwritten on long strips of papyrus and were read by unrolling a stretch of papyrus from left to right. After reading, the scroll was placed in a jar together with further scrolls. What looked like a modern book in ancient Rome was in fact not a book but a codex, which consisted of official documents bound together. Is it still possible to come up with the category book when taking e-books and ancient scrolls into consideration? The problem grows even larger when we reflect on sayings such as “I can read him like a book.” These examples cannot all be bound together in the perceptual category “book.” They are, however, part of the concept of books (Goldstone, Kersten, & Carvalho, 2017). Concepts are mental elements of knowledge about objects that share features or functions that need not be perceptual. In a famous example by Barsalou (1983), the concept “things to remove from a burning house” makes even children and jewelry similar and conceptually bound. It is therefore important to distinguish between perceptual categories and concepts. As we will see below, pigeons can learn an astonishing variety of perceptual categories. But they also seem to master some abstract concepts, such as number, identity, and higher order relations (for review, see Lazareva & Wasserman, 2017). As outlined above, however, we will focus solely on perceptual categories in the present review.

Behavioral experiments on perceptual categorization in pigeons

Scientific inquiry into categorization took a major turn in the mid-1960s. Until that time, categorization was seen as a language-based cognitive ability that belonged to the realm of human cognition. Then, Herrnstein and Loveland (1964) published their now-classic study in which they demonstrated that pigeons can quickly acquire the category “human” after being conditioned to discriminate between hundreds of photographs, some of which depicted humans (Herrnstein and Loveland used the word “concept” in their title). The birds not only easily learned to discriminate the training stimuli properly but also transferred their knowledge to novel photographs. Numerous subsequent experiments further explored pigeons’ categorization abilities using natural stimuli, such as birds versus mammals (Kendrick, Wright, & Cook, 1990), individual pigeons (Nakamura, Croft, & Westbrook, 2003), cartoons (Matsukawa, Inoue, & Jitsumori, 2004), human action poses (Qadri & Cook, 2017), different painting styles (Watanabe, 2011; Watanabe, Sakamoto, & Wakita, 1995), human facial expressions (Jitsumori & Yoshihara, 1997), or human face identity (Soto & Wasserman, 2011). Pigeons even successfully discriminate malignant from benign human breast histopathology (Levenson, Krupinski, Navarro, & Wasserman, 2015) and can distinguish English words from nonwords (Scarf, Boy, et al., 2016). Taken together, pigeons seem to have an unexpectedly large capacity to learn various perceptual categories.

All of these studies adopted the basic procedure employed by Herrnstein and Loveland (1964): They used a large number of stimuli during discrimination acquisition and then tested transfer to novel stimuli. This procedure is meant to ensure that discrimination has been established and that subsequent categorization testing is not based on rote memory. Successful transfer to previously unknown exemplars of the learned category is then usually taken as evidence of open-ended categorization ability (Herrnstein, 1990). However, this approach does not guarantee that the pigeons actually used only the presence or absence of humans in the photographs to decide between S+ and S−. The reason is simple: Humans are often depicted alongside furniture, streets, houses, or cars. These items could then be used as an extended feature collection of the category “human.” Indeed, pigeons can learn to categorize “man-made objects” (Lubow, 1974). So the animals in the study of Herrnstein and Loveland (1964) may simply have used perceptual background features that co-occur with humans to master the task. Such a strategy would in principle still be based on perceptual categorization, but on different features than we had expected. Thus, in order to identify the nature of the formed category, we have to find means to reliably identify the visual cues the animals use. The relevance of this quest is illustrated by the results of, for example, Greene (1983), who discovered that her pigeons exploited spurious systematic differences in the background of a “human” categorization task, rather than using the actual presence or absence of people.

Studies using different approaches to this problem have shown that pigeons can base their decisions on a mixture of background information and information relevant to the stimulus category. Wasserman, Brooks, and McMurray (2015) demonstrated that pigeons can learn to sort 128 photographs into 16 different categories in parallel. Since no background cues were available, the animals had to use pictorial aspects of the stimuli. However, the spatial location of the photographs affected choices early in the experiment, an influence that vanished as learning progressed. Thus, all kinds of cues can affect categorization learning. To demonstrate that pigeons can indeed also learn about information unique to the human body, Aust and Huber (2001) trained their pigeons in a people/no-people experiment. During transfer trials, pictures of novel human figures were cut out and pasted onto previously seen “no people” stimuli. Thus, the pigeons were now confronted with stimuli that combined a previously learned negative background with a novel positive foreground depicting a person. The animals pecked at such manipulated stimuli, suggesting that they had indeed used features related to the human body (see also Aust & Huber, 2002). Further studies tried to reconstruct the stimulus properties that control categorization behavior by revealing only randomly selected parts of the stimuli through small apertures, or “bubbles.” In this experiment, pigeons and humans had to decide whether the depicted person had a happy or a neutral expression. By analyzing the success of the subjects relative to the visible components of the stimuli, Gibson, Wasserman, Gosselin, and Schyns (2005) discovered that both humans and pigeons mainly relied on the mouth region of the photographs.
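The logic of such a bubbles analysis can be illustrated with a minimal simulation. It is a coarse sketch under assumptions of our own, not the analysis of Gibson et al. (2005): the face is reduced to five hypothetical regions (instead of pixel-level Gaussian apertures), each region is revealed independently on half of the trials, and a hypothetical observer answers correctly whenever the mouth is visible and guesses otherwise. Subtracting the mean visibility of each region on error trials from its mean visibility on correct trials yields a crude "classification image" that singles out the diagnostic region.

```python
import random

# Hypothetical coarse face regions; the real Bubbles technique samples
# image locations through randomly placed Gaussian apertures.
REGIONS = ["left eye", "right eye", "nose", "mouth", "forehead"]


def classification_image(n_trials=2000, diagnostic="mouth", p_visible=0.5, seed=1):
    """Return, per region, P(visible | correct) - P(visible | error)."""
    rng = random.Random(seed)
    d_idx = REGIONS.index(diagnostic)
    visible_sums = {"correct": [0] * len(REGIONS), "error": [0] * len(REGIONS)}
    trial_counts = {"correct": 0, "error": 0}
    for _ in range(n_trials):
        # Reveal each region independently with probability p_visible.
        mask = [rng.random() < p_visible for _ in REGIONS]
        # Assumed observer: correct whenever the diagnostic region is visible,
        # otherwise a 50% guess between "happy" and "neutral".
        correct = mask[d_idx] or rng.random() < 0.5
        outcome = "correct" if correct else "error"
        trial_counts[outcome] += 1
        for i, is_visible in enumerate(mask):
            visible_sums[outcome][i] += int(is_visible)
    return {
        region: visible_sums["correct"][i] / trial_counts["correct"]
        - visible_sums["error"][i] / trial_counts["error"]
        for i, region in enumerate(REGIONS)
    }


for region, weight in classification_image().items():
    print(f"{region:>10}: {weight:+.2f}")
```

Under these assumptions the mouth receives a large positive weight (close to two thirds), while the non-diagnostic regions hover near zero; the same subtraction logic, applied per pixel, is what allows the visible components to be related to the subjects' success.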
To directly reveal the focus of attention of pigeons during perceptual categorization tasks, Dittrich, Rose, Buschmann, Bourdonnais, and Güntürkün (2010) introduced peck tracking to examine the pecking locations of pigeons in a people-present/people-absent task. Their results revealed that pecking location was mostly focused on the head of the human figures. Removal of the heads of the depicted persons impaired performance, while removal of other parts of the human figures did not. Using this technique, Castro and Wasserman (2014, 2017) also demonstrated that pigeons track features that are category relevant and respond less to details that only weakly coincide with the presence of the relevant stimulus. The results of these studies make it likely that pigeons selectively attend to and choose features that are diagnostic for the presence of the learned perceptual category.

Taken together, pigeons can learn an astonishing variety of perceptual categories. In doing so, they seem to focus their attention on specific features that predict the presence of a stimulus belonging to the rewarded category. If background patterns systematically correlate with the presence of an S+, the behavior of the animals can also come under the control of such patterns. We will dwell on the implications of these observations in the next section.

Learning perceptual categories—Excursion 1: The reward prediction error and dopamine

Several theoretical accounts have been offered to explain category learning. As outlined at the very beginning, we will not review all of these diverse attempts but will focus only on those theories that might inform a mechanistic hypothesis on the neurobiological foundations of perceptual categories in birds. To this end, we obviously have to start with the mechanisms of learning.

Every stimulus offers a diversity of features, of which only some are category relevant. At first, the animal can do little more than use trial and error to identify the features that are correlated with reward. Consequently, several categorization theories have incorporated error-driven learning rules that allow a selective attention to relevant stimulus dimensions to emerge (Gluck & Bower, 1988; Kruschke, 1992; Rumelhart, Hinton, & Williams, 1986). If the change in associative strength between a stimulus and an outcome is proportional to the magnitude of the error between prediction and outcome, learning amounts to a gradient descent on this error. This is also the core assumption of the error-driven learning rule of Rescorla and Wagner (1972). Recently, Soto and Wasserman (2010) put forward a common elements model of visual categorization that incorporates this rule as the driving force for learning the features that individual stimuli of a category have in common. None of these features has to be present in all exemplars of a category, and thus none of them can fully explain categorization behavior on its own. Specifically, the common elements model assumes that each image is represented by an overlapping collection of elements. Elements that are activated by only a single image are stimulus-specific properties that drive the identification of this individual stimulus. Elements that are activated by several images from the same category are category specific and drive categorization learning. Cook, Wright, and Drachman (2013), for example, trained pigeons on line drawings of birds and mammals. When analyzing transfer to novel instances, the authors discovered that the birds had mostly learned some key visual features that helped to disambiguate between classes. Thus, the pigeons were able to select the diagnostic parts and to tolerate some image alterations as long as the core features were preserved.
The common elements model would predict such findings, but also findings in which low-level features come to control categorization behavior (Greene, 1983; Huber, Troje, Loidolt, Aust, & Grass, 2000; Troje, Huber, Loidolt, Aust, & Fieder, 1999).
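To make the error-driven rule concrete, here is a minimal sketch in the spirit of the common elements model, under simplifying assumptions of our own: each image is a set of hypothetical elements, the response to an image is the summed associative strength of its elements, and every active element is updated by the Rescorla–Wagner rule ΔV = α(λ − ΣV), with α standing for the combined learning-rate parameters. Each training image contains two of three shared category elements plus one stimulus-specific element, so no shared element is present in every exemplar. The element names, learning rate, and number of epochs are illustrative choices, not parameters from Soto and Wasserman (2010).

```python
# Hypothetical element sets; "uA*"/"uB*" are stimulus-specific elements.
TRAIN = [
    ({"a1", "a2", "uA0"}, 1.0),  # category A images are rewarded (lambda = 1)
    ({"a2", "a3", "uA1"}, 1.0),
    ({"a1", "a3", "uA2"}, 1.0),
    ({"b1", "b2", "uB0"}, 0.0),  # category B images are not rewarded (lambda = 0)
    ({"b2", "b3", "uB1"}, 0.0),
    ({"b1", "b3", "uB2"}, 0.0),
]


def response(strengths, image):
    """Summed associative strength of the elements an image activates."""
    return sum(strengths.get(e, 0.0) for e in image)


def train_rescorla_wagner(trials, alpha=0.1, epochs=200):
    strengths = {}
    for _ in range(epochs):
        for image, lam in trials:
            # One shared prediction error per trial, computed before any update.
            error = lam - response(strengths, image)
            for element in image:
                strengths[element] = strengths.get(element, 0.0) + alpha * error
    return strengths


V = train_rescorla_wagner(TRAIN)
novel_a = {"a1", "a3", "uA_new"}  # new exemplar: shared A elements, new specific element
novel_b = {"b1", "b2", "uB_new"}
print(round(response(V, novel_a), 2), round(response(V, novel_b), 2))
```

After training, the novel category-A image evokes a clear response (about 0.8 in this toy setup) carried entirely by the shared elements, because its new stimulus-specific element has no associative strength. This is the sense in which shared elements support transfer, whereas unique elements support the identification of individual training images.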

Studies of reward processing by dopaminergic midbrain neurons and their target regions in mammalian frontostriatal networks have shed light on the mechanisms by which nervous systems update predictions, assign subjective values to events, and select responses based on feedback (Schultz, 2016). These insights provide a post hoc neurophysiological correlate of the Rescorla and Wagner (1972) theory. Let us therefore make an excursion into the dopaminergic system before returning to categorization learning in birds.

Midbrain dopaminergic cell groups in mammals project to frontostriatal targets and beyond. Dopamine neurons show increased activation when a reward is received (Waelti, Dickinson, & Schultz, 2001). This activation has different components, but the most important one for the present account is a fast activation or depression of the dopaminergic signal that codes a positive or a negative reward prediction error, respectively (Nomoto, Schultz, Watanabe, & Sakagami, 2010; Schultz, 2016). When reward is contingently preceded by a cue, dopaminergic activity shifts backward in time to the reward-predicting cue, such that the burst now occurs in response to the CS+ and no longer to the reward itself (Schultz, 1998). When a predicted reward is not delivered, or is smaller than predicted, dopamine neurons decrease their activity at the time point of expected reward delivery. The same neurons increase their activity when the reward is larger than predicted (Bayer & Glimcher, 2005). Schultz (2016) argues that these responses function as feature detectors for the goodness of a reward relative to its prediction: If the reward is fully predicted, no signal change occurs. If the reward is better than predicted, dopamine neurons emit a positive signal. If the reward is smaller than predicted, the same neurons emit a negative signal (see Fig. 1a). Dopamine neurons also comply with further predictions of Rescorla and Wagner (1972), such as the blocking effect (Waelti et al., 2001). Based on these findings, Montague, Hyman, and Cohen (2004) have argued that dopamine neurons essentially perform the computation proposed by the Rescorla–Wagner theory. Thus, when predictions turn out to be wrong, dopamine retunes the system to improve its predictions. According to this view, dopamine activity carries a teaching signal that modifies response selection (see Fig. 1b).
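The correspondence between the dopamine signal and the Rescorla–Wagner error term can be sketched in a few lines. In this minimal illustration, `prediction_error` returns the signed discrepancy δ = λ − ΣV, whose sign mirrors the three cases described by Schultz (2016). The blocking simulation then shows why the rule predicts blocking: once cue A fully predicts the reward, adding cue X leaves almost no error to drive learning about X. The cue names, learning rate, and trial counts are arbitrary illustrative choices.

```python
def prediction_error(reward, predicted):
    """Signed error: > 0 better than predicted, 0 fully predicted, < 0 worse."""
    return reward - predicted


def rw_trial(strengths, cues, reward, alpha=0.3):
    """One Rescorla-Wagner trial: every present cue shares the same error term."""
    delta = prediction_error(reward, sum(strengths[c] for c in cues))
    for c in cues:
        strengths[c] += alpha * delta
    return delta


def blocking_experiment(pretrain_a, n_trials=50):
    """Return the associative strength of cue X after compound training."""
    strengths = {"A": 0.0, "X": 0.0}
    if pretrain_a:
        for _ in range(n_trials):
            rw_trial(strengths, ["A"], reward=1.0)  # phase 1: A -> reward
    for _ in range(n_trials):
        rw_trial(strengths, ["A", "X"], reward=1.0)  # phase 2: AX -> reward
    return strengths["X"]


for reward in (1.0, 2.0, 0.0):  # the predicted reward is 1.0 in all three cases
    print(reward, prediction_error(reward, predicted=1.0))
print("X after blocking:", round(blocking_experiment(True), 3))
print("X in control:   ", round(blocking_experiment(False), 3))
```

This trial-level sketch does not capture the backward shift of the dopamine burst from the reward to the CS within a trial; that timing effect is usually modeled with temporal-difference learning, which extends the same error-driven logic across time steps.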

Fig. 1

Dopamine-mediated reward prediction error hypothesis (Schultz, 1998). a Activity pattern of a dopamine neuron during classical conditioning. In Trial 1, the animal does not expect a reward. Therefore, the dopamine cell responds to reward delivery with a sharp increase of activity. In Trial n, when learning has become asymptotic, the dopamine neuron responds to the conditioned stimulus (CS; the earliest cue, which is itself not predicted) but no longer to the unconditioned stimulus (UCS; which is predicted by the CS). When the reward is omitted in Trial n + 1, the activity of the dopamine neuron drops at the time point when the reward was expected. Thus, dopamine neurons can deliver a message of “better than expected” or “worse than predicted” to forebrain structures. b Dopamine-mediated signals could continuously adjust synapses in target areas, so that every action that produces an unexpected result triggers a new surge of synaptic changes that adjusts future actions.

Learning perceptual categories—Excursion 2: The neuronal basis of categorization in primates

Over the past decade, neurobiological accounts of categorization in primates have taken a new route. About 10 years ago, the inferior temporal cortex (ITC) of monkeys was mostly seen as a mere storehouse of visual images that were selectively retrieved by the prefrontal cortex to enable category-based decisions. Recordings from the monkey ITC had revealed diverse cell populations, each responsive to a distinct but overlapping set of stimuli (Vogels, 1999). ITC neurons are known to respond to diagnostic features of stimuli that enable a successful categorization of these patterns (Sigala & Logothetis, 2002). During category learning, the population response of ITC neurons shifts moderately toward higher selectivity for the critical features (Baker, Behrmann, & Olson, 2002). Owing to the massively parallel processing of large numbers of neurons, selectivity for combinations of features is markedly enhanced during categorization learning, such that both feature-based and configuration-based coding are enabled within ITC at a single-cell level (Baker et al., 2002).

But is this category-coding ability of ITC cells a result of top-down instructions from prefrontal areas, or does it emerge from bottom-up input? Already in the mid-1990s, Simon Thorpe demonstrated that when humans watched a stream of images, each flashed for just 20 ms, event-related potentials signaled categorical decisions within about 150 ms after stimulus onset (Thorpe, Fize, & Marlot, 1996). This speed suggests a purely feed-forward visual categorization mechanism. Subsequent studies demonstrated that fast categorization in humans is based on visual areas comparable to the monkey ITC (Curran, Tanaka, & Weiskopf, 2002). How is such a fast, purely feed-forward categorization at the ITC level possible?

The primate ITC can solve categorization problems by rapidly computing a vector that distinguishes between various object categories, despite individual differences between objects within a class (DiCarlo, Zoccolan, & Rust, 2012). Just 50 ms after picture exposure, a population of monkey ITC neurons can be selective for a certain image category (DiCarlo & Maunsell, 2000). This is nicely shown in a study by Hung, Kreiman, Poggio, and DiCarlo (2005): Here, monkeys were shown 77 stimuli that could be categorized into eight classes, such as toys and faces, while ITC neurons were recorded. The activity of these neurons was then analyzed with a linear classifier—a machine learning algorithm that categorizes items based on the value of the linear combination of features of these items. A linear classifier that was trained with the input from the recorded neurons could categorize these stimuli with an accuracy of above 90%. The categorization accuracy of the classifier steeply increased with time, such that 70% accuracy was reached just 12.5 ms after the onset of activity of ITC neurons. Classifier performance also increased linearly with the number of recorded cells, such that only about 100 neurons were needed over a period of less than 20 ms to categorize stimuli with very high accuracy (Hung et al., 2005). These fast responses support bottom-up visual object categorization at ITC level despite changes in object position or background (Li, Cox, Zoccolan, & DiCarlo, 2009). So, is the primate ITC sufficient to run all processes that are needed for perceptual categorization from stimulus onset up to the final decision of the animal? Certainly not.
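The decoding logic behind such results can be illustrated with simulated data. The sketch below is not the analysis of Hung et al. (2005): it assumes hypothetical neurons with randomly drawn mean firing rates for each of eight categories plus Gaussian trial-to-trial noise, and it uses a nearest-centroid read-out, which is a simple linear classifier (each category k gets a linear score w_k · x + b_k, with w_k equal to the category's mean training response and b_k = −‖w_k‖²/2). Accuracy is measured on held-out trials.

```python
import random


def linear_decoding_accuracy(n_neurons, n_categories=8, n_trials=20, noise_sd=2.0, seed=0):
    rng = random.Random(seed)
    # Hypothetical tuning: mean rate of every neuron for every category.
    tuning = [[rng.uniform(0.0, 10.0) for _ in range(n_neurons)]
              for _ in range(n_categories)]

    def sample():
        # Noisy population response vectors, labeled by category.
        return [([rng.gauss(mu, noise_sd) for mu in tuning[k]], k)
                for k in range(n_categories) for _ in range(n_trials)]

    train, test = sample(), sample()

    # Fit: w_k is the mean training response for category k.
    weights = []
    for k in range(n_categories):
        vectors = [x for x, label in train if label == k]
        weights.append([sum(col) / len(vectors) for col in zip(*vectors)])
    biases = [-0.5 * sum(w * w for w in wk) for wk in weights]

    def predict(x):
        # Linear score per category; the highest score wins.
        scores = [sum(wi * xi for wi, xi in zip(wk, x)) + bk
                  for wk, bk in zip(weights, biases)]
        return max(range(n_categories), key=scores.__getitem__)

    correct = sum(predict(x) == label for x, label in test)
    return correct / len(test)


for n in (2, 10, 100):
    print(n, "neurons:", linear_decoding_accuracy(n))
```

With only a few simulated neurons the category means overlap and the read-out typically errs; with many neurons accuracy approaches ceiling, which mirrors the qualitative trend of better decoding with larger recorded populations.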

The prefrontal cortex (PFC) of primates is a second critical area for categorization learning. When monkeys are trained to categorize different objects into two groups, PFC cells show a sharp differentiation between categories (Freedman, Riesenhuber, Poggio, & Miller, 2001). PFC cells also stay active during a delay period when the stimulus is no longer shown, and they display a high level of task-relevant activity during the subsequent decision phase (Lundqvist et al., 2016). In addition, the category coding of PFC neurons changes quickly when new category boundaries are introduced: The neurons switch to the new categories and stop respecting the old boundaries (Freedman et al., 2001). Thus, PFC neurons flexibly code the current category boundaries based on the reward feedback that follows the animal’s own behavior (Freedman & Miller, 2008).

ITC and PFC interact closely during categorization. Category borders already present in the ITC seem to be sharpened by top-down input from the PFC in a task-dependent manner (Kauffmann, Bourgin, Guyader, & Peyrin, 2015; Pannunzi et al., 2012). This top-down input becomes especially relevant during rule changes, when new category borders have to be selected by PFC circuits (Roy, Riesenhuber, Poggio, & Miller, 2010). Most importantly, even if multiple category borders are already represented at the ITC level, the PFC is required to learn the rules according to which certain categories are selected for action while other category borders are neglected (Seger & Miller, 2010). Accordingly, monkeys with PFC lesions are, in principle, able to categorize objects but fail to translate this ability into rule-based decisions (Minamimoto, Saunders, & Richmond, 2010).

Taken together, a minimalist account of the neural foundations of perceptual categorization in primates involves two structures, ITC and PFC, which have complementary functions. The ITC encodes detailed visual information through the parallel activation of a large neuronal population. These differently tuned cells encode information about all kinds of category borders that can be extracted by simple computational means, such as linear classifiers. In addition, the coding properties of these neurons can be modestly modified by dopaminergic input and by top-down influences from the PFC. Prefrontal areas, on the other hand, receive massive dopaminergic input and are strongly tuned by this prediction error feedback. The PFC interacts with the ITC to retrieve existing category-based information and to transform it into decisions and actions. The PFC thereby swiftly switches between different ITC-based category borders, depending on changing task contingencies. It should again be emphasized that this concentration on just the ITC and PFC is a minimalist view of the system, one that ignores the unique contributions of structures like the hippocampus and striatum to other specific aspects of categorization behavior.

But now it is time to return to birds.

The avian visual pathways and the bird “prefrontal” system in a whirlwind

Both birds and mammals have two ascending visual pathways to the forebrain: the tectofugal pathway (comparable to the mammalian extrageniculocortical system) and the thalamofugal pathway (comparable to the geniculocortical system).

The tectofugal pathway (see Fig. 2, depicted in dark yellow) ascends from the retina to the optic tectum, then to the thalamic nucleus rotundus, and finally terminates in the forebrain entopallium (Mouritsen, Heyers, & Güntürkün, 2016). In pigeons, the tectofugal system controls visual tasks in which the bird has to look at visual stimuli in its frontal visual field and respond to them (Remy & Güntürkün, 1991; Güntürkün & Hahmann, 1998). Because practically all perceptual categorization tasks in pigeons use setups in which the birds scrutinize the visual stimuli with their frontal visual field and then peck at them, the tectofugal system is likely the most important neural component for our understanding of perceptual categorization in pigeons. Entopallial neurons interact reciprocally with a cluster of associative areas (jointly depicted in green) that surround the entopallium and that alter their activity patterns during different visual object discrimination tasks (Stacho, Ströckens, Xiao, & Güntürkün, 2016).

Fig. 2

a Skull and brain of a pigeon depicted as a combined CT/MRI-based image (Güntürkün, Verhoye, De Groof, & Van der Linden, 2013). b Organization of the visual pathways in pigeons and their projections to the “prefrontal” NCL. The tectofugal system (shown in yellow) starts with the retinal projections to the optic tectum. From there, tectofugal projections reach the entopallium (E) via the thalamic n. rotundus (Rt); the entopallium then projects to the surrounding associative visual areas (shown in green). The thalamofugal visual system is shown in blue and begins with the retinal projections to the thalamic nucleus geniculate lateralis, pars dorsalis (GLd), and from there to the visual Wulst. Since the Wulst also projects to the associative visual areas, these structures receive input from both pathways; in addition, they interact with the nidopallium caudolaterale (NCL), which is a functional equivalent of the mammalian prefrontal cortex. The NCL projects to the (pre)motor arcopallium and the striatum. From there, descending projections realize the motor output of the animal. The glass brain of the pigeon is based on Güntürkün et al. (2013). (Color figure online)

Recordings from entopallial neurons reveal that they respond to a large number of overlapping features. For example, Scarf, Stuart, Johnston, and Colombo (2016) recorded from the entopallium while the animals were simply required to peck at one of 12 different visual stimuli. The authors discovered that many cells in the entopallium responded vigorously to several stimuli, while showing only modest modifications of their spike trains between distinct stimuli. Verhaal, Kirsch, Vlachos, Manns, and Güntürkün (2012) trained pigeons in a go/no-go task while recording from the entopallium. They discovered that entopallial neurons rapidly learn to respond to rewarded stimuli while quickly ceasing to respond to no-go cues. Colombo, Frost, and Steedman (2001) and Johnston, Anderson, and Colombo (2017a, 2017b) demonstrated, in addition, that entopallial and associative visual area neurons keep firing for a specific cue during the delay period of a delayed-matching-to-sample task and so possibly support the retention of critical visual information. Thus, entopallial and associative visual area neurons seem to prefer broad, overlapping stimulus classes and can modify their activity patterns according to reward-associated task contingencies.

The thalamofugal visual pathway (see Fig. 2; shown in blue) ascends from the retina to the thalamic nucleus geniculate lateralis, pars dorsalis (GLd) and from there to the visual Wulst in the telencephalon. The thalamofugal system seems to mainly receive visual input from the lateral visual field (Güntürkün & Hahmann, 1998; Remy & Güntürkün, 1991). The Wulst also has reciprocal connections with the visual associative areas. Thus, both major ascending visual pathways merge in this area (Shanahan, Bingman, Shimizu, Wild, & Güntürkün, 2013).

The associative visual areas have reciprocal connections with a structure in the most posterior part of the pigeon forebrain: the nidopallium caudolaterale (NCL; see Fig. 2). A large number of studies make it likely that the NCL is a functional equivalent of the mammalian PFC (Güntürkün, 2012). Like the PFC, the associative NCL integrates multimodal information and connects this higher order sensory input to limbic and motor structures, including the striatum (Kröner & Güntürkün, 1999). Thus, like the PFC, the avian NCL is a convergence zone between the ascending sensory pathways and the descending motor systems (Güntürkün & Bugnyar, 2016). Accordingly, NCL neurons code for different modalities in a task-dependent manner (Moll & Nieder, 2015, 2017) and prospectively encode future behavior based on learned stimulus associations (Veit, Pidpruzhnykova, & Nieder, 2015). Importantly, they control what should be remembered and what should be forgotten (Rose & Colombo, 2005), and they encode future events (Scarf et al., 2011), decision-making (Lengersdorf, Güntürkün, Pusch, & Stüttgen, 2014), action-related subjective values (Kalenscher et al., 2005; Koenen, Millar, & Colombo, 2013), rule tracking (Nieder, 2017; Veit & Nieder, 2013), numerosity (Ditz & Nieder, 2015), visual categories (Kirsch et al., 2009), and the association of outcomes with actions (Starosta, Güntürkün, & Stüttgen, 2013; Johnston et al., 2017a; Liu, Wan, Shang, & Shi, 2017).

NCL lesions also interfere with all cognitive tasks that are known to depend on the mammalian PFC (Güntürkün, 1997, 2005, 2012; Kalenscher, Ohmann, & Güntürkün, 2006). The learning-related plasticity of the NCL very likely depends on its dense innervation by dopaminergic fibers that release dopamine during learning and executive tasks (Karakuyu, Herold, Güntürkün, & Diekamp, 2007; Herold, Joshi, Hollmann, & Güntürkün, 2012). NCL and PFC therefore represent a case of parallel evolution in mammals and birds that resulted in the convergent emergence of brain areas that subserve executive cognitive functions (Güntürkün, 2012).

In summary, the functional organization of the visual and “prefrontal” avian forebrain is highly similar to the mammalian pattern. Together with their input and output structures, they form the visuomotor system of the bird brain. Now we can combine insights from primate research with cognitive and neuroscientific inquiries to develop a mechanistic hypothesis on perceptual categorization in birds.

A mechanistic neuroscientific hypothesis on perceptual categorization in pigeons

Let’s combine the collective evidence discussed so far in a framework that links population-level category coding in the bird visual forebrain with the prediction error-driven neural learning dynamics in the NCL. Let’s first look for evidence of population-level category-specific coding in the bird visual system.

Koenen, Pusch, Bröker, Thiele, and Güntürkün (2016) tested the idea of population-level category coding in visual associative neurons. Their pigeons were not required to discriminate between stimulus categories but simply pecked at any upcoming stimulus to obtain food. The abstract stimuli differed in color, shape, spatial frequency, and amplitude. The spike trains of the simultaneously recorded visual associative neurons were used to identify post hoc the stimulus classes the animals saw. To this end, a representational dissimilarity matrix (RDM) was calculated from the spikes: The responses of the recorded neuronal population to each stimulus were correlated with the population responses to every other stimulus. Thus, each stimulus pair could be depicted with a gray value that corresponds to the degree of dissimilarity (calculated using Spearman’s rank correlation coefficient) of the neuronal response patterns. The gray values make it possible to quickly visualize the degree of (dis)similarity of population responses to specific stimuli (see Fig. 3).
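A minimal sketch of such an RDM computation follows. Each entry is taken here as 1 − Spearman’s ρ between the population response vectors (one value per neuron) evoked by two stimuli, a common convention that we assume rather than take from Koenen et al. (2016). The firing rates of the six hypothetical neurons and the stimulus names are invented for illustration.

```python
def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    # Spearman's rho is Pearson's r computed on ranks.
    return pearson(ranks(x), ranks(y))


def rdm(responses):
    """responses: stimulus name -> firing rates of the same neurons."""
    names = list(responses)
    matrix = [[1.0 - spearman(responses[a], responses[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical firing rates (spikes/s) of six neurons to four stimuli.
RESPONSES = {
    "red":          [12, 3, 8, 1, 5, 2],
    "green":        [11, 4, 7, 2, 6, 1],
    "grating_low":  [2, 9, 1, 10, 3, 7],
    "grating_high": [1, 10, 2, 9, 4, 6],
}

names, matrix = rdm(RESPONSES)
for name, row in zip(names, matrix):
    print(f"{name:>12}", " ".join(f"{d:.2f}" for d in row))
```

In the printed matrix, the diagonal is zero, entries within the color pair and within the grating pair are small, and entries between the two groups are large; this reproduces in miniature the block structure visible in Fig. 3.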

Fig. 3

Cellular coding of variously colored objects and grating patterns in the pigeon’s associative visual areas. Analysis of the population of cells using a representational dissimilarity matrix (RDM) shows that neuronal responses to colored stimuli are discernible in the upper left corner due to highly correlated spike patterns of visual-associative units (dark gray: low dissimilarity; light gray: high dissimilarity). Similarly, responses to grating patterns can be discerned in the lower right corner. These are further subdivided by spatial frequency. The diagonal is black because the neural activity induced by each stimulus is compared with itself, resulting in the highest possible similarity. Note. From Koenen et al. (2016), with permission of corresponding author and publisher

Koenen et al. (2016) revealed that basic stimulus categories such as pattern, color, amplitude, and spatial frequency were discernible from the neuronal population responses. It is important to remember that the animals were not conditioned to discriminate between stimuli but merely pecked at each of them to obtain food. Accordingly, the animals’ behavior did not differ between stimulus categories, although their neural responses did. Thus, associative visual neurons of the avian forebrain show category-specific population coding even without any categorization training. This is exactly what would be required for different images to be perceived as perceptually coherent before any training has taken place. Indeed, Herrnstein and de Villiers (1980) had proposed that differential reinforcement may not create perceptual groupings but merely disclose them: These groupings are driven bottom-up at the pictorial level and produce stimulus generalizations that are then picked up and further shaped by reinforcement contingencies. In line with this view, Astley and Wasserman (1992) reported that pigeons perceive similarity among members of basic-level categories, making it likely that the birds perceived basic-level categories as perceptually cohesive.

In a follow-up to Koenen et al. (2016), Azizi et al. (2018) confronted pigeons with a large set of photographs depicting animate (humans; nonhumans) and inanimate items (natural objects; artificial objects). Some subcategories were further subdivided (humans = bodies vs. faces; nonhumans = animal bodies vs. animal faces; natural objects = vegetables vs. fruit; artificial objects = tools vs. traffic signs). This stimulus set had previously been used to reveal category-specific population coding in monkey ITC (Kriegeskorte et al., 2008). As in Koenen et al. (2016), the pigeons merely had to peck on each stimulus to obtain food. This time, cellular responses from the entopallium and surrounding associative areas were analyzed with a linear discriminant analysis (LDA), which identifies a linear combination of features that separates categories based on the neurons’ spike trains. Individual neurons did not significantly distinguish between categories, but the neuronal population as a whole did. More specifically, the recorded cells showed a highly significant categorization along the animate/inanimate border. A more detailed analysis revealed that this categorization was mainly driven by the category “human.” Further scrutiny of the data set demonstrated that this category was not computed in the entopallium but in the associative visual areas. Pigeon neurons reliably and correctly categorized photographs of human bodies or faces, with performance increasing linearly with the number of associative visual neurons analyzed and reaching practically 100% with just 35 cells. Thus, a small number of neurons in the visual associative forebrain of pigeons suffices to recognize the presence or absence of a person in a photograph.
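Why weak single-cell signals can add up to reliable population decoding can be illustrated with a toy simulation. The sketch below is hypothetical and is not the analysis of Azizi et al. (2018): it hand-codes a two-class Fisher LDA with a small ridge term, simulates neurons whose mean rates differ only slightly between two categories (independent Gaussian noise is a simplifying assumption), and measures held-out accuracy as more neurons are included. The numbers it prints come from synthetic data and say nothing about the pigeon recordings.

```python
import numpy as np


def fit_lda(X, y, shrinkage=1e-3):
    """Two-class Fisher linear discriminant.

    X: (n_trials, n_neurons) responses; y: 0/1 category labels.
    Returns weights w and bias b so that X @ w + b > 0 predicts class 1.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class covariance, with a small ridge so it stays invertible
    pooled = ((len(X0) - 1) * np.atleast_2d(np.cov(X0, rowvar=False))
              + (len(X1) - 1) * np.atleast_2d(np.cov(X1, rowvar=False))) / (len(X) - 2)
    pooled += shrinkage * np.eye(X.shape[1])
    w = np.linalg.solve(pooled, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2.0  # threshold halfway between the class means
    return w, b


def decoding_accuracy(X_train, y_train, X_test, y_test, n_cells):
    """Train on the first n_cells neurons and return held-out accuracy."""
    w, b = fit_lda(X_train[:, :n_cells], y_train)
    predictions = (X_test[:, :n_cells] @ w + b > 0).astype(int)
    return float(np.mean(predictions == y_test))


def simulate_trials(n_trials, signs, effect, rng):
    """Hypothetical toy population: each neuron shifts its mean rate by
    +/- effect (in noise-SD units) for category 1, with independent noise."""
    y = rng.integers(0, 2, size=n_trials)
    X = 5.0 + y[:, None] * effect * signs + rng.normal(0.0, 1.0, (n_trials, len(signs)))
    return X, y


rng = np.random.default_rng(1)
signs = rng.choice([-1.0, 1.0], size=60)  # each toy neuron weakly prefers one category
X_train, y_train = simulate_trials(400, signs, 0.3, rng)
X_test, y_test = simulate_trials(400, signs, 0.3, rng)
for n_cells in (1, 5, 15, 35, 60):
    print(n_cells, decoding_accuracy(X_train, y_train, X_test, y_test, n_cells))
```

With these arbitrary settings, accuracy based on a single toy neuron stays close to chance, whereas accuracy based on all 60 is clearly higher; exact values depend on the random seed.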

Taken together, visual categories like “human” can already be revealed in small neuronal populations at the level of the pigeon’s visual association areas. Since neither categorization training nor differential reward paradigms were involved, these category-specific coding properties are likely driven by mere feed-forward stimulus input statistics. These results are consistent with theoretical analyses (Gale & Laws, 2006) as well as imaging and cell recording data from monkey and human visual associative cortex (Kriegeskorte et al., 2008; Stansbury, Naselaris, & Gallant, 2013). Behavioral studies in monkeys and pigeons reveal similar findings. Monkeys performing a same–different task are more likely to wrongly judge different faces or different fruits as “same” when these share simple perceptual features (Sands et al., 1982). Very similar results occur when pigeons are tested with leaf forms of different tree species (Cerella, 1979).

As outlined above, population-level category coding in the primate ITC is just one pillar of primate categorization. A second pillar is the PFC, which learns to select response patterns based on a dopamine-mediated reduction of the prediction error. As a result of this learning process, the PFC starts to select the relevant visual category dimension from the ITC and to choose the appropriate action dimension from premotor cortex and striatum. This is reminiscent of Sutherland and Mackintosh (1971), who proposed that discrimination learning “involves two processes: learning to which aspects of the stimulus to attend and learning what responses to attach to the relevant aspects of the stimulus situation” (p. 21).

There is good evidence that the neural dynamics of the avian NCL are characterized by dopamine-mediated reductions of prediction errors, just as in primate PFC (Puig, Rose, Schmidt, & Freund, 2014). The organization of the avian dopaminergic system and its terminal areas in the NCL is highly comparable to that of the mammalian PFC (Durstewitz, Kröner & Güntürkün, 1999; Durstewitz, Kröner, Hemmings, & Güntürkün, 1998; Herold et al., 2011; Waldmann & Güntürkün, 1993; Wynne & Güntürkün, 1995). As in mammals, dopamine release and local dopamine receptor adjustments in NCL change over the course of learning associative tasks (Herold et al., 2012; Karakuyu et al., 2007). As predicted by reward prediction error accounts, blocking forebrain D1 receptors in pigeons abolishes the differences in learning speed that normally accompany different reward magnitudes (Diekamp, Kalt, Ruhm, Koch, & Güntürkün, 2000; Rose, Schiffer & Güntürkün, 2013; Rose, Schmidt, Grabemann, & Güntürkün, 2009). In addition, and again in line with the mammalian dopamine literature, blocking D1 receptors in the associative pigeon forebrain disrupts local visual attention processes (Rose, Schiffer, Dittrich, & Güntürkün, 2010). A recent study demonstrated that avian learning unfolds just as in monkeys, through increases and decreases of phasic dopamine release depending on better-than-predicted or worse-than-predicted outcomes (Gadagkar et al., 2016).
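The reward prediction error referred to here is often approximated at the trial level by the Rescorla–Wagner (delta) rule, in which the phasic dopamine signal is modeled as the difference between obtained and predicted reward. The sketch below is a textbook illustration under that assumption, not a model fitted to any of the cited pigeon or songbird data; the learning rate `alpha` and the reward schedule are arbitrary choices.

```python
def rescorla_wagner(outcomes, alpha=0.2, v0=0.0):
    """Track the predicted value V of a stimulus across trials.

    On each trial the prediction error is delta = outcome - V
    (positive: better than predicted; negative: worse than predicted),
    and V moves a fraction alpha of the way toward the outcome.
    Returns the list of prediction errors, one per trial.
    """
    v = v0
    errors = []
    for r in outcomes:
        delta = r - v
        errors.append(delta)
        v += alpha * delta
    return errors


# ten rewarded trials, then two trials on which the expected reward is omitted
errors = rescorla_wagner([1.0] * 10 + [0.0] * 2)
```

The positive error shrinks as the reward becomes predicted, and omitting the expected reward produces a negative error, the pattern that phasic dopamine responses are thought to signal.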

Taken together, our hypothesis assumes that population-level category coding in visual associative areas and dopamine-mediated reward-prediction error reduction in NCL constitute the neural core of visual category learning in pigeons. Our explanation does not necessarily require representations at the exemplar or prototype level; it assumes that the animals simply optimize their reward chances by exploiting the output of their massively parallel visual system, which processes low-level visual features. This does not exclude the possibility that pigeons learn rules or acquire abstract concepts when low-level input is insufficient for appropriate decisions. However, we are convinced that simple parallel analyses can explain even quite complex examples of avian categorization behavior. For example, Grainger, Dufau, Montant, Ziegler, and Fagot (2012) and Scarf, Boy, et al. (2016) showed that both baboons and pigeons learn orthography, seemingly guided by letter-string-based algorithms. However, Linke, Bröker, Ramscar, and Baayen (2017) recently demonstrated that a deep learning network that received mere gradient orientation features as input units and operated according to Rescorla–Wagner learning rules could perfectly predict baboon lexical decision behavior. Thus, the combination of population-level feature analysis and dopamine-mediated reward-prediction error reduction can provide a powerful mechanistic explanans for at least a good part of animal categorization behavior. In fact, our hypothesis follows the same line of reasoning as the previously discussed common elements model of Soto and Wasserman (2010). What we add is a detailed neurobiological account of the processes that Soto and Wasserman had outlined in computational terms. In this way, this shared theoretical view of visual categorization in pigeons becomes testable from neuron to behavior.
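The claim that category behavior can emerge from reward-driven weighting of low-level features, without stored exemplars or prototypes, can be illustrated with a Rescorla–Wagner learner that operates on feature cues. The sketch below is loosely in the spirit of the feature-to-outcome learning described for Linke et al. (2017) but is not their model: the binary features, the category structure, the always-present context cue, and the learning rate are all assumptions made for illustration, and the learner is a single layer without hidden units.

```python
import numpy as np

rng = np.random.default_rng(2)
N_FEATURES = 20  # hypothetical low-level features (e.g., local orientations)


def sample_exemplars(n, rng):
    """Draw toy exemplars as binary feature vectors.

    Category 1 ("rewarded") tends to show features 0-9, category 0 features 10-19;
    no single feature is present in every member of its category.
    """
    y = rng.integers(0, 2, size=n)
    p_on = np.where(y[:, None] == 1,
                    np.r_[np.full(10, 0.7), np.full(10, 0.3)],
                    np.r_[np.full(10, 0.3), np.full(10, 0.7)])
    features = (rng.random((n, N_FEATURES)) < p_on).astype(float)
    context = np.ones((n, 1))  # a context cue present on every trial
    return np.hstack([features, context]), y


def train_rescorla_wagner(X, rewards, alpha=0.02):
    """Rescorla-Wagner with cue competition: the prediction is the summed
    weight of all present cues, and every present cue shares the error."""
    w = np.zeros(X.shape[1])
    for x, r in zip(X, rewards):
        delta = r - x @ w          # prediction error on this trial
        w += alpha * delta * x     # only present cues (x = 1) are updated
    return w


X_train, y_train = sample_exemplars(3000, rng)
w = train_rescorla_wagner(X_train, y_train.astype(float))

# freshly sampled exemplars, nearly all of them unseen during training
X_new, y_new = sample_exemplars(500, rng)
accuracy = float(np.mean((X_new @ w > 0.5) == (y_new == 1)))
```

No exemplar is stored and no prototype is computed; the learner only keeps one weight per feature, yet it classifies freshly sampled exemplars well because the weights come to reflect how predictive each feature is of reward.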

In the following, we visualize and outline the proposed processes that might unfold during categorization (see Fig. 4). When a pigeon starts to perceive and respond to various visual patterns, different neural processes may unfold within the visual associative forebrain areas and the NCL. In the visual associative structures of the avian forebrain, several million neurons will respond to the stimuli with slightly individually distinct but largely overlapping spike patterns. The tuning of most individual cells will be too broad to discriminate a given category. At the population level, however, different kinds of categorical distinctions will be discernible in a feed-forward manner.

Fig. 4

Proposed neuroscientific hypothesis on perceptual categorization learning in pigeons. A pigeon perceives stimuli showing humans or cars and then processes them in its visual pathways. Stimuli seen in the pigeon’s frontal visual field are primarily processed in the entopallium and then projected to the visual associative areas. Population coding properties of visual associative neurons can result in categorical distinctions between objects (humans; cars) or their canonical parts (faces; tires). Neurons of the visual associative areas project to the “prefrontal” NCL in a feed-forward manner and receive feedback projections from it. If the animal is performing a human/no-human categorization task and has pecked on a photograph depicting a human or a car, dopaminergic neurons (shown in brown) will adjust their firing frequency according to the subsequent occurrence of reward or nonreward. Thereby, local NCL networks that receive input from a population of human-coding visual neurons that initiated a key peck will be synaptically strengthened. Back projections from the NCL to visual structures are able to further sharpen category boundaries, depending on experimental conditions. NCL neurons take part in a decision process with which downstream premotor areas are activated to execute the peck on a key that shows a human or a car. (Color figure online)

While all of this unfolds, the pigeon will sometimes be rewarded after pecking on a certain stimulus, and sometimes not. Each reward will trigger dopamine release in different projection areas of the dopaminergic system, including the NCL. Trial by trial, this dopamine-mediated prediction error adjustment will strengthen those synaptic connections that were active shortly before dopamine release, while synapses whose activity was not followed by reward will be downregulated. Because NCL neurons constitute the interface between ascending sensory and descending motor pathways, specific sensory-prefrontal-motor circuits will be strengthened as a result. Those circuits that code for a rewarded combination of category boundaries thereby become associated with the actions that produce the rewarded outcome.
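The trial-by-trial adjustment described above resembles what computational neuroscience calls a three-factor learning rule: a synapse changes only when presynaptic activity, the postsynaptic circuit's involvement in the action, and a dopamine signal coincide. The sketch below is a deliberately simplified, hypothetical model of one NCL circuit that receives toy visual population codes for "human" and "car" photographs in a human/no-human pecking task. The population codes, the learning rate `eta`, the fixed pecking probability `p_peck`, and the reward scheme are all assumptions; the model is not taken from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy population codes (hypothetical): units 0-9 fire mainly to humans,
# units 10-19 mainly to cars, units 20-29 to features both share.
human = 0.1 * np.r_[np.ones(10), np.zeros(10), np.ones(10)]
car = 0.1 * np.r_[np.zeros(10), np.ones(10), np.ones(10)]

w = np.zeros(30)   # visual -> NCL synaptic weights
eta = 0.5          # learning rate (arbitrary)
p_peck = 0.8       # assumed fixed pecking probability during training

for trial in range(2000):
    is_human = rng.random() < 0.5
    x = human if is_human else car           # presynaptic factor
    pecked = rng.random() < p_peck           # postsynaptic factor: did this circuit drive a peck?
    reward = 1.0 if (pecked and is_human) else 0.0
    delta = reward - w @ x                   # dopamine-like reward prediction error
    if pecked:
        # three-factor update: presynaptic activity x peck x dopamine signal
        w += eta * delta * x

print(w @ human, w @ car)
```

After training, the circuit's summed input approaches 1 for the rewarded category and 0 for the unrewarded one: synapses from human-coding units are strengthened, synapses from car-coding units are weakened, and synapses carrying shared features settle in between.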

The visual associative areas of birds are innervated by a small number of dopaminergic fibers and have a modest density of D1 receptors (Durstewitz et al., 1999; Wynne & Güntürkün, 1995). Similar reward-dependent synaptic adjustments might therefore also take place, albeit to a lesser extent, in the avian associative visual structures. In addition, top-down input from the NCL could sharpen the category borders of visual neurons according to reward contingencies. These two synergistic mechanisms could explain why visual neurons in pigeons quickly retune their activity patterns according to the reward properties of learned stimuli (Colombo et al., 2001; Johnston et al., 2017a, 2017b; Scarf, Stuart, et al., 2016; Verhaal et al., 2012).

In summary, our hypothesis combines the rich experimental and theoretical tradition of categorization learning in pigeons with neurobiological insights into visual processing and dopamine-mediated plasticity in primates. Our approach connects these hitherto largely separate areas of research and integrates them into a hypothesis that can be tested at different levels of analysis. We hope that our neuroscientific bird’s eye view of category learning will open new angles of inquiry and thereby enable fresh insights into this fascinating field of science.