Encyclopedia of Evolutionary Psychological Science

Living Edition
Editors: Todd K. Shackelford, Viviana A. Weekes-Shackelford

Neurobiology of Language

  • Dieter G. Hillert
Living reference work entry

DOI: https://doi.org/10.1007/978-3-319-16999-6_3334-2



Research on language as a biological and neurological object


The research area neurobiology of language aims to understand the genetic and neurological capacity that enables modern humans to acquire and use language. This language-related neurobiological capacity includes internalized syntactic, phonological, morphological, and semantic structures. Linguistic structures that are externalized as a result of linguistic-cultural change are therefore not considered. Relevant approaches include the study of genetic foundations, the examination of neural regions and circuits, and comparative studies conducted mainly with songbirds, cetaceans, and nonhuman primates. What language is and how it might have emerged in the hominid lineage remain, however, matters of controversy.

Models and Paradigms

Charles Darwin’s (1809–1882) work on natural selection and the evolution of organisms represents the foundation for understanding cognitive capacities such as language in an evolutionary context. In his remarkable volume Descent of Man and Selection in Relation to Sex, Darwin (1871) already states explicitly that the human language system is an “instinctive tendency to acquire an art.” He regards language as a property of the human brain and considers birdsong the nearest analogy to human language. He moreover believes that our ancestors may have used a musical protolanguage to express courtship and basic emotions. Darwin’s view on evolution and language was far ahead of his time. He was, however, also attacked by linguists such as Müller (1866), who stated that “language is the Rubicon, which divides man from beast, and no animal will ever cross it …” In the same year, the Linguistic Society of Paris even banned any debates on the origin of language. In the nineteenth century, Paul Broca (1824–1880) and Carl Wernicke (1848–1905) pioneered the field of the “neuropsychology of language” by discovering and modeling different types of aphasic disorders. Historical studies of language and related disorders were mainly concerned with describing and modeling these clinical disorders in relation to cortical lesions. Biological aspects of language, such as the origin or evolution of the language capacity, were mostly not considered.

In modern times, Eric Lenneberg (1921–1975) took up the question of the human-specific biological capacity for language. Lenneberg (1967) argues that language is a species-specific trait, that this trait cannot be learned, and that language is not an artifact of culture. He is particularly known for the hypothesis of a critical period for language acquisition. In this vein, Chomsky (1965) introduced, in the context of a nativist theory of language, the concepts of the language acquisition device and of universal grammar, an innate human capacity to acquire language. He argues that toddlers acquire language in a reflex-like manner, typically without explicit instruction and despite the poverty of the input and the lack of negative evidence.

Today, most researchers agree that human language rests on an innate capacity. It is controversial, however, whether this language capacity evolved (e.g., Pinker and Bloom 1990) or is the result of a sudden mutation (e.g., Berwick and Chomsky 2016). This dispute mirrors to some extent the historical debate between Darwin and the scientific community, which defended a saltationist view. Noam Chomsky argues that the human language faculty emerged by means of a sudden mutation in the last 50–100 ky. In contrast, others argue that language is the result of a gradual evolutionary process that has taken place since the hominid lineage split from genus Pan 4–6 mya. Some argue that our closest biological ancestors employed a protolanguage (Bickerton 1981) and that our sister species, Neanderthals or Denisovans, already used language much as modern humans do (Dediu and Levinson 2013; Hillert 2015). Fossil findings and limited genetic data provide indirect evidence about the evolutionary stages of language and the human mind. Since empirical data are mostly incomplete or missing, the debate remains open and undecided.

More recent approaches try to understand the human language capacity in the context of other cognitive capacities of the human mind. Numerous studies report comparable computations in the linguistic domain and in other cognitive domains such as music or vision (e.g., Fadiga et al. 2009). The open question is thus whether the acquisition of language relies on a biological capacity that is specific to language or on one that belongs to the human mind in general. An article by Hauser et al. (2002) received the lion’s share of attention. The authors consider the “narrow language faculty” to be unique to modern humans, in particular with respect to its property of linguistic recursion. Having originally evolved as an all-purpose computation, recursion refers to a computation that calls itself, as is the case when one sentence is embedded in another sentence. In contrast, the “broad language faculty” would share its properties with nonhuman animals. More recently, the (binary) operation Merge, which takes two syntactic objects A and B, forms a new object from them, and also has the property of recursion, has been considered the fundamental characteristic of language. The linguistic recursion hypothesis, which is part of Chomsky’s minimalist program, has been criticized for various reasons: nonhuman animals are able to discriminate recursive strings, phonological (non-syntactic) recursive structures can be found, some languages do not show evidence of recursive structures, and Merge and recursion can also be found in nonlinguistic domains such as music (e.g., Corballis 2011).
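The notions of binary Merge and recursion can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, that a syntactic object is either a lexical item (a string) or an unordered set of two syntactic objects; the names merge and depth and the example words are hypothetical and not part of any cited model.

```python
from typing import FrozenSet, Union

# Assumption for illustration: a syntactic object is either a lexical item
# (a string) or the unordered set produced by an earlier application of merge.
SyntacticObject = Union[str, FrozenSet["SyntacticObject"]]


def merge(a: SyntacticObject, b: SyntacticObject) -> SyntacticObject:
    """Binary Merge: form the new object {a, b} from two syntactic objects."""
    return frozenset({a, b})


def depth(obj: SyntacticObject) -> int:
    """Count nested applications of merge; a bare lexical item has depth 0."""
    if isinstance(obj, str):
        return 0
    # depth calls itself on each part, mirroring the recursive structure.
    return 1 + max(depth(part) for part in obj)


# A simple clause: "Mary left".
clause = merge("Mary", "left")

# Embedding: the output of merge is again a possible input to merge,
# as in "John said [Mary left]".
embedded = merge("John", merge("said", clause))
```

Because the output of merge is itself a valid input, one clause can be embedded in another without any additional machinery; here depth(clause) is 1 and depth(embedded) is 3.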

Other approaches might also provide some clues as to how the human language capacity evolved. For different reasons, particular populations do not use a full grammar but seem to rely on a conceptual strategy to express their thoughts (Jackendoff 2002). Pidgin languages, for example, lack subordinate clauses, grammatical words, and inflections. When a pidgin develops into a creole in subsequent generations, it becomes grammatically fully fledged (Bickerton 1981). Another well-known case is the “Idioma de Señas de Nicaragua.” Without the help of teachers or other types of instruction, deaf children developed their own sign language. First, they started to use pidgin-like home signs. Later, younger children modified the pidgin-like signs used by older children and added grammatical features, much as when a creole emerges. Another example is agrammatism in patients with Broca’s aphasia, who suffer from lesions in Broca’s area (pars opercularis and pars triangularis) and surrounding areas in the left inferior frontal gyrus (IFG). In production and comprehension, these patients rely on a basic subject-verb-object (SVO) strategy, avoid non-canonical sentence structures (e.g., wh-questions or sentences with relative clauses), and tend to drop grammatical words and inflections. It is an open question whether these and similar examples mirror an innate and hardwired language capacity of our biological ancestors.

Our understanding of the neurobiology of language thus depends on the definition of language. The minimalist program tries to specify the core property of language, that is, the ability to generate hierarchical internalized structures. It is claimed that this property is unique to humans and that language therefore did not evolve but emerged when H. sapiens dispersed out of Africa about 75,000 ya. Others emphasize that the human language system is the product of a gradual evolution of different cognitive components and consider brain growth in the hominid lineage an indicator of reorganization and new behavior. The gradual approach usually favors the assumption that the human language capacity had a precursor in our biological ancestors. Since evidence about the possible evolution of the human language capacity is incomplete or missing, an inductive or empirical approach has its limitations at present.

Molecular Genetics

From a molecular viewpoint, a new life begins as a biochemical package that includes information about the development, regulation, and maintenance of cellular structures and that ultimately also determines the neural circuits scaffolding language processing. Our closest living relatives, the chimpanzees, share 96 % of their base pairs with our DNA sequence, and Neanderthals share 99.7 %. We might thus assume that the biological capacities for cognition of Neanderthals (or Denisovans) were similar to those of anatomically modern humans. Indirect evidence such as perforated and pigmented shells supports this hypothesis, but evidence of linguistic behavior is missing. DNA analysis shows that Neanderthals and H. sapiens diverged from the common ancestor H. heidelbergensis or H. erectus between 350 and 400 kya. Our estimates of the age of the language capacity are based simply on (a) the age of the Omo remains, the first discovered fossils of anatomically modern humans, and (b) the period between 50 and 100 kya when modern humans dispersed out of Africa.

No genetic data can at present be linked to the language capacity. Evidence has been found, however, that the forkhead box protein P2 (FOXP2) is expressed in different brain areas, including the basal ganglia and the IFG. Mutations of FOXP2 in humans cause severe speech disorders and have been associated with verbal dyspraxia (e.g., Fisher et al. 1998). Across three generations, members of the KE family have suffered from a rare speech disorder, while language comprehension seems to be preserved. Genetic analysis reveals that the mutation results from an arginine-to-histidine substitution at position 553 (R553H). R553H lies in the FOXP2 transcription factor, represents a loss-of-function mutation, and is shared by the affected members of the KE family. Magnetic resonance imaging (MRI) shows that members of the KE family have a bilateral volume reduction of 25 % in the caudate nucleus and basal ganglia. These subcortical structures are closely associated with motor functions. In addition, a reduction of gray matter was found in Broca’s area, which might be related to the impaired cortico-basal ganglia circuits regulating the control of articulation.

The human version of this gene differs from that of chimpanzees at two amino acid coding positions, which encode protein sequences. In this respect, modern humans and Neanderthals do not differ. We do not know, however, whether the speech-related mutations took place in these two protein-coding regions. The noncoding regions of FOXP2 seem to differ between Neanderthals and humans. Since the protein differences between humans and chimpanzees are minimal, efforts focus on finding regulatory changes along with non-protein-coding differences. So far, establishing a link between these findings and linguistic computations (or human cognition in general) appears to be science fiction.

Numerous genes seem to have contributed to the expansion of the neocortex in the hominid lineage. SRGAP2 is of particular interest: the occurrence of particular hominids correlates not only with copies of SRGAP2, which trigger brain growth and probably cortical reorganization, but also with the appearance of different artifacts (cf. Hillert 2014, 2015). Again, no conclusions can be drawn regarding the genetic foundation of language. A different approach to determining the role of genetics in brain development is to compare the cortical structures of monozygotic and dizygotic twin pairs. One study found that the similarity of brain structures increases with increasing genetic affinity (Thompson et al. 2001). Genetic factors showed a significant impact on Broca’s and Wernicke’s areas as well as on frontal regions. Future studies may thus provide specific information about language-related genetic factors that trigger the formation of particular neural structures, which in turn support particular linguistic structures such as the computation of inflectional morphology.

The brain of modern humans is highly efficient in acquiring and computing language. Natural selection seems to have preserved language as an adaptive phenotype, and the brain has the plasticity to keep track of fast-changing linguistic information. A continuously evolving process, including genetic adaptation and the Baldwin effect, might have led to hardwired linguistic computations.

Linguistic Microcircuits

An epigenetic model of language acquisition assumes that the genetic expression of neural growth interacts with organism-specific experiences. In the case of humans, it has been argued that neural growth in the perinatal period involves the transition from gene expression to an experience-expectant phase, in which ubiquitous stimuli common to all members of the species are required to modify the neural circuits. Excess neurons are pruned, and dendritic attrition shapes the neural circuits. These circuits are thought to be relatively permanent but can be used for subsequent neural sculpting. Parameters are set during this critical or sensitive period. In contrast, during the experience-dependent phase, idiosyncratic information that is unique to the individual is stored through the active formation of new synaptic connections in response to the input. The two mechanisms apparently work against each other: stable and optimized neural circuits are the result of neural pruning, while at the same time new neural circuits can be developed to assimilate new information. The formation of particular neural circuits for the internalization of linguistic information such as phonology and syntax seems to follow a hardwired program, while other circuits are shaped by neural plasticity to cope, for example, with new lexical-semantic information. For instance, the acquisition of canonical SVO (subject-verb-object) sentence structures can be directly mapped onto our sensory-motor experience. These structures might be acquired by experience-expectant processes, while more complex sentence structures may primarily involve experience-dependent processes. It remains to be seen whether this difference can be verified by the formation of different fiber tract systems in young children.

Figure 1 compares language-relevant fiber streams in newborns and adults. The data are based on MR diffusion tensor imaging (DTI). Newborns and adults have one dorsal and one ventral stream in common.
Fig. 1

Fiber tracking in newborns (above) and adults (below) by using MR-DTI (Adapted and modified, Perani et al. 2011; © Proceedings National Academy of Sciences)

The ventral stream (green) connects the temporal lobe (TL) with the inferior frontal gyrus (IFG) via the extreme capsule (EC) and the dorsal stream (yellow) connects the TL with the premotor cortex via the arcuate fasciculus (AF) and the superior longitudinal fasciculus (SLF). In addition, children develop a second dorsal stream (purple: as shown for adults). This stream connects the TL with the IFG (including Broca’s area) via the AF and the SLF.

At present, the core language model of an adult consists of a dual-stream system: the dorsal route maps auditory input to motor speech representations, and the ventral route maps auditory input to semantic representations. It has been suggested that the dorsal stream be divided further by an additional temporo-parietal-frontal segment. The anterior and posterior language areas are connected by a dorsal stream, comprising the direct AF stream and the indirect MLF and SLF streams, and by a multimodal ventral stream. Resting state functional connectivity (RSFC) in humans and autoradiographic data in monkeys suggest the following language-relevant circuits (see Fig. 2):
Fig. 2

Fiber connections of language circuits based on RSFC in humans and autoradiographic data in monkeys (abbrev. see text; adapted and modified; © García et al. 2014; based on Kelly et al. 2010 © Federation of European Neuroscience Societies and Blackwell Publishing Ltd)

The SLF connects the anterior supramarginal gyrus (aSMG; the homolog in monkeys is PF, part of the inferior parietal area) with the premotor Brodmann area (BA) 6, the posterior SMG (pSMG; homolog is PFG, part of the inferior parietal area) with BA 44, and the angular gyrus (Ang; homolog is PG, middle parietal area) with BA 45B and 45A; the AF connects the superior temporal sulcus (STS) and the superior temporal gyrus (STG) with BA 44 and 45B; the middle longitudinal fasciculus (MLF) connects the STS and STG with the pSMG and Ang; and the ventral stream connects anterior parts of the STG, STS, and middle temporal gyrus (MTG) with Broca’s area (BAs 44 & 45).
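The connections listed for Fig. 2 can be written down as a small lookup table, which makes it easy to ask which fiber systems reach a given area. The following Python sketch is only an illustration. It assumes that every source region named for a tract connects to every target named for that tract, and the labels aSTG, aSTS, and aMTG (for the anterior parts of the temporal regions) as well as the function name tracts_reaching are hypothetical.

```python
# Tract -> list of (source region, target region) pairs, transcribed from the
# description of Fig. 2. Assumption: each listed source connects to each listed
# target of the same tract. "aSTG", "aSTS", "aMTG" are illustrative labels.
CONNECTIONS = {
    "SLF": [("aSMG", "BA 6"), ("pSMG", "BA 44"),
            ("Ang", "BA 45B"), ("Ang", "BA 45A")],
    "AF": [("STS", "BA 44"), ("STS", "BA 45B"),
           ("STG", "BA 44"), ("STG", "BA 45B")],
    "MLF": [("STS", "pSMG"), ("STS", "Ang"),
            ("STG", "pSMG"), ("STG", "Ang")],
    "ventral": [("aSTG", "BA 44"), ("aSTG", "BA 45"),
                ("aSTS", "BA 44"), ("aSTS", "BA 45"),
                ("aMTG", "BA 44"), ("aMTG", "BA 45")],
}


def tracts_reaching(target: str) -> list:
    """Return the sorted names of all tracts with a connection ending in target."""
    return sorted(
        tract
        for tract, pairs in CONNECTIONS.items()
        if any(dst == target for _, dst in pairs)
    )


ba44_inputs = tracts_reaching("BA 44")
ba6_inputs = tracts_reaching("BA 6")
```

In this encoding, BA 44 is reached by the AF, the SLF, and the ventral stream, whereas BA 6 is reached only by the SLF.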

In particular, the dorsal stream seems to have undergone significant changes during human evolution. It has thus been assumed that the development of this auditory-vocal circuit was a key event in enabling modern language. Along with the acquisition of voluntary control over the larynx and the supralaryngeal tract by means of a direct projection to the brainstem, vocal motor neurons and the phonological loop with rehearsal operations may have emerged. In turn, phonological computations may have contributed to the development of more complex syntax.

Functional MRI and diffusion tensor imaging (DTI) studies show that BA 44 is connected via the dorsal stream to the STG/STS (Wernicke’s area). It has also been assumed that the frontal operculum (BA 45 & 47) is connected to the STG via a portion of the EC fiber system and the uncinate fasciculus of the ventral stream. BA 6, which is part of the premotor cortex and adjacent to BA 44, is mainly involved in orofacial muscular control and seems to be more strongly connected to the inferior part of the parietal lobe. In contrast, BA 44 connects to the posterior part of the parietal lobe. Similarly, RSFC analysis indicates greater connectivity of BA 45 than of BA 44 to the Ang and to the superior and middle temporal gyri (STG and MTG). This connectivity difference also has cytoarchitectonic reasons. Layer IV sends efferent fibers to the thalamus and has a reciprocal excitatory and inhibitory connection to the thalamus and adjacent areas. BA 45 is granular, as it has a well-developed layer IV. BA 44 can be considered dysgranular, as its layer IV is only rudimentarily developed. BA 6 is agranular, that is, it lacks layer IV of the cortical structure.

BA 44 seems to be critical for rapidly (within 100–200 ms) generating hierarchical syntactic structures, as indicated by the early left anterior negativity found in event-related potential (ERP) studies. In contrast, BA 45 and 47 seem to be more involved in generating local phrase structures (Friederici 2009). Similarly, it has been proposed that basic syntax (ventral stream) involves phrase structures, lexical retrieval, and interpretation, whereas extended syntax (dorsal stream) involves hierarchical, composed phrase structures, abstract rules, and displacements (movements). Analogous basic (ventral) and extended (dorsal) computations were assumed for the morphological and phonological levels: basic morphology requires the generation of whole words such as irregular, derived, or high-frequency regular forms, and extended morphology involves compositional structures, as in the case of multimorphemic or regular forms. Likewise, basic phonology would consist of the selection of discrete elements and the retrieval of fixed lexical forms, and extended phonology involves the composition of complex syllables and higher prosodic structures (Lely and Pinker 2014). Extended computations across all three linguistic levels would be scaffolded by the AF, which connects the left IFG with the superior posterior temporal lobe regions. It is assumed that this dorsal route evolved more recently in the human lineage. It has also been argued that the extent to which fiber pathways are engaged depends on cognitive demand rather than on the type of linguistic information to be processed.

In addition, right-hemisphere functions such as prosody, intonation, and accentuation also contribute to language processing in typical right-handers. Accordingly, it has been assumed that a right ventral stream integrates information over longer timescales, while shorter timescales seem to be bilaterally represented (see Hickok and Poeppel 2007). The authors assume that during word recognition the left-sided sampling rate ranges between 25 and 50 Hz and the right-sided rate between 4 and 8 Hz. This model is based on the finding that rapid spectral changes such as formant transitions occur in a time window of 20–40 ms, while syllabic and prosodic structures are processed in the range of 100–200 ms (Fig. 3).
Fig. 3

Dual stream model of speech processing (p posterior, a anterior, d dorsal, mp mid-post, ITS inferior temporal sulcus, PM premotor cortex, Spt Sylvian parietal-temporal; other abbreviations used in text; adapted and modified, Hickok and Poeppel 2007; © Nature Publishing Group)
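The relation between the sampling rates and the time windows mentioned for the dual stream model can be made explicit with a short calculation. The Python sketch below assumes, as a simplification that is not taken from the model itself, that one time window corresponds to one cycle of the sampling rate, so that the window length in milliseconds is 1000 divided by the frequency in hertz. The function names period_ms and window_ms are hypothetical.

```python
from typing import Tuple


def period_ms(frequency_hz: float) -> float:
    """Duration of one cycle in milliseconds for a frequency given in hertz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_hz


def window_ms(low_hz: float, high_hz: float) -> Tuple[float, float]:
    """Shortest and longest cycle duration (ms) within a frequency band."""
    # The highest frequency yields the shortest window and vice versa.
    return period_ms(high_hz), period_ms(low_hz)


# Assumption: one window equals one cycle of the sampling rate.
left_window = window_ms(25, 50)   # (20.0, 40.0)
right_window = window_ms(4, 8)    # (125.0, 250.0)
```

Under this assumption, the left-sided rate of 25 to 50 Hz corresponds exactly to the window of 20–40 ms, while the right-sided rate of 4 to 8 Hz corresponds to 125–250 ms, which overlaps only in part with the range of 100–200 ms given above.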

The finding that severe speech perception disorders occur only if the patient suffers from bilateral (but not unilateral) lesions supports the idea of a bilateral ventral stream for speech. As already pointed out, the dorsal stream involves the posterior planum temporale (PT), the premotor cortex, and the IFG. The PT, which is typically larger in the left hemisphere as compared to Broca’s area, is located posterior to the auditory cortex (Heschl’s gyrus). It is the core of Wernicke’s area as part of the STG and the parietal lobe. The PT is apparently involved in early auditory processing, in absolute pitch recognition (as found in music), and also in the analysis of many different types of sounds. Phonological information is mainly computed in the STS, whereas the interface between phonological and conceptual information mainly recruits the MTG and STG. The anterior temporal lobe area, in turn, computes semantic-syntactic information, which is basic according to the models mentioned above. The dorsal stream is responsible for auditory-motor integration. In addition to motor areas, the STS and the Sylvian parietal-temporal region are involved in sensory-motor integration. Although top-down processing is not crucial for speech perception per se, it can facilitate speech recognition. The model also includes forward predictions: in the case of the ventral stream they are mediated by priming and context, and in the case of the dorsal stream by motor functions.

Relating neuroanatomical structures to particular language functions is an extremely challenging research task. Future progress will strongly depend on the development of new techniques for noninvasively examining the development and function of the cortical structures that enable us to acquire and generate sentences in a reflex-like manner. It is moreover extremely important to develop appropriate theoretical frameworks to model and predict hardwired and internalized linguistic computations. Another important approach to be mentioned in this brief review involves comparative studies.

Comparative Models

The study of cortical systems along with cognitive-behavioral patterns in nonhumans can be relevant for understanding the human language system. This approach can provide valuable analogies for how neurobiological circuits evolve and enable cognitive computations. Comparative studies focus in particular on our closest living relatives (great apes and monkeys) or on nonhuman vocal learners (e.g., songbirds and cetaceans).

The leftward asymmetry of Broca’s area, in particular of BA 44, is not human-specific, as this asymmetry can also be found in genus Pan (chimpanzees and bonobos) and in gorillas. BA 44 seems to serve the coordination of gestures and vocalizations in great apes. It is possible that BA 45 expanded on the basis of BA 44 and increased cortical folding in the left IFG. The homolog of Wernicke’s area in genus Pan is the PT, which, much as in humans, is larger in the left hemisphere. Wernicke’s area has been considered an older cortical structure than Broca’s area. The temporal-parietal region (Tpt) can also be found in some prosimians (e.g., lemurs and lorises). Macaque monkeys control their facial musculature by means of BA 44 and process meaningful vocalizations in the left hemisphere. The left superior temporal gyrus (STG) and the left inferior parietal lobe are homologs of Wernicke’s area.

Based on findings in nonhuman primates, four different segments of the AF-SLF stream have been proposed: three SLF streams and the AF (Petrides and Pandya 2009). The SLF III stream in particular has been considered a possible language stream, as it connects the inferior parietal lobe with the homologs of BAs 44 and 45 (Broca’s area). Several studies indicate however that in macaque monkeys the area Tpt (homolog to Wernicke’s area) projects to the dorsal and lateral premotor area rather than to the language homologs 9/46d, 6d, and 8Ad. Moreover, a comparative DTI study revealed significant differences in the neural pathways of humans, chimpanzees, and macaque monkeys (see Fig. 4). In humans, the AF strongly connects the frontal lobe in the left hemisphere with the middle temporal gyrus (MTG) and the inferior temporal gyrus (ITG), including Wernicke’s area. This region corresponds in macaques to the extrastriate visual cortex, adjacent to the primary visual regions. The MTG and ITG enlarged disproportionately in the hominid lineage following the split from genus Pan. The pathways in chimpanzees are slightly more developed than those in macaques and may reflect some prior conditions that enable connections between meanings and motor sequences. However, it is apparent that, along with the increase in human brain size, white-matter volume increased in frontal and temporal areas, in particular in the dorsal stream.
Fig. 4

Schematic average tractographic results found for macaque monkeys, chimpanzees, and modern humans (IFS inferior frontal sulcus, IPS intraparietal sulcus, PrCS precentral sulcus, CS central sulcus, STS superior temporal sulcus, PS principal sulcus, AS arcuate sulcus; BAs in red/orange: Broca’s area BAs 44 and 45 with extension BA 47; BAs in blue/turquoise: Wernicke’s area BAs 22 and 40 with extension BA 37; adapted and modified, Rilling et al. 2008; © Nature Publishing Group)

Moreover, a unique set of neurons was first discovered in macaque monkeys (Rizzolatti et al. 1996). The authors recorded single visuomotor neurons in the premotor cortex (area F5) of the macaque. The homolog of the monkey’s area F5 is ventral BA 6, which extends to BA 44 and the inferior parietal area in humans. These premotor neurons fired when the macaque performed an action (e.g., reaching for food). Some of the neurons also fired when the macaque observed a similar action by another monkey or by a human (e.g., picking up food). These neurons are called mirror neurons, as they discharge only when an action is performed or observed but not when an object is presented without an action. One-third of the F5 mirror neurons are classified as strictly congruent, that is, they discharge when observing and performing an action (motor encoding). Two-thirds of the F5 mirror neurons are defined as broadly congruent, as they discharge when observing an action without motor encoding. Moreover, when a macaque hears the sound of an action typically associated with an object, audiovisual mirror neurons fire in F5, but for most neurons the discharge is smaller for sound alone than for sound and vision. Also, F5 seems not to be exclusively related to manual tasks. For instance, in rhesus monkeys the cortical larynx representations overlap with F5, and BAs 12 & 45 are involved in vocalization. In sum, these single-cell recordings in primates revealed that neurons with some mirror properties are located in the premotor and parietal cortex.

More recently, the role of mirror neurons in the evolution of language has been discussed more broadly, as action mirror neurons might have contributed to the evolution of spoken language (Arbib 2011). Arbib distinguishes seven evolutionary stages that build on each other. In particular, the evolution of complex vocalization does not need to depend strictly on gesture. It is also possible that the vocalization system coevolved with the gestural system. There is no specific reason to assume that structures developed by the gestural system have been transferred step by step to the vocal system. Most of the communication in monkeys and apes is vocal (besides body language), and it is possible that auditory mirror neurons play a role in vocal imitation. Similar to a motor model for speech, vocal mirror neurons simulate the auditory input and can therefore be used to compare auditory input with one’s own articulation. The basic vocal mirror neuron system may have become more complex through an increase in voluntary control of the vocalization mechanism. This was probably accomplished by an increase in auditory working memory in the prefrontal areas.

Research on the evolution of the cortical circuits of vocal learning has also been carried out with respect to speech and language in humans (Petkov and Jarvis 2012). The auditory pathway of vocal learners such as songbirds and humans was inherited from amniotes, which evolved during the Carboniferous period ca. 320 mya from amphibian reptiliomorphs and today include all reptiles, birds, and mammals. Vocal learning refers to the ability to acquire vocalizations by imitation rather than by “instinct.” Thus, a species can modify its vocalization patterns as a result of experience. It is apparent that vocal learning depends on auditory learning, but auditory learning does not depend on vocal learning. Dogs, for example, can recognize spoken words but cannot vocalize them. In addition, vocal learners must be able to use auditory percepts to correct their vocalizations. Both the mammalian and the avian groups have members that can be considered vocal nonlearners. More recent studies provide evidence that chimpanzees and other primates are also vocal learners, but to a significantly lesser degree than, for example, songbirds. Thus, the capacity for vocal learning did not evolve from a common ancestor of both groups. Most interestingly, the acquisition of human speech and of vocalization in songbirds appears to be comparable, as both have sensitive periods before adulthood in which to acquire and maintain phonological or vocal sequences. Humans are highly prolific vocalizers, acquiring a virtually unlimited number of new phonological sequences throughout their life span. These sequences typically correspond to particular meanings. In contrast, vocal learning in birds such as the male zebra finch is hardly possible after puberty. Their songs are associated with emotionally rewarding experiences such as mating or defending a territory.

In humans, vocal learning relies on a pathway within the forebrain (prosencephalon). The forebrain consists of thalamic structures and the cerebrum (telencephalon), which includes the cerebral cortex and the basal ganglia. In contrast to vocal nonlearning birds, complex vocal learning birds such as songbirds, hummingbirds, and parrots have particular forebrain neuron clusters (nuclei).

It is said that primates have an innate, involuntary vocalization system, which consists of connections from the amygdala, the orbitofrontal cortex (OFC), and the anterior cingulate cortex (ACC) to the periaqueductal grey (PAG) in the midbrain. PAG neurons in turn synapse onto neurons of the reticular formation (RF), which synapse onto the α-motoneurons in the nucleus ambiguus (Amb), which control the muscles of the larynx for vocal production. In nonhuman primates, BA 6 (area 6vr) additionally projects to the RF and from there to the Amb. Area 6vr and the ACC are also connected with the primary motor cortex, amygdala, and thalamic structures (not shown).

Humans rely more on the learned vocal pathway during speech. The learned vocal pathway consists of a projection from the face area of the primary motor cortex in BA 4 (laryngeal motor cortex, LMC) to the Amb (Fig. 5: red arrow) and a cortico-striatal-thalamic loop for learning vocalizations (white arrows). In songbirds, this corresponds to the direct forebrain projection to vocal motor neurons in the brainstem, that is, from the robust nucleus of the arcopallium (RA) to the twelfth nerve nucleus (nXIIts). It has been stated that there is no direct cortico-bulbar projection in vocal nonlearners such as chickens and monkeys. In humans, a lesion in the LMC impairs learned vocalization. Thus, it has been assumed that the human capacity for speech and spoken language is related to the formation of a direct pathway from the LMC to the Amb. In general, studies with nonhuman primates have not found evidence for a direct pathway between areas analogous to the LMC (e.g., 6vr) and the Amb. A recent study with lab mice, however, may have revealed a region that seems to be homologous: it is active during vocalization and makes a direct but sparse projection to the Amb (Arriaga et al. 2012). Connections between the anterior forebrain and the posterior vocal motor circuits are marked in Fig. 5 with dashed lines. Similar to humans, in mice two pathways converge on the Amb, one from the PAG and one from the primary motor cortex (M1).
Fig. 5

Schematic illustration of the vocal subsystems in (a) humans, (b) macaques, (c) mice, (d) songbirds, and (e) chickens; ADSt anterior dorsal striatum, Amb nucleus ambiguus, Area 6V ventral part of Area 6 premotor cortex, Area X a song nucleus of the striatum, ASt anterior striatum, AT anterior thalamus, DLM dorsolateral nucleus of the mesencephalon, DM dorsal medial nucleus of the midbrain, H hindbrain, Hp hippocampus, HVC (a letter-based name), LMAN lateral magnocellular nucleus of the anterior nidopallium, LMC (BA 4) laryngeal motor cortex, M midbrain, M1 primary motor cortex, M2 secondary motor cortex, nXIIts 12th tracheosyringeal motor neurons, PAG periaqueductal grey, RA robust nucleus of the arcopallium, RF reticular formation, T thalamus, VL ventral lateral nucleus of the thalamus (Adapted and modified, © Arriaga et al. 2012; see Petkov and Jarvis 2012)

In light of new evidence that the vocalization system is more flexible in modulating or mimicking sound patterns in species that were assumed to be vocal nonlearners, such as mice and monkeys, a continuum hypothesis of vocal learning has been proposed. In general, auditory processing can be modeled more easily in nonhuman animals than learned vocal communication. However, not all patterns of human speech can be modeled in animals, and not all vocalization patterns in animals can be modeled in humans. Since a certain group of songbirds and humans are excellent vocal learners, the question arises to what extent human sentences and birdsongs can be compared at the syntactic level.

First, human language consists of a lexicon and syntactic rules. This lexical-syntactic system interfaces externally with phonological and phonetic information in perception and production, and internally with semantics, including concepts, intentions, and emotions. Typically, birdsongs do not express abstract meanings but, if anything, some intentional or emotive states such as the attempt to attract mates or to defend territories. Thus, we may say that birds (and typically animals in general) use pragmatic meanings rather than semantic meanings. Likewise, birds make use of a phonetic syntax rather than a phonological syntax, as the term phonology refers to the mental representation of an abstract sound system used to convey meanings. In contrast to birdsongs and to animal communication systems in general, human language uses discrete semantic units and morphosyntactic rules to combine and create open-ended variations of new meanings.

Birdsongs use variations at the phonetic level but apparently not at the semantic level. For instance, a nightingale can rearrange note clusters to generate hundreds of different songs. These songs may be used to identify individual birds and to express the level of sexual arousal. Canaries use so-called sexy syllables to increase mate attraction, but this does not change the meaning of the song and indicates only the motivation level. Humans, moreover, are able to synchronize the speech-meaning dimension with physical movements (dancing and acting) and/or complex music patterns (beat and harmonics) for a highly coordinated, meaningful, and/or pleasurable performance. Since a behavioral comparison is not possible at the semantic level, to what extent is the song syntax of songbirds comparable with human sentence syntax?

Human language consists of nonlinear relationships between words. For instance, in the sentence “The guitarist, who plays Bossa Nova, went on stage,” the noun phrase (NP) “the guitarist” has to be linked to the verb phrase (VP) “went on stage” to understand the sentence meaning, that is, it is the guitarist who went on stage. These kinds of nonlinear links are called nonadjacent dependencies, which are an inherent part of the hierarchically organized structure of syntax in human languages. Moreover, nonadjacent dependencies are systematically ordered and require simultaneous processing of different structures to link the codependent elements. The example above involves a nested, center-embedded dependency; such dependencies are generally described as recursive. The dependencies are embedded within one another as in the structure a₁a₂a₃b₃b₂b₁ … aₙ₋₁bₙ₋₁, where aₙ₋₁ is the unit to be linked to bₙ₋₁. In contrast to this context-free grammar version, a finite-state form has the structure a₁b₁a₂b₂a₃b₃ … (ab)ⁿ⁻¹, in which iterations are appended to the end of a pair (Berwick et al. 2011).
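The contrast between the two pattern types can be made concrete with a small sketch. The Python code below is an illustration only and is not taken from Berwick et al. (2011): the token representation (a class letter plus an index) and all function names are assumptions introduced here. It generates an iterated sequence (a₁b₁a₂b₂ …) and a center-embedded sequence (a₁a₂ … b₂b₁), and shows that the iterated form can be checked while remembering only the single pending a, whereas the nested form requires a stack that keeps every open a until its matching b arrives.

```python
from typing import List, Optional, Tuple

# Hypothetical token format: ("a", 1) stands for a1, ("b", 1) for b1.
Token = Tuple[str, int]


def iterated(n: int) -> List[Token]:
    """Iterated pattern a1 b1 a2 b2 ... an bn."""
    out: List[Token] = []
    for i in range(1, n + 1):
        out += [("a", i), ("b", i)]
    return out


def nested(n: int) -> List[Token]:
    """Center-embedded pattern a1 a2 ... an bn ... b2 b1."""
    opens = [("a", i) for i in range(1, n + 1)]
    closes = [("b", i) for i in range(n, 0, -1)]
    return opens + closes


def accepts_iterated(tokens: List[Token]) -> bool:
    """Each a must be followed at once by its b; only one pending a is remembered."""
    pending: Optional[int] = None
    for cls, idx in tokens:
        if pending is None:
            if cls != "a":
                return False
            pending = idx
        else:
            if cls != "b" or idx != pending:
                return False
            pending = None
    return pending is None


def accepts_nested(tokens: List[Token]) -> bool:
    """Every b must close the most recently opened a, which requires a stack."""
    stack: List[int] = []
    seen_b = False
    for cls, idx in tokens:
        if cls == "a":
            if seen_b:  # no new a may open once the b's have started
                return False
            stack.append(idx)
        elif cls == "b":
            seen_b = True
            if not stack or stack.pop() != idx:
                return False
        else:
            return False
    return not stack
```

In this sketch, `accepts_nested(nested(3))` succeeds while `accepts_iterated(nested(3))` fails, because the second a arrives before the first a has been closed. The memory requirement, one pending item versus an unbounded stack, is the point of the comparison.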

In examining acoustic pattern recognition in European starlings, Gentner et al. (2006) evaluated the syntactic recursion hypothesis, which holds that the capacity for self-embedding is unique to the human language faculty (see Fig. 6). They conclude that the use of syntactic recursion is not unique to humans, as their trained starlings showed the ability to recognize these recursive patterns.
Fig. 6

Conditioned European starlings’ sonograms of (a) rattle-warble iterations and (b) rattle-warble nesting (Adapted and modified, Gentner et al. 2006 © Nature Publishing Group)

Berwick and colleagues (2011) emphasize, however, that the patterns tested do not match the sentence structures used by humans. While European starlings seemed to be able to recognize nesting after training, which needs to be differentiated from natural settings, the study did not provide evidence of the birds’ ability to recognize dependencies of the kind that pair specific a’s with specific b’s. The starlings recognized an equal number of a’s and b’s in terms of rattle and warble sound classes (rattle³warble³ patterns), but it has not been examined whether the individual rattles and warbles were properly paired off. Thus, at this point there is a lack of evidence that songbirds or any other nonhuman species are able to use a strict context-free grammar. Birdsong does not seem to generate hierarchical structures, which are one basic principle of human syntax.
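The difference between matching counts and matching pairs can be sketched as follows. This Python example is hypothetical and does not reproduce the stimuli or analysis of Gentner et al. (2006): the integer labels attached to each sound are an assumption made here to stand in for “specific” rattles and warbles, and the function names are invented for illustration. The first check only verifies that a run of rattles is followed by an equally long run of warbles; the second additionally verifies that each warble closes the matching rattle in nested order.

```python
from typing import List, Tuple

# Hypothetical format: (sound class, identity label), e.g. ("rattle", 1).
Sound = Tuple[str, int]


def counts_match(seq: List[Sound]) -> bool:
    """Class-level check: n rattles followed by exactly n warbles (n > 0)."""
    classes = [cls for cls, _ in seq]
    n = classes.count("rattle")
    return n > 0 and classes == ["rattle"] * n + ["warble"] * n


def pairs_match(seq: List[Sound]) -> bool:
    """Dependency-level check: each warble closes the latest open rattle with the same label."""
    if not counts_match(seq):
        return False
    open_rattles: List[int] = []
    for cls, label in seq:
        if cls == "rattle":
            open_rattles.append(label)
        elif open_rattles.pop() != label:
            return False
    return not open_rattles


# Both sequences have the rattle-rattle-rattle-warble-warble-warble shape.
well_paired: List[Sound] = [
    ("rattle", 1), ("rattle", 2), ("rattle", 3),
    ("warble", 3), ("warble", 2), ("warble", 1),
]
mispaired: List[Sound] = [
    ("rattle", 1), ("rattle", 2), ("rattle", 3),
    ("warble", 1), ("warble", 2), ("warble", 3),
]
```

Both example sequences pass `counts_match`, but only `well_paired` passes `pairs_match`. A learner that relies on the class-level check alone would therefore be indistinguishable, on these stimuli, from one that tracks nested dependencies.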

For example, acoustic features of rattle or warble are not used to classify the rattle-warble sequence as a new unit, which in turn could then be used in more complex structures. Some songbirds, however, such as Bengalese finches, seem to be able to perceive 3–4 notes as a single unit. One study applied the method of the classical psycholinguistic click experiment, in which human subjects are presented with a click in the middle of a phrase (e.g., ate the apple). Participants tend to report that the click occurred at the beginning or end of the phrase. Using this click protocol, the authors reported that the Bengalese finches responded in a similar fashion: They perceived the click at the c or e of a cde unit. Bengalese finches not only perceive such units but can also produce them. Thus, it appears that they are able to combine notes into units, much as humans combine words into noun or verb phrases and reuse these units elsewhere. However, in contrast to human syntax, Bengalese finches seem to lack the ability of dependent nesting to manipulate these combined notes. In this vein, it has been demonstrated that cotton-top tamarins (Saguinus oedipus) are able to parse synthetic stimulus sequences generated by finite-state grammars, but not those generated by a phrase structure grammar that implies a simple recursive, hierarchical structure. It is characteristic of human syntax that what is acquired is not a specific order of individual words but a specific order of word classes, also called phrase structures. Each single word can be inserted into a syntactic structure that provides a placeholder for its particular word class.

Also, new words can be created or different noncompositional strategies can be used to express figurative meanings. In the case of idioms, for instance, a sequence of different word classes can be treated as a unit whose syntactic structure does not map onto the compositional semantic structure. To understand the meaning of “It rains cats and dogs,” the phrase needs to be processed as a single unit, as a string of words, without combinatory computations between the individual words. This kind of linguistic parsing flexibility, which allows nuances of different meanings to be expressed, seems to be unique to humans.


In the context of this brief overview of the field of neurobiology of language, we were constrained to focus on the mainstream issues currently discussed. Thus, many important aspects and contributions to the field could not be considered here. In principle, all scientific approaches may provide useful insights into the fundamentals of mind and language. In this vein, the humanities, social sciences, and life sciences seem to be converging in the attempt to map linguistic models to neurobiological capacities (and vice versa). This research agenda requires further specialization to accommodate cross-fertilization between established scientific programs. Here, not only are the development and use of new technologies of great significance, but theorizing is also important in terms of methodology and predictions. Advancing the research at both ends is essential for improving our understanding of language as a biological system.



  1. Arbib, M. A. (2011). From mirror neurons to complex imitation in the evolution of language and tool use. Annual Review of Anthropology, 40, 257–273.
  2. Arriaga, G., Zhou, E. P., & Jarvis, E. D. (2012). Of mice, birds, and men: The mouse ultrasonic song system has some features similar to humans and song-learning birds. PLoS One, 7(10), e46610.
  3. Berwick, R. C., & Chomsky, N. (2016). Why only us? Language and evolution. Cambridge, MA: MIT Press.
  4. Berwick, R. C., Okanoya, K., Beckers, G. J. L., & Bolhuis, J. J. (2011). Songs to syntax: The linguistics of birdsong. Trends in Cognitive Sciences, 15(3), 113–121. https://doi.org/10.1016/j.tics.2011.01.002.
  5. Bickerton, D. (1981). Roots of language. Ann Arbor: Karoma Press.
  6. Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
  7. Corballis, M. C. (2011). The recursive mind: The origins of human language, thought, and civilization. Princeton: Princeton University Press.
  8. Darwin, C. (1871). The descent of man, and selection in relation to sex. London: John Murray.
  9. Dediu, D., & Levinson, S. C. (2013). On the antiquity of language: The reinterpretation of Neandertal linguistic capacities and its consequences. Frontiers in Psychology, 4, 397. https://doi.org/10.3389/fpsyg.2013.00397.
  10. Fadiga, L., Craighero, L., & D’Ausilio, A. (2009). Broca’s area in language, action, and music. Annals of the New York Academy of Sciences, 1169, 448–458. https://doi.org/10.1111/j.1749-6632.2009.04582.x.
  11. Fisher, S. E., Vargha-Khadem, F., Watkins, K. E., Monaco, A. P., & Pembrey, M. E. (1998). Localisation of a gene implicated in a severe speech and language disorder. Nature Genetics, 18(2), 168–170. https://doi.org/10.1038/ng0298-168.
  12. Friederici, A. D. (2009). Pathways to language: Fiber tracts in the human brain. Trends in Cognitive Sciences, 13(4), 175–181.
  13. García, R. R., Zamorano, F., & Aboitiz, F. (2014). From imitation to meaning: Circuit plasticity and the acquisition of a conventionalized semantics. Frontiers in Human Neuroscience, 8, 605. https://doi.org/10.3389/fnhum.2014.00605.
  14. Gentner, T. Q., Fenn, K. M., Margoliash, D., & Nusbaum, H. C. (2006). Recursive syntactic pattern learning by songbirds. Nature, 440(7088), 1204–1207.
  15. Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it and how did it evolve? Science, 298, 1569–1579.
  16. Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393–402. https://doi.org/10.1038/nrn2113.
  17. Hillert, D. (2014). The nature of language. Evolution, paradigms and circuits. New York: Springer.
  18. Hillert, D. (2015). On the evolving biology of language. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01796.
  19. Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.
  20. Kelly, C., Uddin, L. Q., Shehzad, Z., Margulies, D. S., Xavier Castellanos, F., Milham, M. P., & Petrides, M. (2010). Broca’s region: Linking human brain functional connectivity data and non-human primate tracing anatomy studies. European Journal of Neuroscience, 32, 383–398.
  21. Lenneberg, E. H. (1967). Biological foundations of language. New York: Wiley.
  22. Müller, F. M. (1866). Lectures on the science of language: Delivered at the Royal Institution of Great Britain in April, May, & June 1861. London: Longmans, Green.
  23. Perani, D., Saccuman, M. C., Scifo, P., Anwander, A., Spada, D., Baldoli, C., Poloniato, A., Lohmann, G., & Friederici, A. D. (2011). Neural language networks at birth. Proceedings of the National Academy of Sciences, 108(45), 18566.
  24. Petkov, C. I., & Jarvis, E. D. (2012). Birds, primates, and spoken language origins: Behavioral phenotypes and neurobiological substrates. Frontiers in Evolutionary Neuroscience, 4, 12.
  25. Petrides, M., & Pandya, D. N. (2009). Distinct parietal and temporal pathways to the homologues of Broca’s area in the monkey. PLoS Biology, 7(8), e1000170.
  26. Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707–784. https://doi.org/10.1017/S0140525X00081061.
  27. Rilling, J. K., Glasser, M. F., Preuss, T. M., Ma, X., Zhao, T., Hu, X., & Behrens, T. E. J. (2008). The evolution of the arcuate fasciculus revealed with comparative DTI. Nature Neuroscience, 11, 426–428.
  28. Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3(2), 131–141.
  29. Thompson, P. M., Cannon, T. D., Narr, K. L., van Erp, T., Poutanen, V.-P., Huttunen, M., Lönnqvist, J., Standertskjöld-Nordenstam, C. G., Kaprio, J., Khaledy, M., Dail, R., Zoumalan, C. I., & Toga, A. W. (2001). Genetic influences on brain structure. Nature Neuroscience, 4, 1253–1258. https://doi.org/10.1038/nn758.
  30. van der Lely, H. K. J., & Pinker, S. (2014). The biological basis of language: Insight from developmental grammatical impairments. Trends in Cognitive Sciences, 18(11), 586–595.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. US San Diego, La Jolla, USA

Section editors and affiliations

  • Christopher D. Watkins
    • 1
  1. Division of Psychology, School of Social and Health Sciences, Abertay University, Dundee, UK