Encyclopedia of Evolutionary Psychological Science

Living Edition
Editors: Todd K. Shackelford, Viviana A. Weekes-Shackelford

Musical Protolanguage

  • Hadas Shintel
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-16999-6_2856-1


Definition

Musical protolanguage refers to a hypothetical intermediate stage in the evolution of language, between a non-linguistic stage and the possession of true language. At that stage, protolanguage consisted of non-referential song and served as the common ancestor of both language and music.

Introduction

Theories of language evolution often invoke a hypothetical protolanguage stage, which serves as a precursor to, and a scaffold for, the evolution of modern language. The idea of such an intervening stage, and the notion that this protolanguage was akin to song and was the common ancestor of both music and language, can be traced back to Darwin. In The Descent of Man and Selection in Relation to Sex (Darwin 1875), Darwin considers the evolution of language and claims that “[…] primeval man, or rather some early progenitor of man, probably first used his voice in producing true musical cadences, that is in singing, as do some of the gibbon-apes at the present day; and we may conclude from a widely-spread analogy, that this power would have been especially exerted during the courtship of the sexes,—would have expressed various emotions, such as love, jealousy, triumph,—and would have served as a challenge to rivals” (p. 87). Drawing on an analogy to birdsong, which, like language, must be learned and is practiced by the young members of the species, Darwin argues that human vocal imitation was driven by sexual selection. He then traces the emergence of meaningful words to the vocal imitation of natural sounds, the vocalizations of other animals, or man’s own instinctive vocalizations: “It is, therefore, probable that the imitation of musical cries by articulate sounds may have given rise to words expressive of various complex emotions” (ibid.).

In a modern reframing of Darwin’s theory, Fitch (2010) writes that “prosodic protolanguage was a system with phonological generativity, using a small set of elements to build hierarchical structures. These structures were vocally generated, voluntarily controlled, and learned, sharing core aspects of speech and song” (p. 476). Fitch argues that the musical protolanguage model can explain the design features shared by language and music, such as the use of a vocal/auditory channel, discreteness, productivity, and cultural transmission (for a complete list, see Fitch 2010, p. 469). He further argues that comparative data documenting the evolution of vocal learning in several non-bird species, together with neuroscientific data showing an overlap in the mechanisms that process music and language, support Darwin’s musical protolanguage hypothesis. For example, people with congenital amusia, who are impaired in their ability to process music, also show a reduced ability to process emotional speech prosody, suggesting a common acoustic code for music and speech (Thompson et al. 2012). Note, however, that overlap in processing mechanisms, while consistent with Darwin’s hypothesis, could also arise if music relies on processes that evolved for language, or vice versa; it does not necessarily imply a common evolutionary ancestor.

In Fitch’s terms, musical protolanguage can be described as “bare phonology”: a system with generative phonology, in which units can be combined and hierarchically arranged into phrases (some elements of syntax), but which is devoid of semantics. The main problem for the musical protolanguage model is explaining how such “bare phonology” evolved into a meaningful system that combines phonology and semantics.

Modern Versions of the Musical Protolanguage Hypothesis

The idea of a musical protolanguage, a common precursor of language and music, has been revived by modern scholars. Brown’s (2000) musilanguage model is one example. Brown traces the emergence of musilanguage to primate emotive referential vocalizations – emotive calls elicited in response to a specific class of objects, for example, the alarm calls of vervet monkeys. According to Brown (2000), in the first protolanguage stage, musilanguage involved discrete-level pitches and used pitch to convey semantic meaning, much as lexical tones differentiate meanings in tonal languages. The second stage involved the combinatorial formation of phrases and expressive phrasing. Brown (2017) later replaced the idea of discrete-level pitches, which are found in music but not in speech, with the idea of an imprecise relative pitch system (high vs. low). Tones could also convey meaning by capitalizing on cross-modal mappings between pitch and non-auditory properties such as size, brightness, and position (Brown 2017). Indeed, research has shown that such cross-modal mappings underlie some sound-symbolic words and can also be exploited to convey meaning through prosody (see Perniss et al. 2010).

Another model is Mithen’s (2005) Hmmmm model, so named because it characterizes protolanguage as holistic, manipulative (used to manipulate emotion and behavior rather than for reference), multi-modal (using movement, such as gesture and dance, as well as sound), musical, and mimetic. Whereas Brown’s musilanguage consists of discrete-level tones, utterances in Mithen’s Hmmmm model are holistic rather than built from elements that can be recombined. On this view, the evolution of modern language involved the segmentation of holistic Hmmmm utterances into such recombinable elements.

Non-music Models of Protolanguage

Other theories accepted the idea of a protolanguage stage but rejected the notion that this protolanguage was musical in nature (for an extensive review of protolanguage models, see Fitch 2010). One such model is Bickerton’s (1990) lexical protolanguage model, according to which protolanguage is a propositionally meaningful system, consisting of a lexicon of meaningful words, but no syntax. Bickerton sees language as primarily representational, rather than communicative. He thus rejects the continuity between language and animal communication systems, focusing instead on the conceptual system as the precursor underlying protolanguage.

Another model is the gestural protolanguage model (see Arbib 2005; Corballis 2002). Although speech can iconically represent meaning through speech sounds or prosody (Perniss et al. 2010), gesture allows far more possibilities for iconicity. Gestural protolanguage is propositionally meaningful, thereby avoiding the difficulty that musical protolanguage faces in explaining the emergence of semantics. The discovery of the mirror neuron system, whose neurons fire both when an action is performed and when it is observed, suggests a mechanism for parity between producing and understanding gestures, which may serve as a precursor to the complex imitation observed in humans and to protolanguage (Arbib 2005). The main problem facing gestural theories is explaining the transition from gesture and the manual modality to speech.

Conclusion

The idea of a musical protolanguage, that is, a common precursor to both music and language, was originally suggested by Darwin in The Descent of Man and has since been revived and reformulated by contemporary theorists. Although all these models share the idea of a common ancestor, they differ in the characteristics they attribute to musical protolanguage (e.g., synthetic vs. holistic). New evidence for shared brain mechanisms underlying music and language has been interpreted as support for theories that posit a common evolutionary origin, and it has led to renewed interest in Darwin’s musical protolanguage model. The transition from musical protolanguage to meaningful speech remains a challenge for the musical protolanguage hypothesis.


References

  1. Arbib, M. A. (2005). From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28, 105–167.
  2. Bickerton, D. (1990). Language and species. Chicago: University of Chicago Press.
  3. Brown, S. (2000). The “musilanguage” model of music evolution. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 271–300). Cambridge, MA: MIT Press.
  4. Brown, S. (2017). A joint prosodic origin of language and music. Frontiers in Psychology, 8, 1894.
  5. Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton: Princeton University Press.
  6. Darwin, C. (1875). The descent of man, and selection in relation to sex (2nd ed.). New York: D. Appleton and Company.
  7. Fitch, W. T. (2010). The evolution of language. Cambridge: Cambridge University Press.
  8. Mithen, S. (2005). The singing Neanderthals: The origins of music, language, mind, and body. London: Weidenfeld and Nicolson.
  9. Perniss, P., Thompson, R., & Vigliocco, G. (2010). Iconicity as a general property of language: Evidence from spoken and signed languages. Frontiers in Psychology, 1, 227.
  10. Thompson, W. F., Marin, M. M., & Stewart, L. (2012). Reduced sensitivity to emotional prosody in congenital amusia rekindles the musical protolanguage hypothesis. Proceedings of the National Academy of Sciences, 109(46), 19027–19032.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Psychology, Center for Academic Studies, Or Yehuda, Israel

Section editors and affiliations

  • Guilherme S. Lopes, Department of Psychology, Oakland University, Rochester, USA