Field of Linguistics, The
Keywords: Sign Language; Machine Translation; Lexical Decision Task; Human Language; Meaningful Unit
The scientific study of human language(s) and/or the human language faculty.
“Asking a linguist how many languages they speak is like asking a biologist how many pets they have.” – Unknown
Comparing biologists with linguists illustrates one of the more common misunderstandings about linguistics: although language is the object of study in linguistics (like living things are the object of study in biology), the relationship between linguists and language is most often one of observation rather than ownership or fluency. Using a language and being able to describe how a language works are two very different things: for example, if you’re reading this, you are presumably fluent in English. But, unless you’ve studied linguistics before, it’s unlikely that you can say whether the ‘p’ at the end of the word stop is aspirated or not. (For interested readers, in the case of the word stop, the p is not aspirated; say pin and stop with your hand directly in front of your mouth. When you say pin, there will be a puff of air – this is aspiration, which is absent in stop.)
This highlights another major misconception about linguists and linguistics: the vast majority of linguists are concerned with observation and description, not with policing and correction. Linguists want to know why languages work the way they do, revealing why language in general works how it does, and illuminating new facts about the human mind in the process. Grammatical ‘errors’ (like stranding a preposition at the end of a sentence) are the sorts of phenomena that linguists are especially fond of. The types of errors we make in speaking or writing can be revelatory about how we process language in the brain (Harley 2013) and can even indicate shifts in the way a particular language works. Speakers of Middle English would find our Modern English “ungrammatical” to say the least – it might even be incomprehensible in some cases. However, this doesn’t mean modern English is wrong, only that it has changed over time (and space, e.g., British vs. American English). As a culturally transmitted phenomenon, languages are constantly changing and evolving at a rapid rate.
The main focus of this entry will be descriptive linguistics, wherein linguists are concerned with understanding and describing how language is, a feat traditionally accomplished by fieldwork, wherein linguists make a specific effort to document a language in situ in the culture where it is spoken. This can be contrasted with prescriptive linguistics, which prescribes how particular parts of a specific language should be. Prescriptive rules are particularly necessary in teaching and writing contexts, and very often, descriptive work on a particular language will eventually inform prescriptive rules that are taught to learners or writers.
However, the vast majority of linguists engage in descriptive work of some kind, and it is particularly this type of work that is usually more revelatory about human evolution and evolutionary psychology. If linguists approach the study of language with a prescriptive slant, they risk overlooking important and potentially revealing phenomena. This means that while we consider a language to be a cohesive unit, at least conceptually, linguists not only acknowledge but are particularly interested in linguistic variation both within and across speakers.
This entry will start with an operational definition of what language is. Although humans use it almost constantly – and there is a sense that language is somehow different from other animal communication systems – there is a specific set of features which set language apart. After defining the object of study in more detail, the entry will delve into a more technical breakdown of how linguistics is studied.
Like other sciences such as biology, psychology, and physics, linguistics is fundamentally diverse and includes several subfields which illuminate different aspects of human language and cognition. After an introduction to the different facets of linguistics, the entry will conclude with a brief overview of the major ongoing debates and divisions in the field, many of which mirror larger issues being grappled with in other areas of cognitive and social science. Understanding these broad divisions in the field is crucial to understanding how it interleaves and interacts with evolutionary psychology and cognitive science more generally.
What Is Language?
Human language is a complex phenomenon, a fact often lost given how easily children learn languages, and how effortlessly we use language day to day both for communication and thought. Although the question “what is language?” may seem at first glance to be a trivial one (it’s what you’re reading right now, it’s what we speak every day, etc.), a formal definition of language is more difficult than it seems. There are currently estimated to be between six and seven thousand different languages spoken on the planet, and they exhibit considerable variation (Dryer and Haspelmath 2011). In many respects, there is still debate about what language is, what language isn’t, and what it is for. This section will give an overview of a well-accepted operational definition of language and provide more detail about how language is organized and studied.
Design Features of Language
In 1960, the linguist Charles Hockett identified 13 design features of language. Originally, the first feature was that languages occur in the vocal-auditory channel; at the time, there was a paltry understanding of the richness of sign languages which use a gestural-visual channel, as well as little acknowledgment of the role of the gestural channel even in spoken languages (McNeill 2000; Tomasello 2009). Given this updated understanding, this feature is now often omitted. To be concise, this entry will generally refer to speech and hearing, but unless otherwise noted, anything that applies to a speech signal being sent/uttered or heard/received in a spoken language context also applies to a linguistic gesture being signed/uttered or seen/received in a sign language context. The remaining 12 features identified by Hockett (1960) have stood the test of time and provide a useful operational definition of language by delineating what language is (and is not) in terms of what it can do and how it does it.
Language has (i) broadcast transmission with directional reception, meaning that while a speech signal can be heard by anyone within auditory range, the speaker (i.e., the direction from which the utterance came) is readily identifiable. Language is also (ii) rapid fading, meaning that speech sounds can only be perceived as they are made and are by definition impermanent (a feature shared by many other animal communication systems).
Writing systems naturally change this dynamic by providing a visual analogue of spoken language which lasts much longer, potentially thousands of years. However, particularly in an evolutionary context, writing systems tend not to be a central focus, since they are young relative to spoken language. While the oldest writing systems are estimated to be perhaps 5–6,000 years old, human language itself is much older, with estimates ranging from at least 100,000 years to millions of years. Nonetheless, written language interacts with spoken language where there is a written form (discussed further in the “Tools and Methodologies” section).
Language is also fundamentally (iii) interchangeable – that is, speakers and hearers can uniformly use the language in the same way. This is a property not shared by many animal communication systems, which may exhibit cross-species communication where sender and receiver cannot articulate the same signals, or where conspecifics exhibit marked sex differences (e.g., in many bird species, males have complex songs, while females do not). Another feature of language is (iv) feedback, that is, a speaker can hear themselves and control and modify their own signal as they produce it. Linguistic signals are also (v) intentional, that is, they are intended consciously by the speaker to be communicative, unlike, for example, how a limp in a prey animal would unintentionally communicate weakness to a predator.
Language is simultaneously (vi) semantic and (vii) arbitrary. Language conveys specific meaning or semanticity, but the form of a linguistic signal has an arbitrary relationship with its meaning. In other words, while English speakers share a convention that the word rose represents a specific type of flower, there is nothing about the sounds in the word rose that point to that meaning beyond this shared convention – in other words, a rose, by any other name, would still smell as sweet. Although some words in language are nonarbitrary – for example, in an onomatopoeic word like hiss the sound of the word is similar to and representative of the sound a snake makes – the vast majority of linguistic symbols are arbitrary.
Language also exhibits (viii) discreteness and (ix) duality of patterning – language is discrete in that it is made up of distinct units which combine and recombine in rule governed ways. Duality of patterning means that these discrete units occur at two levels: individual sounds – which are themselves meaningless – form together to make meaningful units, which can in turn combine again to make meaningful utterances.
These properties contribute to another feature, (x) productivity. Productivity refers to the fact that the rule-governed recombination of meaningless and meaningful units gives rise to the ability for speakers to create and interpret novel utterances (this feature is sometimes also known as generativity). Duality of patterning – and thus, the level of productivity present in human language – is arguably unique to human language and has not been definitively shown in any other animal communication system. However, relatively little is known about dolphin communication, and the possibility remains that it (or other understudied animal communication systems) exhibits one or both of these features to some extent.
Language also enables displacement, or the ability to talk about and refer to things that are not necessarily present, concrete, or even true. While other animal communication systems might have both arbitrariness and semanticity (e.g., macaque warning calls to signal the presence of different predators), which should in theory allow for displacement, in practice, this type of displaced reference is thought to be unique to human language. Although instances of deception have been documented in other primates, these are relatively rare (Whiten and Byrne 1988). Using language, we can talk about language itself (a property known as (xi) reflexiveness), and unlike many other forms of animal communication, language is not constrained to honesty: although one could easily exaggerate their personal athleticism using words, the signal an animal would use to display physical fitness is often the physical fitness itself, and so exaggeration is virtually impossible (see Zahavi and Zahavi 1997).
Finally, language is crucially (xii) learnable and (xiii) culturally transmitted. In order to persist over generations, language has to be both learnable and usable – for example, although binary code can be used to convey complex messages, and the use of only two minimal units (1 and 0) is arguably highly efficient, children would have a very difficult time learning a rich expressive system in binary, and so language structure takes a very different form. It is also crucial that language learners acquire their language from other speakers, and other learners will in turn acquire from their production. Many researchers argue that many of the properties we observe in language are the result of biases in learners which are amplified by cultural transmission (Chater and Christiansen 2008).
Given these design features, it is generally accepted that no other known natural communication system can do what language does. While many animal communication systems may share some features with human language, no known system combines all of these features in the way that human language does.
Levels of Language
Phonetics and Phonology
Phonetics and phonology both involve the sounds of language, but the distinction between the two areas is important. While phonetics studies the speech sounds humans can produce in language generally, phonology deals with the rule-governed, patterned organization of speech sounds in a particular language. Phonetics deals with the acoustic and articulatory properties of human speech, while phonology deals with sound patterns in specific languages. As an illustration, while aspiration is a phonetic property which occurs in many languages, the phonology of aspiration differs across languages (e.g., in English, whether p is aspirated depends on its position in a word, whereas in Thai, the presence or absence of aspiration on p can distinguish one word from another).
A description of a language’s phonology will include an inventory of its phonemes, defined as the sounds in a language that are contrastive with respect to meaning. Although sign languages have no acoustic component, they do have what are considered to be phonologies, only the “phonemes” are visual features such as hand shape, position, and facial expression (Brentari 2011). Although the human vocal tract can make hundreds of distinct sounds (and human hands can make hundreds of distinct gestures), individual languages use these available units as phonemes to varying extents – the language with the most phonemes is !Xũ (spoken in southern Africa, with 141 phonemes), while the language with the fewest is Rotokas (spoken in Papua New Guinea, with only 11 phonemes; Crystal 2010).
A language can make use of a particular sound, or phone, without it necessarily being classed as a phoneme. To return to aspiration: while both p and ph occur in both English and Thai, they are separate phonemes only in Thai, where the presence or absence of aspiration can change the meaning. On the other hand, one could aspirate the p in the English word stop, and it might sound strange or emphasized, but it wouldn’t be a different word. This means that in English, p and ph are allophones, defined as contextual variants of a single phoneme.
In order to describe phonology with maximum accuracy, linguists employ the International Phonetic Alphabet (IPA), which uses hundreds of symbols to describe the sounds found across human languages. This tool is crucial for an accurate description of a language’s phonology – for example, the Roman alphabet has only 26 letters, but English has about 44 phonemes (slightly more or fewer depending on the variety of English being described). Beyond this, the IPA allows researchers to describe and compare the phonologies of disparate languages in a systematic way, such that direct comparisons can be made between genetically distant languages with different writing systems, like English and Thai.
What sounds are relevant in a language is just as important as how those sounds combine. The rules that govern sound combinations in a language are known as phonotactics. While a language might include, for example, 30 phonemes, these sounds are not generally combined in unrestricted ways. For example, while the sound -ng (the sound at the end of the word interesting, represented in the IPA as ŋ) occurs frequently in English, it never occurs at the beginnings of words. This may seem like a physical constraint to a native English speaker (i.e., they have difficulty pronouncing ŋ without a vowel before it), but this sound does occur word-initially in many of the world’s languages, so this is simply a phonotactic quirk of English.
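Such a phonotactic constraint can be stated mechanically. The toy Python sketch below (the function name and simplified IPA transcriptions are invented for illustration) checks a transcription against the English ban on word-initial ŋ:

```python
# Toy sketch of a phonotactic check: English permits the phoneme /ŋ/
# word-finally (as in "sing") but never word-initially. Transcriptions
# are simplified IPA strings; only this one constraint is modeled.

def violates_english_ng_rule(ipa_word: str) -> bool:
    """Return True if the (toy) transcription begins with /ŋ/."""
    return ipa_word.startswith("ŋ")

print(violates_english_ng_rule("ŋa"))   # a hypothetical word English rules out
print(violates_english_ng_rule("sɪŋ"))  # "sing": word-final ŋ is fine
```

A full phonotactic grammar would encode many such positional restrictions, but each is, at bottom, a rule of this form.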
Morphology
While in phonology phonemes provide contrasts in meaning, by definition they do not encode meaning. That is, although pat and bat mean different things, and differ only in the sounds p and b, the sounds in and of themselves mean nothing. This changes at the level of morphology, where a morpheme is defined as the smallest unit of meaning in language. Phonology and morphology interact considerably: first, morphemes are necessarily governed by a language’s phonological and phonotactic rules. Second, phonemes are the minimal units which comprise language and recombine to form the minimal meaningful units represented by morphemes – this property is often known as combinatoriality, which forms one of the two parts of the crucial duality of patterning identified by Hockett (1960).
Morphemes are often entire words, like pat and bat, known as free morphemes. Morphemes can also be bound, meaning that they contain semantic information, but only occur bound to other morphemes. For example, the morpheme un- contains meaning in that it can negate an adjective (e.g., unbearable, uninviting), but it cannot stand on its own as a word-unit – it must be attached to what is known as a root (in the previous example, bearable and inviting) via affixation.
Bound morphemes also come in two general types: inflectional and derivational. Inflectional morphemes express grammatical information such as tense or number (e.g., walk → walked) without creating a new word or changing its category. Derivational morphemes, on the other hand, create new words and can alter category – for example, the bound morpheme -able transforms the verb bear into the adjective bearable, and can be similarly productively applied to make adjectival forms of other verbs (e.g., share → shareable). Derivation need not change category: the derivational morpheme un- alters meaning while leaving category intact (bearable and unbearable are both adjectives).
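The productivity of affixation can be sketched computationally. The toy Python functions below (names invented, and ignoring the phonological adjustments real affixation involves) apply the affixes from the examples above:

```python
# Toy sketch of productive affixation with bound morphemes.
# Real affixation is governed by phonological rules this ignores.

def derive_able(verb: str) -> str:
    """Verb -> adjective via the suffix -able (category-changing)."""
    return verb + "able"

def negate(adjective: str) -> str:
    """Adjective -> adjective via the prefix un- (category-preserving)."""
    return "un" + adjective

print(derive_able("share"))         # shareable
print(negate(derive_able("bear")))  # unbearable
```

The point of the sketch is that a single rule applies to an open-ended set of roots: this is exactly what makes affixation productive.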
While these examples all come from English for ease of understanding, these morphemic categories can be used to describe any language, although different languages use different morphological processes to different degrees. Languages can be classified according to their predominant morphological strategies along two main continua: the synthetic-analytic continuum and the fusional-agglutinative continuum. In analytic languages, most words are a single morpheme (i.e., bound morphemes are rare, and words tend to be uninflected). On the other hand, highly synthetic languages have single words which may contain many morphemes. Among languages with some degree of synthesis, inflection can be either predominantly fusional or agglutinative. In fusional languages, bound morphemes are fused with the roots to which they attach, such that individual morphemes are difficult to delineate, and inflection may involve subtle changes in stress or tone. In agglutinative languages, bound morphemes are well delineated and readily identifiable and are generally attached to roots via affixation.
Morphology interfaces to a large extent with syntax, roughly defined as the rules governing word order in language. This interface is comprehensive enough that some research focuses specifically on the intersection between morphology and syntax, known as morphosyntax. The morphological properties exhibited by a language often give hints as to some syntactic properties of the language. For example, languages which are highly synthetic are more likely to have some flexibility in syntactic rules. Since in a synthetic language properties of a word’s component parts provide grammatical information, word order is often varied or may be used to convey nongrammatical information. For example, in Italian, verb affixation encodes the subject (e.g., io cammino [I walk] vs. tu cammini [you walk]), such that speakers can simply say cammino and drop the subject altogether, or invert the order (cammino io) to emphasize the subject. In English, on the other hand, subject drop is impossible. Without the subject pronoun, substantial semantic information is lost – the obligatory subject inclusion and fixed subject-verb order carries too heavy a functional load to be optional.
Syntax
Beyond its interaction with morphology, syntax is a large independent field within linguistics. Although the roots of linguistics generally grow out of anthropology, the field underwent a formative shift in the late 1950s. This shift was spearheaded by the publication of Noam Chomsky’s book Syntactic Structures (1957, 2002), which drew more heavily on mathematical formalisms than traditional descriptive fieldwork. Chomsky’s work focused on creating a generative grammar of a language given some minimal sample of it. In other words, he sought to formalize a minimal set of rules from which all valid utterances of a language could be generated.
This influential work had the effect of making an understanding of syntax central to an understanding of language, and thus, central to modern linguistics. Syntax is sometimes identified as the feature of language which makes it unique to humans (Hauser et al. 2002), and, coupled with morphosyntax, as the primary driver of linguistic productivity – the other half of duality of patterning (meaningful subparts coming together to form larger meaningful wholes). The syntactic structure of language means that a finite set of rules can generate an infinite set of valid utterances, and generally underlies a form of hierarchical structure which makes language particularly productive: recursion.
Recursion is the process by which a particular type of phrase can include an instance of itself, meaning that it can build upon itself indefinitely. As a visual example, think of two mirrors facing each other, resulting in an image of mirrors inside of mirrors inside of mirrors, ad infinitum. In broad terms, recursion means that you can have sentences inside other sentences. This means that in theory, a single sentence could have infinite length (e.g., John knows that [Mary knows that [I know that [David knows...]]]), although in practice memory and processing limitations generally place constraints on this. Some research has even tied this recursive ability to the ability to recursively represent the mental states of others (also known as theory of mind) (Corballis 2014), a cognitive ability that is relatively rare in the animal kingdom (Hauser et al. 2002). While there is considerable syntactic variation among the world’s languages, almost all languages make use of recursion, and where this hasn’t been definitively shown (most notably in Pirahã; Everett 2013), all humans seem to have the capacity for recursion, a feature not definitively demonstrated even in unusually sophisticated language-trained apes (Hess 2008).
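The embedded “knows that” example can itself be written as a recursive rule. The minimal Python sketch below (invented for illustration) shows how one finite rule, applied to itself, yields arbitrarily deep embedding:

```python
# Toy sketch of syntactic recursion: a clause may contain a clause,
# so a single embedding rule applied repeatedly yields unbounded depth.

def embed(knowers, base):
    """Recursively wrap a base clause in 'X knows that ...'."""
    if not knowers:
        return base
    return f"{knowers[0]} knows that {embed(knowers[1:], base)}"

print(embed(["John", "Mary", "Sue"], "David left"))
# John knows that Mary knows that Sue knows that David left
```

The function is finite, but the set of sentences it can generate is unbounded: exactly the sense in which a finite set of rules yields an infinite set of valid utterances.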
Semantics
Although syntax is productive, and makes language well suited to conveying complex ideas and relations, language is also fundamentally defined by the fact that it can reference meaning (i.e., the semanticity identified by Hockett). Syntax is to some extent divorced from meaning (the study of which is known in linguistics as semantics), and there are two main pieces of evidence for this.
First, there are animals whose communication systems exhibit many features of syntax, but the systems do not engage in referential meaning. For example, both birdsong and whale song have been found to have repeating elements and features of hierarchical structure (ten Cate and Okanoya 2012), but the meaning these carry is broad and unchanging (something along the lines of, “mate with me”). Meaning in human language, however, is incredibly nuanced, expressive, and constantly changing, to the point where it can express entirely new ideas with ease. Although some rudimentary form of structure seems to exist in some animal communication without rich semantic reference, semantic productivity is fundamentally tied to syntactic and morphological processes. For example, morphological processes mean that in the space of a couple of decades speakers can go from barely having a shared concept of [search engine], to having nouns (google), verbs (to google), and adjectives (googleable) growing out of this concept.
The second piece of evidence for a divide between syntax and semantics comes from human language itself. First, sentences can be syntactically well formed but semantically vacuous or nonsensical. Chomsky’s famous example was the sentence “Colorless green ideas sleep furiously” (Chomsky 2002). In terms of a grammar, this sentence is perfectly well formed, but semantically, the sentence is at best nonsense and at worst downright confusing. Second, brain damage as a result of stroke or injury can selectively affect syntactic or semantic processing. Language disorders known as aphasias (from the Greek a- “without” and phasis “speech”) can be divided into Broca’s aphasia, which seems to predominantly affect sequential processing (resulting in halting speech), and Wernicke’s aphasia, wherein patients display speech that is often syntactically fluent but deficient in terms of content words (e.g., nouns, verbs) (Harley 2013).
As with the other levels of language, languages vary semantically, although this is generally less straightforward to quantify than phonological, morphological, or syntactic variation. Researchers can take text from a language and make very good inferences about its general syntactic structure with little semantic information, but gathering detailed semantic information requires considerable effort and, ideally, access to native speakers. Broadly, semantics is studied from a variety of perspectives, with approaches ranging from formal logic rooted in philosophy to approaches from psychology which consider meaning from a more cognitive perspective. Prior to Chomsky’s syntactic revolution, two prominent figures in linguistics, Edward Sapir and Benjamin Whorf, went so far as to suggest that linguistic meaning could in fact shape cognition itself, arguing that the language one speaks can constrain thought. For example, while a language like English makes a distinction between the concepts of on and in, a language like Korean does not (using a single word to express both). Sapir and Whorf argued that this kind of difference could have meaningful consequences in terms of cognition. Although a strong version of this hypothesis (known as linguistic determinism) is now generally rejected, there is some evidence for an interplay between concepts in language and concepts in cognition (Everett 2013).
Pragmatics
Semantics deals with meaning which is encoded within linguistic signals themselves; roughly, how we know that the sequence of sounds in rose generally means a particular type of flower. Pragmatics, on the other hand, deals with meaning that is not necessarily conventionally encoded in the linguistic signal itself, but arises from communicative context and common ground. Communicative context is roughly defined as the context in which an utterance is made. For example, while the (spoken) utterance “that’s my rose” is a perfectly well formed sentence both syntactically and semantically, what it actually means is highly ambiguous out of context. What “that” refers to and what is intended by rose (an actual flower, a ceiling rose, a person named Rose, etc.) is impossible to resolve without access to the communicative context. Common ground is a specific kind of context that comes from having a history of interactions (e.g., between close friends) or shared group status: for example, if a hearer was already aware that the speaker knew someone named Rose, it would resolve much of the ambiguity in the utterance “that’s my rose”.
Pragmatics can be challenging to study, since it is in essence the study of how we say things without ever actually saying them – for example, it is trivial for someone to understand the declaration “I’m cold” to actually mean “close the window,” even though the speaker never said anything about windows or closing. Studies in pragmatics aim to illuminate how context resolves ambiguities and potentially even imbues meaning barely hinted at in the linguistic signal itself, how common ground is built, and the rules that govern linguistic interactions (as opposed to linguistic signals). Pragmatic rules vary greatly from culture to culture, as well as between subcultures and contexts. For example, in the context of press conferences surrounding North Korean nuclear negotiations, American officials were more likely to offer outright refusals to answer a question, while Chinese officials were more likely to avoid a question or provide a circumspect answer (Jiang 2006). Outside this context, avoidance might be interpreted as a genuine lack of knowledge, but contextualized in a culture where this strategy is commonplace, avoidance pragmatically communicates a desire not to answer the question, without being aggressive or threatening to the questioner.
Pragmatics necessarily interfaces with sociology, anthropology, and psychology in specific ways while also interacting heavily with all other levels of language. Theory of mind and mind reading are central to pragmatics: without the ability to hypothesize about the internal states of others, it becomes difficult to imagine how we would draw meaning from context or common ground, particularly where most of the meaning is drawn from these factors (e.g., as with using “I’m cold” to actually mean “close the window”). In terms of other levels of language, pragmatic considerations can govern variation at the phonological, morphological, and syntactic level. For example, a phonetic phenomenon known as “g-dropping,” wherein a speaker says “goin” or “drawin” (in lieu of “going” or “drawing”) is often conditioned by context, such that it is more likely in informal contexts (Labov 2001). Likewise, word choice (e.g., “died” vs. “passed away”) or sentence structure is often pragmatically conditioned, even within a single speaker.
Tools, Methodologies, and Subfields
Tools and Methodologies
Language is most usually studied at least partially through the lens of the areas of linguistics identified above, but how this study is approached can vary greatly. Methodologically, approaches to the study of language usually fall into one of three categories: naturalistic data, experiments, and computational models.
A linguistic corpus (plural corpora, from the Latin for “body”) is a body of text which can be used as data to study language. Corpora can be used to study phenomena at almost every level of language – there are even pragmatically annotated corpora – and they have become increasingly essential in linguistic research in the last few decades. While texts have long provided a useful source of data for linguists, until relatively recently, their reach was necessarily limited. Corpora were generally small and not randomly sampled – for example, a collection of published fiction can reveal a lot about style in fiction and popular themes, but it does not provide a good representative sample of a language.
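At its simplest, corpus analysis amounts to counting. The Python sketch below (with an invented eight-token “corpus”) tabulates token frequencies of the kind that large corpora provide at scale:

```python
# Minimal sketch of corpus frequency analysis on a toy "corpus".
# Real corpora run to millions or billions of tokens and require
# far more careful tokenization than whitespace splitting.
from collections import Counter

corpus = "the rose is a rose is a rose"
tokens = corpus.lower().split()   # naive whitespace tokenization
freqs = Counter(tokens)

print(freqs["rose"])              # 3
print(len(tokens), len(freqs))   # 8 tokens, 4 distinct word types
```

The token/type distinction shown in the last line underlies most corpus statistics, from simple frequency lists to studies of lexical change over centuries.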
With the advent of the Internet age, corpora have become much more widely available as well as more comprehensive – the freely available Google Books corpus covers over 150 billion word tokens in English and also has (usually slightly smaller) corpora in several other written languages (Michel et al. 2011). This means that linguists have been able to gain new insights into many properties of language, such as linguistic change occurring over hundreds of years (Cuskley et al. 2014) and language varieties diverging as they disperse across the globe (Michel et al. 2011). Likewise, other fields such as complex systems science and physics have begun to overlap and interact with linguistics due to the ready availability of a great deal of linguistic data which can shed light on social and cultural dynamics more generally.
While there are available corpora of spoken natural language, such as the CHILDES Corpus of child language (MacWhinney 2000) and the Switchboard Corpus (Godfrey and Holliman 1993), available language corpora are overwhelmingly drawn from written language. Due to its permanence, written language provides a rich source of data for linguists to study, but caution is required when it comes to general conclusions drawn from most corpora. First, while written language is a mirror of its spoken counterpart to some extent, some results gleaned entirely from written corpora might only apply to written language.
More problematically, many general conclusions about language are heavily skewed towards languages which have a written form, meaning that interesting features of languages which are exclusively spoken are overlooked or difficult even to discover in the first place. This forms part of a general bias towards drawing conclusions about human language in general from a relatively confined and homogeneous set of European languages (Linell 2004). Even where spoken corpora are used, they are overwhelmingly drawn from languages which have a written form. Thus, there is still a very important place in linguistics for traditional fieldwork, particularly for gaining knowledge about languages spoken by smaller populations which may not have written forms.
Experimental approaches are relatively new to linguistics – before the latter half of the twentieth century, approaches were predominantly anthropological or theoretical. The experimental methodologies used in linguistics are often borrowed from experimental psychology. Common paradigms with high relevance for linguistics are artificial language learning (ALL) experiments, lexical decision tasks, and grammaticality judgments.
In ALL experiments, participants learn miniature artificial languages. How quickly and accurately they learn these languages, and what kinds of errors they make in learning, can reveal much about how we process language. Lexical decision tasks involve participants making split-second decisions about words, often about their meaning in a categorical sense (e.g., is the word “spaghetti” a noun or a verb?). These experiments can reveal a lot about how the levels of language interact in processing; for example, semantic information might interfere with making syntactic category judgments (such that “destruction” would be more difficult to identify as a noun than “spaghetti”). Grammaticality judgments involve asking participants to rate the acceptability of different sentences, in order to allow for a comprehensive description of the generative rules which govern a language. Aggregating grammaticality judgments over many participants allows researchers to describe in greater detail the rules that underlie, for example, why the girl rode the bike is acceptable, but *the girl looked the bike is not.
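Aggregation of judgments is itself a simple computation: per-sentence ratings are pooled across participants and summarized. A sketch with entirely made-up 1–7 acceptability ratings for the two example sentences:

```python
# Hypothetical 1-7 acceptability ratings from five participants
# for the two example sentences (invented numbers, for illustration).
ratings = {
    "the girl rode the bike":    [7, 6, 7, 7, 6],
    "*the girl looked the bike": [2, 1, 2, 3, 1],
}

def mean(xs):
    return sum(xs) / len(xs)

# A large, stable gap between mean ratings is the kind of aggregate
# evidence used to characterize a generative rule (here, that "look"
# does not take a direct object the way "ride" does).
for sentence, scores in ratings.items():
    print(f"{sentence}: mean acceptability {mean(scores):.1f}")
```

In practice, researchers would also examine variance across participants and items, not just the means.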
Various other experimental methods are employed in linguistics, and many are combined with other experimental approaches, such as having a participant complete a lexical decision task inside an fMRI machine to better understand how language processing works in the brain. However, as with corpus linguistics, some caution is warranted because participants in studies are not only overwhelmingly Western, literate, and often specifically European but, as in psychology generally, tend to be a specific subset of this already confined sample (often university undergraduates; Henrich et al. 2010).
Computational models are also an invaluable resource in linguistics. Although computational modeling requires a necessary level of abstraction, the approach also allows for considerable control and, in many cases, a view of language processing and/or transmission which is not language specific. At the individual level, computational modeling allows for insights into how we process language. For example, by trying to teach a simple neural network to accurately learn the English past tense given minimal input data, we can gain insights into how our own brains might solve this problem (Pinker and Ullman 2002). Computational models also offer the opportunity to examine large-scale, population-level phenomena by using populations of artificial agents. This allows for crucial insights into how language is used across large groups and transmitted between learners, providing a valuable complement to the smaller-scale data provided by experimental approaches.
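The past-tense debate contrasts rule-based and memory-based accounts. A toy sketch (not Pinker and Ullman’s actual model) of the “words and rules” idea: a default -ed rule plus an item-by-item memory of irregular forms, which reproduces the overregularization errors children make before an irregular form is memorized:

```python
# Irregular past-tense forms the toy learner has encountered so far
# (hypothetical learning state, for illustration).
memorized = {"go": "went", "sing": "sang"}

def past_tense(verb):
    # Retrieval from memory wins; otherwise apply the regular rule.
    if verb in memorized:
        return memorized[verb]
    return verb + "ed"

print(past_tense("walk"))  # walked (regular rule)
print(past_tense("go"))    # went (memorized irregular)
print(past_tense("run"))   # runed - an overregularization error,
                           # like children's "goed", until "ran"
                           # has been memorized
```

Connectionist models dispense with the explicit rule and learn both patterns in a single network; comparing the error profiles of the two architectures against children’s actual errors is what makes such models informative.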
Computational approaches are essential for more practical applications of linguistics, such as machine translation and advances in artificial intelligence. However, despite considerable progress in both of these areas in recent decades, artificial intelligence is far from having comprehensive linguistic competence. In machine translation, while there is unprecedented accuracy at the level of individual words, the error rate in automatic translation of longer texts is considerable, and this is to say nothing of machines interpreting much noisier natural speech. While speech recognition has improved considerably in recent decades, there is still a noticeable level of error, particularly with different language varieties (e.g., many speech recognition systems do not handle non-standard accents well). The area where AI falls most noticeably short of human performance is arguably pragmatics. Although an AI might be able to interpret and answer a single question (e.g., Apple’s Siri or Amazon’s Alexa) and produce fairly realistic-sounding speech, realistic sustained conversation remains an open problem. While there is clearly room for improvement in all of these areas, more advances have been made in the last decade than in the previous century.
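The gap between word-level and sentence-level accuracy is easy to demonstrate. A sketch using a tiny hypothetical English–Spanish glossary: every individual entry is correct, yet naive word-by-word translation fails because it ignores word order:

```python
# A hypothetical English->Spanish glossary; each entry is accurate
# in isolation.
glossary = {"the": "la", "white": "blanca", "house": "casa"}

def word_by_word(sentence):
    # Translate each word independently, ignoring syntax entirely.
    return " ".join(glossary.get(w, w) for w in sentence.split())

print(word_by_word("the white house"))
# -> "la blanca casa": every word is correctly translated, but
# Spanish places the adjective after the noun ("la casa blanca"),
# so the phrase-level translation is still wrong.
```

Modern systems address this with statistical or neural models of whole phrases and sentences, but the example shows why word-level accuracy alone does not guarantee good translation.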
While naturalistic data, experiments, and computational approaches arguably form the methodological core of linguistics, other approaches provide valuable contributions to the field, which, like any other field, is constantly changing and growing. Comparative approaches which aim to describe the communication systems of other species can provide valuable insights into human language, especially attempts to teach language-like systems to nonhuman animals. Formal mathematical approaches have been revolutionary in syntax, and information theory and complex systems science have made promising inroads into formalizing broad properties of language.
The previous sections provided a concrete operational definition of what language is, identified the different levels at which it is structured, and briefly outlined how the study of language is approached methodologically. While basic facts about language are broadly agreed upon, there are of course divisions within the field and open questions. Researchers in the Chomskyan tradition are proponents of Universal Grammar (UG), which proposes that some minimal aspects of language are innately (or even genetically) encoded. However, many functional or cognitive linguists argue that learning plays a crucial role not just in the individual development of language but in shaping the structure of language itself over time (Chater and Christiansen 2008). Very broadly, this debate mirrors the “nature-nurture” question at the heart of many problems in cognitive science.
A related open question concerns the domain specificity of language. Proponents of UG lean towards the notion that the processes involved in language are specific to language and specifically evolved for language. On the other hand, cognitive or functional linguists argue that more domain general constraints – such as memory or a general preference for simplicity over unnecessary complexity – govern and shape language.
Nonetheless, researchers broadly agree that language is defined by some essential features, the particular combination of which makes it unique among communication systems. In particular, language exhibits discreteness, duality of patterning, arbitrariness, productivity, and displacement. This means that, unlike many other communication systems, language is theoretically an infinite system (particularly due to recursion) and can readily be used to communicate things that are neither present nor true. Language is formally divided into minimal meaningless units (phonemes), which combine to create minimally meaningful units (morphemes), which in turn recombine in syntactically rule-governed ways to create an endless array of meaningful utterances. The areas of phonology, morphology, and syntax study these properties in different languages, while semantics and pragmatics concern how meaning is encoded both within linguistic signals and in linguistic interactions. There are many ways to approach the study of language, but most often any given approach can be categorized as leveraging naturalistic (often corpus-based) data, using controlled experiments, or employing targeted computational modeling.
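The claim that recursion makes language theoretically infinite can be made concrete with a minimal sketch: a single noun-phrase rule that can embed another noun phrase (here via a hypothetical relative-clause template) licenses unboundedly many distinct sentences from a finite rule set.

```python
# A minimal recursive rewrite sketch: NP -> "the girl"
#                                  or NP -> "the dog that saw" + NP
# Because the rule refers to itself, sentences of any length can be
# generated from finitely many rules.
def np(depth):
    if depth == 0:
        return "the girl"
    return f"the dog that saw {np(depth - 1)}"

def sentence(depth):
    return f"{np(depth)} rode the bike"

for d in range(3):
    print(sentence(d))
# the girl rode the bike
# the dog that saw the girl rode the bike
# the dog that saw the dog that saw the girl rode the bike
```

Each additional level of embedding yields a new grammatical sentence, which is the sense in which a finite grammar generates an infinite language (even though memory limits bound what speakers produce in practice).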
- Cate, C. ten, & Okanoya, K. (2012). Revisiting the syntactic abilities of non-human animals: Natural vocalizations and artificial grammar learning. Philosophical Transactions of the Royal Society of London B, 367(1598), 1984–1994.
- Chomsky, N. (1957). Syntactic structures. Walter de Gruyter.
- Crystal, D. (2010). The Cambridge encyclopedia of language (2nd ed.). Cambridge, UK: Cambridge University Press.
- Dryer, M. S., & Haspelmath, M. (Eds.). (2011). The world atlas of language structures online. Retrieved 13 May 2016, from http://wals.info
- Godfrey, J., & Holliman, E. (1993). Switchboard-1 Release 2 (LDC97S62). Philadelphia: Linguistic Data Consortium. Retrieved from https://catalog.ldc.upenn.edu/LDC97S62
- Harley, T. (2013). The psychology of language: From data to theory. Hove: Psychology Press.
- Hess, E. (2008). Nim Chimpsky: The chimp who would be human. New York: Bantam.
- Labov, W. (2001). Principles of linguistic change: Social factors. Oxford: Blackwell.
- Linell, P. (2004). The written language bias in linguistics: Its nature, origins, and transformations. London: Routledge.
- MacWhinney, B. (2000). The CHILDES project: The database. Hove: Psychology Press.
- O’Grady, W. D., Dobrovolsky, M., & Katamba, F. (1996). Contemporary linguistics. London: Longman.
- Tomasello, M. (2009). The cultural origins of human cognition. Cambridge, MA: Harvard University Press.
- Zahavi, A., & Zahavi, A. (1990). The handicap principle: A missing piece of Darwin’s puzzle. Oxford: Oxford University Press.