1 Introduction

The ability to build complex concepts from simple parts is fundamental to human language. Thus a central part of the cognitive neuroscience of language is to develop a model of where, when and how this type of composition occurs. Neuroimaging experiments have shown that sentence processing engages a large network of primarily left-lateralized, interconnected regions (Binder et al. 2009; Jefferies 2013). One of these regions in particular, the left anterior temporal lobe (LATL), appears to be directly implicated in the basic process of combining words into phrases, such as the combination of nouns with adjectival modifiers (red boat) or verbal predicates (eats meat) (Bemis and Pylkkänen 2011, 2013a; Pylkkänen et al. 2014; Westerlund et al. 2015; Westerlund and Pylkkänen 2014). In order to develop and test hypotheses about combinatory responses in the LATL, recent research has tested representational distinctions arising both from linguistic theory and from neuropsychological studies of patients with temporal lobe damage. This work has revealed that LATL responses generalize across various word classes (Westerlund et al. 2015), but are sensitive to the conceptual nature of the combining words. For example, the conceptual specificity of the composing items robustly modulates the combinatory responses in the LATL (Westerlund and Pylkkänen 2014; Zhang and Pylkkänen 2015), and composition such as numeral quantification (e.g., two boats), which arguably does not add any conceptual features to a noun, does not elicit LATL effects at all (Del Prato and Pylkkänen 2014; Blanco-Elorrieta and Pylkkänen 2016). Thus the LATL appears mostly sensitive to the composition of elements whose meanings are in some sense clearly conceptual, as opposed to composition in a more general sense. In other words, the presence of semantic composition alone is not predictive of combinatory effects in the LATL; rather, the semantic composition needs to fit a certain conceptual profile. Given this, the psychological literature on conceptual combination is potentially a useful cognitive basis for studying the LATL, a connection we explore in this article.

A large psychological literature has carefully investigated the way that meaning is constructed at the conceptual level and developed models of possible combinatory mechanisms (see Murphy 2002 for discussion). These models could therefore be useful for generating predictions about the LATL’s combinatory role. In this chapter, we outline two major models of conceptual combination, ‘schema-based’ models and ‘relation-based’ models, and discuss how these models may be able to constrain the hypothesis space regarding the role of the LATL in composition.

2 The LATL as a Central Combinatory Region

The LATL’s anatomical location makes it an excellent candidate for a central combinatory region. It is well connected to primary sensory and motor areas, along with their related association cortices (Catani and De Schotten 2008; Gloor 1997), is close to the medial temporal lobe, which supports memory processes, and is functionally connected to several critical language regions, including the left inferior temporal gyrus and the middle temporal gyrus (Hurley et al. 2014).

Several neuroimaging experiments have implicated the LATL in sentence processing above the word level. It is consistently engaged by contrasts between structured, meaningful sentences and word lists or jabberwocky sentences (Bottini et al. 1994; Brennan et al. 2010; Brennan and Pylkkänen 2012; Crinion et al. 2003; Friederici et al. 2000, 2003; Humphries et al. 2001, 2005, 2006, 2007; Mazoyer et al. 1993; Pallier et al. 2011; Rogalsky and Hickok 2009; Scott et al. 2000; Stowe et al. 1998; Vandenberghe et al. 2002; Xu et al. 2005). Most critically for its putative status as a combinatory region, a series of magnetoencephalography (MEG) experiments has shown the LATL to be the region most reliably engaged by the composition of basic adjective-noun phrases (Bemis and Pylkkänen 2011, 2013b; Del Prato and Pylkkänen 2014; Pylkkänen et al. 2014; Westerlund and Pylkkänen 2014). The LATL shows consistent differences in neural activity, peaking at around 250 ms, between adjective-noun phrases (e.g. red boat) and the same nouns in isolation (preceded by an unpronounceable consonant string, e.g. xqk boat) (Bemis and Pylkkänen 2011). Importantly, there is no equivalent increase in activity when nouns are presented in a list (e.g. cup, boat) rather than in a phrase, suggesting that the LATL is specifically engaged by the composition of words into a coherent phrase, rather than by a simple increase in the amount of lexical information present.

Of course, composition is not limited to the combination of adjectives and nouns, but extends across all word classes: verbs and adverbs, verbs and their objects, prepositions and their arguments, to name but a few. Within linguistics, at least one prominent model of semantic composition outlines two broad types of composition: the satisfaction of a predicate’s argument position (argument saturation), and the optional modification of a predicate (modification) (Heim and Kratzer 1998). Argument saturation represents a core process of semantic composition. Several word classes, such as verbs, prepositions, and determiners, require arguments in order to exist in well-formed expressions. For example, in the phrase ‘eat meat’, the verb ‘eat’ takes the direct object ‘meat’ as its internal argument. Composing the verb with its argument saturates its argument requirement, allowing it to be interpreted. While argument saturation is crucial to the construction of well-formed sentences, language also contains optional elements that serve to enrich the meaning of a well-formed expression. For example, though ‘I ate meat’ is a perfectly interpretable sentence, the meaning conveyed is greatly changed when meat is modified with an adjective such as ‘spoiled’. This type of optional composition is typically described as modification.

Westerlund et al. (2015) investigated whether LATL combinatory responses generalize across these two composition types by presenting words in either modification or argument saturation contexts (black sweater and eats meat). Both types of composition elicited larger responses in the LATL compared to the same second words presented in isolation. In both cases, the effects peaked at ~250 ms. Furthermore, similar results were found in Arabic, a language with post-nominal adjectives (e.g. sweater black). These results provide evidence that the LATL plays a general role in composition, independent of composition type, language, or word order.

The LATL has also been implicated in the combination of simpler concepts to create more complex concepts at the single-word level (e.g. boy can be represented by the concepts young and male) (Baron and Osherson 2011; Baron et al. 2010), which suggests that the LATL’s role in composition may be at the conceptual level, rather than at the level of semantic composition as traditionally conceived of in linguistics. Specifically, most linguistic theories of semantic composition would not consider the construction of a complex single word concept out of its primary features to be composition in the same sense as the composition of words into phrases (i.e., the semantic complexity of boy would not in most formal semantic theories correspond to structural complexity that required combinatory steps).

Relatedly, neuropsychological investigations of patients with damage to the LATL have shown that their sentence comprehension and production remain mostly intact, particularly if semantic demands are kept low (Cotelli et al. 2007; Gorno-Tempini et al. 2004; Grossman et al. 2005; Hodges et al. 1992; Kapur et al. 1994; Kho et al. 2008; Noppeney et al. 2005; Wilson et al. 2012). Instead, damage to the LATL leads to a disorder called semantic dementia (SD), in which patients suffer a severe, amodal, memory loss for concepts that manifests across a variety of tasks, including picture naming, word-picture matching, delayed-copy picture drawing, and categorization (Gainotti 2006, 2007, 2011; Garrard and Carroll 2006; Garrard and Hodges 2000; Hodges et al. 1992, 1995; Mummery et al. 1999, 2000; Patterson et al. 2006; Rogers et al. 2004; Snowden et al. 1989). One well-documented aspect of the pattern of conceptual memory loss in SD is that more specific concepts are disproportionally affected by LATL degradation. Patients with SD use progressively more general labels and lose the ability to distinguish similar concepts, mistaking, for example, a zebra for a horse.

Westerlund and Pylkkänen (2014) investigated the relationship between conceptual specificity and composition effects in the LATL, with the goal of characterizing whether these two variables modulate the same LATL activity. The conceptual specificity of the noun was varied in adjective-noun phrases (e.g. blue canoe vs. blue boat). The results showed that combinatorial responses in the LATL at ~250 ms were sensitive to the specificity of the noun, with less specific nouns (blue boat) eliciting greater combinatory responses in the LATL than more specific nouns (blue canoe). For a fuller assessment of the position-by-position interplay between single word specificity and composition, a follow-up experiment manipulated the conceptual specificity of both the modifier and head in noun-noun compounds (e.g. tomato vs. vegetable, soup vs. dish) (Zhang and Pylkkänen 2015). More specific modifiers (tomato) elicited the greatest responses in the LATL when composed with a less specific head (dish). In both of these studies, the effects of single word specificity were subtle or even completely absent when composition was not at play, such as in the head word position when no modifier was present (boat vs. canoe) or in the modifier position when the head word had not yet been seen (vegetable _ vs. tomato _). In contrast, the effect of specificity on composition was robust. Thus these data suggest that at least this LATL-localized MEG activity, occurring at ~250 ms, is most strongly driven by the composition of concepts and not by access to already stored representations.

In sum, these combined results show that although LATL composition effects generalize across composition types and word order (Westerlund et al. 2015), they can be significantly diminished if the composition does not result in a substantial boost in the conceptual specificity of the head word. Modifying an already specific noun (blue canoe; Westerlund and Pylkkänen 2014) does not elicit a strong combinatory response, perhaps because the noun is already quite specific, and modifying a noun with a very general modifier also fails to engage the LATL measurably (vegetable dish; Zhang and Pylkkänen 2015). Westerlund and Pylkkänen (2014) argued that these results suggest that the LATL’s role in composition might be directly related to its role in semantic memory.

The pattern of semantic memory loss in the LATL has led researchers to hypothesize that it acts as a ‘semantic hub’ (Lambon Ralph et al. 2010; Patterson et al. 2007; Rogers et al. 2004; Rogers and McClelland 2004; Rogers and Patterson 2007), and Westerlund and Pylkkänen (2014) suggested that the semantic hub model might be extended to account for the LATL’s involvement in semantic composition. The semantic hub model assumes that concepts are represented as sets of features (for example, a dog is brown, has eyes, and barks), and that these features are represented in a distributed manner across the brain in modality-specific areas (color and shape in visual areas, sound in auditory areas, etc.). However, these distributed features alone are not sufficient to account for the abstract amodal representations that humans have of concepts. Though elephants and mice share very little in common visually, we categorize them both as animals based on other overlapping features. Rogers and colleagues argue that these categorization abilities require the existence of a hub, located in the LATL, which stores amodal representations of concepts by coordinating the distributed features. Because more specific concepts share several overlapping features (e.g. all poodles have curly hair, four legs, and floppy ears), it is more difficult to distinguish their representations, which leaves them more vulnerable to damage as the LATL deteriorates. In other words, when presented with two poodles, as opposed to a poodle and an elephant, the hub must work harder to identify the few distinguishing features of each poodle (say, a few inches difference in size) than to identify one of the many differences between poodles and elephants (color, size, sound, location, etc.). 
Note that the hub model is most plausibly a model of concrete concepts. It is unlikely to suffice as a general account of the semantic space, which obviously includes concepts and combinations, such as original idea, that lack any obvious distributed modal features (for discussion, see e.g., Shallice and Cooper 2013).

Proponents of the semantic hub model argue that the pattern of amodal semantic memory loss in SD points to the LATL as the most likely candidate for a semantic hub. However, this model has been criticized on the grounds that the pathology of SD is not always neatly contained in the ATLs (Hodges and Patterson 2007; Brambati et al. 2009), which can make it difficult to assert that the ATLs are the locus of the critical damage (Simmons et al. 2010). Furthermore, in the intact brain, the LATL is not consistently activated across all tasks that require conceptual processing (Simmons et al. 2010). However, this latter fact may be due in part to technical challenges in imaging the ATLs, related to their proximity to the nasal cavities (see Visser et al. 2010 for discussion). The presence of LATL effects in the MEG experiments listed above, which do not suffer from similar imaging challenges, lends some support to this interpretation.

Critics of the semantic hub model have also pointed to the ATL’s engagement in social and emotional processing (Olson et al. 2007; Zahn et al. 2007) and theory of mind tasks (Olson et al. 2007), as well as in the recognition of famous and familiar people (Damasio et al. 2004; Gorno-Tempini et al. 1998; Sergent and Signoret 1992), to argue that the ATL’s role is specific to the social domain (Simmons et al. 2010). Yet other competing models of the LATL propose that the LATL plays a role in recognizing unique items, such as familiar people and places (Damasio et al. 1996; Grabowski et al. 2001; Gainotti et al. 2003; Gainotti 2007; Ross and Olson 2012), or that semantic information is simply grouped by semantic categories, organized along an anterior-posterior gradient across the temporal lobes (Chao et al. 1999; Martin and Chao 2001).

It is of course possible that the LATL plays multiple roles in conceptual processing. Though we discuss the LATL as a single unified region for the purposes of simplicity, the LATL in fact has several anatomical and functional subdivisions (Fan et al. 2014; Rogalsky and Hickok 2009; Sanjuán et al. 2014; Visser et al. 2012; Visser and Lambon Ralph 2011). Therefore, it is difficult to determine whether these models represent competing or complementary interpretations of LATL activity. We believe the semantic hub model to be the simplest model that can reconcile results from SD with results from the sentence processing literature. In fact, when a noun is modified by an adjective, the outcome is a modified conceptual representation, and Westerlund and Pylkkänen (2014) suggested that the semantic hub might also be involved in the construction of that modified representation. This might then account for the observed interaction within the LATL between conceptual specificity and combinatory responses. However, this can only be a rudimentary starting point for characterizing the precise contribution of the LATL, as the semantic hub theories offer no mechanistic model of how complex concepts are composed. Since the richest body of work on this topic can be found under the term “conceptual combination” within cognitive psychology, we turn next to these models in order to evaluate extant LATL data in light of predictions arising from this work.

3 Theories of Conceptual Combination

Within psychology, theories of conceptual combination have focused almost exclusively on the modification of nouns, either by adjectives (e.g. red apple) or by other nouns (e.g. city dog). In modification, the concept being modified is the ‘head’, while the other is the ‘modifier’. Psychological models for how concepts are combined fall into two general camps: ‘schema-based’ models and ‘relation-based’ models. Schema-based models propose that interpretation arises from the features of the individual constituents (Cohen and Murphy 1984; Hampton 1987, 1997; Murphy 1988; Rumelhart 1980; Smith et al. 1988; Wisniewski 1997), whereas relation-based models focus on how interpretation arises from the general relationship between constituents (Downing 1977; Gagné 2001; Gagné and Shoben 1997, 2002; Gleitman and Gleitman 1970; Levi 1978).

3.1 Schema-Based Models

Schema-based models start with the assumption that concepts contain sets of features organized into a ‘schema’ (Cohen and Murphy 1984; Rumelhart 1980). Rather than a disorganized list of features (is red, is sweet, is round, etc.), features within a schema are organized into a set of dimensions that are important to that concept. For example, the concept apple might have dimensions for color, shape, and taste, amongst others, but not, say, for speed. Within each dimension, features are weighted according to how typical they are for the concept. In the color dimension, apple might have the possible features green, red and brown, and red would be weighted higher than brown.

Within this framework, conceptual combination involves the modification of the head’s schema by a feature of the modifier. Smith et al. (1988) laid out a prominent version of this model, in which each dimension is weighted by how ‘diagnostic’ it is for a concept; in other words, how useful the dimension is for distinguishing it from similar concepts. For example, the taste dimension is useful for distinguishing an apple from other small, round objects, as well as from other fruit. When two concepts are composed, the modifier’s feature is placed into a particular dimension in the head’s schema. In red apple, for example, the feature red is placed into apple’s color dimension. The process of modifying a dimension also makes that dimension more diagnostic for the concept. This captures the intuitive fact that knowing that a concept is red is even more important for identifying a red apple than an apple.
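The two steps just described (placing the modifier's feature into one dimension of the head's schema, and raising that dimension's diagnosticity) can be illustrated with a minimal Python sketch. It is not the model as specified by Smith et al. (1988): the schema for apple, all weights, the `boost` factor, and the function name `modify` are assumptions invented purely for illustration.

```python
from copy import deepcopy

# Hypothetical schema for "apple". Every number here is invented for
# illustration and is not taken from any published norm.
APPLE = {
    "color": {"diagnosticity": 0.5,
              "features": {"red": 0.6, "green": 0.3, "brown": 0.1}},
    "shape": {"diagnosticity": 0.3,
              "features": {"round": 0.9, "oblong": 0.1}},
    "taste": {"diagnosticity": 0.7,
              "features": {"sweet": 0.7, "tart": 0.3}},
}


def modify(head, dimension, feature, boost=2.0):
    """Return a modified copy of `head`; the original schema is untouched.

    All feature weight in `dimension` shifts to the modifier's feature,
    and the dimension's diagnosticity is multiplied by `boost`
    (an assumed value, chosen only to make the effect visible).
    """
    combined = deepcopy(head)
    slot = combined[dimension]
    slot["features"] = {name: 0.0 for name in slot["features"]}
    slot["features"][feature] = 1.0
    slot["diagnosticity"] *= boost
    return combined


# "red apple": red fills the color dimension, which becomes more diagnostic.
red_apple = modify(APPLE, "color", "red")
```

In this sketch the unmodified dimensions (shape, taste) are carried over unchanged, which mirrors the simple case of an adjective like red that targets a single dimension.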

While useful, this model struggles to account for instances of modification in which the modifier is more complex than a simple adjective like red. Murphy (1988) provides the adjective corporate as an example. The dimension that corporate modifies in a composed phrase is dependent on the head noun: a corporate car is one owned by the company (modifying the ownership dimension), a corporate lawyer is one who works for the company (modifying the employment dimension), and corporate stationery has the company’s logo on it (modifying a visual dimension). Furthermore, there can be complex interactions between the various dimensions of a concept when it is modified (Medin and Shoben 1988; Murphy 2002)—for example, the modifier brown does not just modify the color dimension of apple, but also affects our idea of its taste (unpleasant) and texture (mushy).

Murphy (1988, 1990) proposed that these complications can be accounted for by adding world knowledge to the model. Comprehenders can then draw upon their existing knowledge of cars, lawyers and stationery when deciding what dimension is likely to be modified by corporate. Furthermore, after the appropriate dimension has been identified and modified, world knowledge can be used to make further inferences about the concept (Murphy 1990). For example, a comprehender might realize that a brown apple is probably rotten, and use this knowledge to draw conclusions about its likely taste and smell. One possible mechanism for this process is ‘extensional feedback’ (Hampton 1987): essentially, particular instances of a composed concept can be retrieved from memory, and those memories can be used to refine the representation of the concept. For example, after reading brown apple, you might remember the last brown apple you had the misfortune of biting into, and you can incorporate aspects of that memory (the bad smell, the soft mush, etc.) into your representation of the concept.

3.2 Relation-Based Models

The central insight behind relation-based models of conceptual combination is the observation that concepts are often combined according to certain patterns. In the phrases glass bottle, pea soup, and leather purse, the modifiers and heads are in a similar relationship; in all three cases, the head is ‘made of’ the modifier. Relation-based models propose that comprehenders make use of these statistical regularities in their language to constrain the process of composition. For example, when a comprehender reads the phrase glass bottle, she can recognize an instance of the ‘made of’ relationship, and therefore immediately understand that a glass bottle is a bottle ‘made of’ glass without needing to access the specific features of the particular concepts.

The most prominent relational model of conceptual combination is the RICE (Relational Interpretation Competitive Evaluation) model (Spalding et al. 2010), previously the CARIN (Competition Among Relations in Nominals) model (Gagné and Shoben 1997). This model argues that there is a fixed set of ‘primitive’ relations into which all combinations can be classified (Downing 1977; Levi 1978), and that comprehenders store distributional information about the types of relations in which the words of their language tend to occur. The process of combining concepts therefore consists of retrieving the appropriate relationship between the modifier and head and linking the concepts according to that relation. As an example, the noun mountain, when used as a modifier, occurs much more often in a location relation (e.g. mountain cloud) than in an about relation (e.g. mountain magazine). Gagné and Shoben (1997) showed that participants were quicker at judging the sensicality of phrases in which modifiers were in a more frequent relation to the head (i.e. mountain cloud was easier to interpret than mountain magazine).
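The idea that comprehenders store, for each modifier, how often it occurs in each relation can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the strength computation of the CARIN or RICE models: the counts, the relation labels, and the function names `relation_share` and `rank_relations` are all hypothetical.

```python
# Hypothetical counts of how often a modifier has been encountered in each
# relation. The numbers are invented; real values would come from a corpus.
RELATION_COUNTS = {
    "mountain": {"located": 80, "about": 5, "made of": 2},
}


def relation_share(modifier, relation):
    """Proportion of the modifier's past uses that involve `relation`."""
    counts = RELATION_COUNTS[modifier]
    return counts.get(relation, 0) / sum(counts.values())


def rank_relations(modifier):
    """Candidate relations for the modifier, most frequent first."""
    counts = RELATION_COUNTS[modifier]
    return sorted(counts, key=counts.get, reverse=True)


# Under these assumed counts, the relation needed for "mountain cloud"
# (located) is far more available than the one needed for
# "mountain magazine" (about).
easy = relation_share("mountain", "located")
hard = relation_share("mountain", "about")
```

On this simplified view, a phrase whose required relation has a larger share for its modifier would be expected to be judged faster, which is the direction of the effect reported by Gagné and Shoben (1997).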

Several researchers have argued that the assumption that there is a constrained set of relational ‘primitives’ that are stored separately in the mental lexicon is not necessary for a relation-based model. Using WordNet (Seco et al. 2004), Devereux and Costello (2005) examined the semantic similarity of the constituent words for all the phrases used in Gagné (2001), and concluded that semantically similar words tend to be used in similar kinds of relations. For instance, gas and propane are lexically very similar, and also tended to be used in combinations with similar kinds of relations (gas crisis, propane shortage). Maguire et al. (2010) confirmed that the type of relations in which constituents tend to occur can be predicted by the semantic nature of the constituents, in a study of all noun-noun combinations in the British National Corpus. Using WordNet, they classified all the constituents in the corpus into 25 different high-level semantic categories, such as ‘substance’, ‘artifact’, or ‘emotion’, and found a strong relationship between the semantic categories of the constituents in the combinations and their relations. They argued that distributional information based on the semantics of the constituents can guide the construction of a relation between them without necessitating the assumption of a separately stored relation.

Importantly, there is evidence that people are sensitive to these category-level statistical patterns. Maguire et al. (2010) showed that participants were able to evaluate a combination like chocolate taste just as quickly as a combination like chocolate rabbit, even though only chocolate rabbit conforms to the highest relation frequency for chocolate (‘made of’). The RICE model would instead have predicted that chocolate rabbit would be interpreted more rapidly, as it is the phrase with the more frequent relation for chocolate. The authors suggested that this contradictory result arose from the fact that substance-attribute combinations like chocolate taste most frequently occur in the ‘has’ relation, which can make up for the specific preference that chocolate has for a ‘made of’ relation like chocolate rabbit.

3.3 Summary

In sum, schema-based models and relation-based models each address different aspects of the composition process. Schema-based models focus on the internal conceptual structure of constituents, and on how this internal structure is modified when the concepts are combined. Relation-based models focus on statistical regularities in how concepts tend to compose, and argue that this information guides the composition process. Importantly, these models are not necessarily mutually exclusive. Maguire et al. (2010) suggest that statistical information could serve a similar function to world knowledge within the schema modification process, by identifying candidate dimensions for modification based on previously encountered similar examples. They provide the example of stone squirrel. If a comprehender matches the phrase to the common combination pattern ‘substance-object’, and knows that this combination is most often used in a ‘made of’ relation, he can then avoid retrieving irrelevant features such as is alive or eats nuts, and instead be guided towards form features like has a tail and has four legs.
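The stone squirrel example can be made concrete with a short Python sketch in which category-level relation statistics select a relation, and the relation then restricts which features of the head are retrieved. This is only one way of cashing out the suggestion attributed to Maguire et al. (2010); the category labels, probabilities, feature typing, and the function name `interpret` are assumptions made for illustration.

```python
# Hypothetical category assignments, following the 'substance-object'
# pattern mentioned in the text.
CATEGORY = {"stone": "substance", "squirrel": "object"}

# Hypothetical relation probabilities for a category pair (invented values).
PAIR_RELATIONS = {("substance", "object"): {"made of": 0.8, "located": 0.2}}

# Assumed mapping from a relation to the kinds of head features it makes relevant.
RELEVANT_KINDS = {"made of": {"form"}, "located": {"form", "behavior"}}

# Hypothetical typed features for the head concept.
SQUIRREL_FEATURES = {
    "has a tail": "form",
    "has four legs": "form",
    "is alive": "biological",
    "eats nuts": "behavior",
}


def interpret(modifier, head, head_features):
    """Pick the most likely relation for the category pair, then keep only
    the head features whose kind is relevant under that relation."""
    pair = (CATEGORY[modifier], CATEGORY[head])
    relations = PAIR_RELATIONS[pair]
    relation = max(relations, key=relations.get)
    kept = [name for name, kind in head_features.items()
            if kind in RELEVANT_KINDS[relation]]
    return relation, kept


relation, kept = interpret("stone", "squirrel", SQUIRREL_FEATURES)
```

Here the statistical step plays the role that world knowledge plays in the schema-based account: it narrows the candidate dimensions before any modification of the head's schema takes place.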

4 Processing Predictions of Schema and Relation-Based Models

4.1 Storage and Retrieval

The most important assumption of schema-based models of conceptual combination is that concepts are represented as a collection of features organized into structured schemas. This assumption is also the basis of the semantic hub model of the LATL, in which features are organized in modality-specific areas, representing dimensions, and the hub serves to organize all the relevant dimensions into a schema.

Composing concepts requires the selection of the relevant feature of the modifier, as well as the corresponding dimension of the head; therefore, schema-based models also assume that composition first requires the retrieval of the feature representations of both constituent concepts. According to the semantic hub model, these feature representations are mediated in the LATL. Together, these models predict that the LATL is involved in retrieving the feature schema for the modifier and head noun, and that this process is the necessary first stage of conceptual combination.

Relation-based models argue that language users store statistical information about the distribution of concepts in their language across relation types. Maguire et al. (2010) suggest that the relevant distributional information about composed phrases is at the level of general semantic categories, such as plant-plant (e.g. flower bud) and substance-substance (e.g. wax paste), rather than at the level of individual words. Under this type of account, one might then expect certain neural responses to encode combinations of category-level information. However, as regards the LATL, Westerlund and Pylkkänen (2014) and Zhang and Pylkkänen (2015) have shown that LATL combinatorial responses are sensitive to the specificity of the nouns being composed, even when the nouns are in the same general semantic category (for example, blue canoe and blue boat might each be categorized as ‘attribute-object’). Therefore, category-level information alone does not appear to be driving LATL responses, though these results of course do not directly rule out relation-based models. Determining whether the frequency of a given combination affects combinatory responses in the LATL would be a more direct test of the predictions of relation-based models. Furthermore, as proposed by Maguire et al. (2010), it is possible that statistical information is accessed prior to the composition response measured in the LATL, and used to guide feature selection.

More generally, if distributional information does guide the retrieval of features in the LATL, we would expect different representations to be retrieved for the same word, depending on the type of phrase it is presented in. Indeed, there is behavioral evidence that the features that are retrieved for a concept are context-specific (Barclay et al. 1974; Barsalou 1982; Tabossi and Johnson-Laird 1980). For example, Tabossi and Johnson-Laird (1980) presented the same word within two different contexts that each emphasized a different feature of the concept.

(1) The goldsmith cut the glass with the diamond

(2) The mirror dispersed the light from the diamond

Participants were faster at verifying the truth of the subsequent sentences diamonds are hard or diamonds are brilliant when these were compatible with the feature emphasized by the context. However, there is as of yet no specific evidence that distributional information of the kind posited by Maguire et al. (2010) constrains feature selection in the LATL.

In sum, schema models, which assume that a concept’s feature schema is retrieved prior to composition, have direct theoretical overlap with semantic hub models of the LATL. The idea that distributional information might direct feature retrieval is open to further investigation.

4.2 Composition

4.2.1 What Is the Combinatory Process?

After the features and/or relations of the constituent concepts have been retrieved, a composed concept is created. A composed concept is not simply the knowledge that, for example, cooked pasta is pasta that has been cooked. Above and beyond the features of the constituents, composed concepts have features that are true of the combined representation but not of the individual constituents, or ‘emergent features’. For example, comprehenders know that cooked pasta is soft even though pasta itself could be hard. Thus composition can also result in the ‘deletion’ of features. Understanding how, when, and even whether comprehenders arrive at a complete composed representation is crucial to understanding the combinatory process.

In schema-based models, the core process underlying combination involves the selection of a relevant feature of the modifier and the related dimension of the head. Thus, these models predict that it will be more difficult to select the appropriate features for constituents with more complex internal representations, and that such phrases will therefore be more difficult to compose. There is behavioral evidence that this is in fact the case. Participants were slower to understand novel phrases with noun or non-predicating adjective modifiers (such as prostitute committee) than novel phrases with predicating adjectives (such as edible food) (Murphy 1990). The speed of interpretation was also affected by the typicality and abstractness of the modifier (inedible food was interpreted more slowly than edible food) (Murphy 1990; van Jaarsveld and Drašković 2003; Xu and Ran 2011). If the LATL is involved in the process of composing concepts, rather than in simply retrieving the relevant representations, we might expect to see greater LATL activity for more complex modifiers. This is consistent with the results of Zhang and Pylkkänen (2015), in which composing more specific modifiers, with more features, elicited greater LATL activity on the head noun.

It is less clear how to reconcile this prediction with results from Westerlund and Pylkkänen (2014) showing that less specific heads elicit greater LATL combinatory responses. Less specific concepts have fewer features and therefore a less complex internal representation; however, it may be more difficult to select the appropriate dimension to modify in a more general concept. For instance, what is a brown animal? If it’s a mammal, this might be an animal with the feature brown in the color-of-fur dimension. Alternatively, it might be color-of-scales if it’s a fish, or color-of-feathers if it’s a bird. A noun denoting a general category of concepts, comprising multiple disparate subcategories of concepts, may not have a single readily accessible dimension that can be modified, in which case it might be more difficult to compose. Of course, this hypothesis makes the prediction that combinatory responses will vary depending on the modifier; a modifier that targets a feature at the more general level might be easier to compose with a more general head (e.g. dead animal).

Schema-based models also assume that a combined concept will inherit most of the features of its constituents. Other than the dimension(s) being modified, the features of the head noun should remain stable in the resulting representation. For example, the representation of a red apple should still have the features of apple (i.e. is sweet, is a fruit, grows on a tree, etc.). Connolly, Fodor, Gleitman, and Gleitman (2007) argue, however, that the feature inheritance assumption is incompatible with a phenomenon they term the ‘modification effect’: the fact that properties that are believed to be true of a concept are judged to be less true of a modified concept. As a typical example, people rate baby ducks have webbed feet as less true than ducks have webbed feet, despite the fact that webbed feet should be inherited from duck into the composed phrase (Connolly et al. 2007). The modification effect also holds for non-word modifiers, which have no semantic content (e.g. brinn bottles are cylindrical) (Gagné and Spalding 2011).

Spalding and Gagné (2014) also provide intriguing evidence for a ‘reverse modification effect’, in which noun properties that are evaluated as false (e.g. candles have teeth) are judged as less false when the noun is modified (e.g. purple candles have teeth). Participants appear to be both less comfortable attributing a true property (having webbed feet) and more comfortable attributing false properties (having teeth) to modified nouns.

Gagné and Spalding (2014) argue that the modification and reverse modification effects are more compatible with relation-based models than schema-based models. In the RICE model, they propose that the initial output of the composition process is an underspecified representation, which they term a ‘relational gist’, limited to the relation between the two constituent concepts. A comprehender’s first-pass understanding of a composed concept might thus correspond roughly to the idea that ‘a mountain cloud is a cloud located in the mountains’, or that ‘a corporate car is a car used by a corporation’ (Gagné and Spalding 2014; Spalding and Gagné 2014). In other words, they argue that no features are inherited during the first stage of the composition process. Any further information about the composed concept, such as the fact that mountain clouds are white, fluffy, and block out the sun, is accessed if and only if the context necessitates further interpretation.

The intuition that combined concepts are initially underspecified, and are only fully fleshed out if the context requires it, is similar to a prominent model of sentence processing in psycholinguistics, in which comprehenders construct representations that are ‘good enough’ for the task of understanding the meaning of the sentence but do not reflect a complete syntactic and semantic analysis of the sentence (Ferreira et al. 2001, 2002; Ferreira and Patson 2007). However, this idea is not as incompatible with schema-based models as Spalding and Gagné suggest. It is somewhat implausible to assume that every feature of a concept is retrieved and active during the process of composition: in composing the constituents baby and duck, for example, a comprehender would have to activate every single feature that he or she knows about ducks, including the fact that they have eyes, a heart and lungs, lay eggs, breathe, etc. Instead, context and world knowledge can limit feature retrieval. Therefore, if one assumes that in the absence of a supporting context the specific feature having webbed feet is not part of the initial combined concept of baby duck, schema modification models do not strictly conflict with modification and reverse modification effects. Instead, these effects might arise out of a post hoc pragmatic reasoning process which assumes that a speaker is providing exactly the necessary information and no more, leading participants to assume that the modifier plays an important contrastive role in the phrase (Gagné and Spalding 2014; Grice 1975; Hampton et al. 2011; Jönsson and Hampton 2012).

In sum, modification and reverse modification effects themselves are potentially compatible with either model of composition. Instead, the major difference between the two models lies in whether the activation of conceptual features is necessary in order for composition to occur. Schema-based models posit that conceptual features are the elements being composed, whereas relation-based models assume that a ‘good-enough’ relational gist, lacking specific featural information, is the initial output of the combinatory process.

The fact that the complexity of a modifier’s schema affects composition in the LATL is therefore most compatible with the predictions of schema models. In the absence of a context requiring the retrieval of specific features, the relational gist hypothesis does not straightforwardly predict that the conceptual specificity of the constituents would affect LATL responses. Schema-based models further predict that combinatory processes will be disrupted by damage to the LATL; for example, it should be very difficult to comprehend a phrase in which the modifier addresses a dimension of a more specific concept, since that concept’s representation will be degraded. On the other hand, if distributional information can be used to construct a relational gist without accessing a concept’s features, patients with SD should not struggle to comprehend composed phrases. In this case, we might expect these ‘gist’ representations to reflect category-level generalizations, with patients constructing similar interpretations of, for example, red apple and red cherry. To the best of our knowledge, patient studies have not yet provided evidence about patients’ combinatory abilities that is fine-grained enough to allow us to verify these predictions.

4.2.2 Timing of Composition

Both schema-based and relation-based models are compatible with the existence of a first-pass, shallow composition process, followed by a more elaborative stage. According to schema-based models, the first stage of the combinatory process involves the retrieval of concept features followed by an elaboration stage aided by world knowledge; therefore, combinatory activity in the LATL, peaking at 250 ms, could provide an estimate of the timing of this first stage of feature retrieval. On the other hand, the first stage of composition in the RICE model is the construction of a relational gist, and concept features are only accessed, if necessary, in a second stage. Therefore, the most straightforward way to reconcile this model with current evidence about the LATL’s combinatory responses is to assume either that combinatory activity in the LATL represents a second processing stage after a relational gist has been constructed, or that LATL activity is tangential to the composition process.

Furthermore, because the first stage of composition in the RICE model does not involve the retrieval of conceptual features, the model predicts that features that are irrelevant to the combined concept will never be activated at any stage of processing. Several behavioral experiments have been conducted to address this prediction by investigating the accessibility of emergent and deleted features at various points during composition. Early evidence suggested that phrasal features were actually accessible faster than or at the same time as the features of the individual constituents (Gagné and Murphy 1996; Glucksberg and Estes 2000; Hampton and Springer 1989; Potter and Faulconer 1979). For example, Springer and Murphy (1992) found that subjects were faster at evaluating the truth of a sentence like boiled celery is crispy than boiled celery is green. At first glance, these results are compatible with the predictions of the RICE model.

However, these early experiments measured relatively slow response times (around one second), and therefore may have primarily reflected late-stage reasoning processes. When McElree et al. (2006) used a speed-accuracy tradeoff task to examine the online accessibility of noun and phrasal features, they showed that noun features (water pistols have triggers) were verified more accurately than emergent features (water pistols are harmless) at early stages of processing. Importantly, subjects were also slower to reject deleted features (water pistols are dangerous), suggesting that irrelevant noun features are retrieved during the combinatory process. Furthermore, Swinney et al. (2007) used a cross-modal priming task to confirm that deleted features (peeled banana-yellow) are primed during composition, and more quickly than emergent features (peeled banana-white).

These results suggest that irrelevant noun features (e.g. bananas are yellow) are in fact activated during composition, contradicting the predictions of relation-based models. Furthermore, phrasal features (e.g. peeled bananas are white) emerged relatively rapidly, in the absence of explicit pragmatic demands. This suggests that at least some phrasal features are an automatic outcome of the composition process, though they emerge after noun features have been retrieved.

In sum, behavioral and neurophysiological evidence suggests that constituent features are retrieved very rapidly, at or before 250 ms after the onset of the composing word. This is followed by the retrieval of features of the composed phrase, a process that may still be ongoing at around 600 ms (Molinaro et al. 2012). Therefore, current evidence supports a two-stage model of conceptual combination. First, at least some of the features of both constituents are activated. This feature activation may be guided by an existing context, and might also be guided by statistical information about the semantic category of the constituents. Then, the modifier’s feature modifies the representation of the head concept, leading to the emergence of new phrasal features and the suppression of irrelevant noun features. This can be followed by a further stage of explicit reasoning processes, possibly only to the extent that this is necessary in the comprehension context (Hampton 1987; Murphy 1988, 1990). Of course, composed phrases are often encountered in the context of a phrase or paragraph, in which case contextual information may be available prior to the start of the composition process.

Further investigation is necessary to determine precisely how phrasal features are retrieved, and how contextual and world knowledge information guide retrieval. Nevertheless, the fact that irrelevant features are retrieved during the composition process is more compatible with schema-based models than with relation-based models, although it remains unclear whether these irrelevant features are retrieved in the LATL.

5 Conclusions and Future Directions

In this chapter, we have laid out the general predictions and assumptions of schema-based and relation-based models of conceptual combination. Schema-based models argue that composition involves the activation of both constituents’ feature schemata and the modification of the head’s schema by a feature of the modifier, whereas relation-based models focus on the importance of distributional information in guiding the interpretation of a composed concept.

The fact that LATL combinatory responses take place mostly on the head, after both feature representations can be accessed, and are sensitive to the conceptual specificity of the composing constituents provides preliminary evidence more compatible with the predictions of schema-based models than with the predictions of relation-based models. Schema-based models can be combined with the semantic hub model to predict that the LATL is the locus of feature retrieval and of schema modification, though it is likely that this latter process involves an interplay between the LATL and other higher-order regions, such as the LIFG (Thompson-Schill et al. 1997, 1999) and possibly the angular gyrus (Molinaro et al. 2015; Price et al. 2015), particularly if world knowledge and contextual information can be used to guide the composition process.

Furthermore, schema-based models make the following testable predictions about the LATL as the neural center of composition: (i) that the specific features activated for the composed constituents should vary to some degree depending on the surrounding context and possibly on distributional information about the constituents, (ii) that the difficulty of integrating the modifier’s feature into the head’s schema will affect the amplitude or timing of combinatory responses, (iii) that supporting contextual information should mitigate these effects, and (iv) that phrasal features will be retrieved later in composition. Equipped with these predictions, we can now guide our investigation of the LATL in order to construct a detailed model of its combinatory role and its relationships to other language regions. Of course, we do not have enough evidence to rule out relation-based models entirely, and therefore should further investigate the influence of distributional information on LATL responses.

In this review, we have focused exclusively on theories of modification, restricted to the composition of adjective-noun and noun-noun phrases. In light of evidence that the LATL shows similar combinatory responses across composition types (Westerlund et al. 2015), it is important to determine whether we can extend current psychological models of composition to include other composition types.

While this challenge is beyond the scope of this chapter, we do note that Bornkessel-Schlesewsky and Schlesewsky (2013) have advanced a general compositional model in which all words have an ‘actor-event schema’, which focuses on actors and actions. For example, an actor-event schema for paint might include the typical actor (humans) as well as the typical actions performed around it. This idea is similar in intent to an idea put forth by Wisniewski (1997) that concept schemas should include ‘scenarios’ corresponding to verbs describing actions or events relevant to the concept. These schemata are then composed in much the same manner as modified phrases, with the head’s schema being altered by the information it is composed with. For example, in a phrase like the doctor paints, the actor dimension would be filled with the subject doctor. Bornkessel-Schlesewsky and Schlesewsky (2013) propose that these general schemas are combined in the LATL, and therefore provide a possible first step towards investigating the composition of other types of concepts than simply adjectives and nouns.