A dyadic brain model of ape gestural learning, production and representation
It has been argued that variation in gesture usage among apes is influenced either by differential sampling of an innate ‘gesture space’ (Hobaiter and Byrne in Anim Cogn 14:745–767, 2011) or through the ‘mutual shaping of behavior’ (Halina et al. in Anim Cogn 16(4):653–666, 2013) referred to as ontogenetic ritualization. In either case, learning must play some role in how individuals come to use particular gestures—either through reinforcement within the set of innately specified gestures, or through the ritualization of some action following periods of direct interaction between pairs of individuals. Building on a prior computational model detailing learning during ontogenetic ritualization (Arbib et al. in Philos Trans R Soc Lond B Biol Sci 369(1644):20130414, 2014, https://doi.org/10.1098/rstb.2013.0414), we here present a single integrative dyadic brain model (simulating selected brain and body dynamics of two interacting apes) that can account for many observed gestural patterns, while additionally showing that both of the claimed paths toward competent gestural performance are predicated on social influences—even the usage of inherited gestures demands learning about others’ behaviors.
Keywords: Computational model · Gesture · Social learning · Apes · Ontogenetic ritualization · Dyadic brain modeling
Ape manual gestures are flexible, intentional acts that seek to influence the behavior of others. Apes’ production of gestures takes into account the attentional state of the recipient: the gesturer monitors the recipient’s comprehension, and future gestural bouts are often a function of past success or failure with using a particular gesture (Arbib et al. 2008; Cartmill and Byrne 2007; Hobaiter and Byrne 2011, 2014). Gestural communication is widely accepted to be more flexible and expressive than vocal communication in apes, and this has informed theories emphasizing the role of gesture in human language origins (Arbib et al. 2008; Gentilucci and Corballis 2006; Greenfield 1991, 1998; Tomasello et al. 1997). Additionally, social learning in apes is often cited as more extensive (Dean et al. 2012) than in other nonhuman primates, with ‘cultural’ traditions arising among populations of individuals (Whiten et al. 2003, 2011)—though some groups of monkeys have been shown to transmit idiosyncratic behaviors by social means as well (De Resende et al. 2008). Moreover, comparative neuroanatomical and functional studies suggest that particular patterns of connectivity and gross response profiles, respectively, are more similar between apes and humans than between apes and macaques (Hecht et al. 2013a, b). It is apparent, then, that an understanding of the mechanisms coordinating both gestural communication and social learning in apes is critical to properly contextualizing human language and human social cognitive skill.
Previously, we offered a model of the formation of gestures in apes via ontogenetic ritualization (OR), in which repeated interactions between a pair of individuals result in a ritualized gestural form that derives directly from these interactions to achieve a similar goal (Arbib et al. 2014; Gasser et al. 2014). Crucially, OR suggests that each individual in the pair is learning about the other’s behavior, though one learns to recognize and respond to the emergent gesture, while the other learns to produce a gesture whose form bears a physical similarity to the actions from which it derives (Halina et al. 2013). In our 2014 model, we were able to show how repeated interactions between two avatars—in our example, a simulated mother and child ape—can result in a ritualized gestural form that communicates the content of the interaction and can elicit appropriate responses by the observer. Importantly, the model was robust to variation in internal parameters and internal state variables.
Though it is widely accepted that not all gestures arise through OR, some authors argue that the majority of the ape gestural repertoire is innate, with variation in the set of gestures expressed in individuals and groups explained by differences in “expressing” some gestures rather than others (Byrne et al. 2017; Hobaiter and Byrne 2011)—though even these authors accept that apes can acquire novel gestures through an OR-like process when the other agent is a human (Byrne, personal communication, 2014; see the notion of human-assisted ritualization, Arbib 2012, p. 219). But whether one accepts that OR can play a role in the ontogeny of some gestures (as do Halina et al. 2013), holds that all or most gestures are innate, or allows that myriad factors may impact acquisition, including the spontaneous, online creation of gestures during interactive bouts (Pika and Fröhlich 2018), it is clear that some manner of learning is involved; but what those processes are, how the role of each agent (gesturer, recipient) factors in, and what that implies for the neural and cognitive machinery managing their learning and use have remained unclear. As noted above, past success or failure of a particular gesture in a particular context predicts its future usage (in that context at least), showing modulation as a result of feedback—specifically, the other’s actions. This shows that monitoring others’ behavior is involved in the learning process, giving social interaction an important role, whether or not novel gestures can be acquired through observing their use by others (Gariépy et al. 2014; Ghazanfar and Santos 2004). The model proposed here relies on feedback on the success of a gesture to affect its continued usage and adaptation, whether it has emerged via OR or is part of an innate repertoire.
We do not model the acquisition of novel gestures through observation alone of their use by others, but we do not rule it out as a target for modeling as new data become available, though we do briefly consider observational priming effects (Fig. 6e, f).
Our primary aim in this paper is to present an integrated, dyadic brain model—a computational model of two interacting agents—of neural mechanisms that could support such social impacts on learning, in the hope that this will support new studies of apes which can assess the more unified approach to gestural acquisition presented here. By extending our prior model, we have defined a ‘base’ model for both gestural signaler and gestural recipient to show how differing roles and differing motivational and learning states can contribute to variation in behavior, both in terms of how gestures are learned and how they are responded to differentially. The variations in behaviors that can be seen are a result of the changes to initial states of the agents in the simulation, or a result of changes in learning in either, or both, agents. The patterns of gestural behavior broadly satisfy both main claims for gestural learning, and show how the space of innate gestures may be differentially sampled in such a way as to result in particular patterns of variation, while both novel and common gestural forms can be ritualized over a variable number of interactions. We show how modeling complex social behaviors, like gestural communication, with brain-based interacting models of distinct agents—dyadic brain modeling (Arbib et al. 2014; Gasser et al. 2014)—can inform the behavioral and neuroscientific fields.
Before presenting the model, we close this section with a sample of empirical data relevant to such assessment. Luef and Liebal (2012) describe how non-infant gorillas adjust their communicative strategies to infants in a sort of ‘motherese’ in which gestural sequences are extended and use of tactile gestures increases as a way to facilitate comprehension. The interactions are infant-specific in that these elaborated sequences are not employed when communicating with non-infants. Additional data from ape infant development suggest a general progression in the patterns of initiations of, for example, play behavior. Bard et al. (2013) showed that chimpanzee infants proceeded through a stage where, at first, their interactions were initiated by others (in this study, the human caregivers), and only later did the chimpanzee infants themselves initiate, and then request, particular interactions, like tickle play. Schneider et al. (2012a) showed a developmental pattern across ape species (though not orangutans) where tactile gestures preceded visual-only gestures in usage, suggesting that, at first, infants that remained in close proximity to the mother used tactile gestures accordingly, but as independence from the mother increased, use of gesture from a distance—that is, visual-only gestures—gained prominence. In general, the emerging notion seems to be that a multitude of factors impact socio-cognitive development (Bard and Leavens 2011), and that while innate programs for some gestures undoubtedly influence the developmental pattern for infant gesturing, the very fact that communication involves social partners demands that learning processes must involve processing social variables to adapt one’s own behavior (Schneider et al. 2012b).
Our model offers a first step toward considering these varied (social) influences on gestural usage. Elsewhere, we placed these results within a broader perspective on developing a computational comparative neuroprimatology, with special emphasis on its implications for hypotheses on evolution of the language-ready brain (Arbib 2016a, b). Still, important questions remain, such as disentangling these social influences and charting and explaining how they vary with development to better understand how others’ responses during development (whether actively via ‘scaffolding’ behavior, or otherwise) influence future communicative behavior. Key technical details of the model are presented in the ESM Appendix, but rather than explore the details of our simulations here, we offer analyses of diverse cases of possible gestural learning and show how our integrative model generates behaviors supportive of both competing hypotheses on gestural learning in apes. The aim is to encourage further work on dyadic brain modeling as a tool for further analyses of data from fieldwork and experimental studies of primate behavior and gesture, and the relation between them. The Supplementary Material offers a complementary effort—presenting the first iteration of a database, the Gesture and Behavior Database (GBDB), whose adoption would help systematize data on primate behavior and gesture both as an end in itself and as a resource for development and testing of future computational models.
Dyadic brain modeling
The present section offers a brief guide to the key features of the dyadic brain modeling methodology that serve as background for the following sections. Our group has a long record of modeling the brains of various species including comparative modeling informed by data on macaque neurophysiology and human brain modeling (Arbib 2016b). The effort here builds on the prior model (Arbib et al. 2014) of brain mechanisms supporting OR which in turn built on four of our earlier models of brain mechanisms for the execution and recognition of manual actions: the FARS model of parietal–premotor interactions in primate control of grasping (Fagg and Arbib 1998), the MNS learning model for the emergence of grasp-related mirror neurons (Oztop and Arbib 2002) and its extension MNS 2 (Bonaiuto et al. 2007), and the ACQ model that examines the role of mirror neurons in the opportunistic scheduling of one’s own actions (as distinct from recognizing the actions of others) (Bonaiuto and Arbib 2010). The OR model offered two advances: (1) it introduced dyadic brain modeling, simulating the brains of two interacting agents rather than focusing on simulation of a single brain (Fig. 1 and text below), and (2) it supported the hypothesis that apes could exhibit OR and monkeys could not because the latter had insufficient brachio-manual proprioception to support much in the way of intransitive manual gestures.
Figure 1 explicates the general notion of the dyadic brain modeling methodology, wherein an action–perception cycle between a pair of individuals is entered into, with one’s changing behavior being constantly analyzed by the other, and vice versa. While the methodology is general, the focus here is on interactions between a mother and child ape, implemented as two very simple avatars with the same body plan but of different sizes.
The basic idea of OR is that a behavior that can elicit a desired response becomes ritualized into a gesture that can elicit the same response but with less effort. Arbib et al. (2014) showed that their dyadic modeling could yield, just based on its initial conditions, the scenario hypothesized by Gasser et al. (2014), in which a hypothetical reach-to-grasp-to-be-hugged action becomes ritualized into a ‘beckoning’ gesture. In the initial state, the child wishes to be hugged by the mother and induces her response by reaching out to grasp her arm and tug it in his direction, changing her motivational state from indifferent to one for social bonding (Fig. 2a–c). (The content of the initial motivational state of the mother is less important to our current analysis; rather, the key fact is that she must switch goal states as a function of analyzing the child’s behavior and react accordingly.) The basic intermediate stages involve the mother coming to recognize the child’s behavior earlier and earlier and thus changing her motivational state (and thence, responding accordingly) earlier and earlier. Consequently, the child makes the transition over multiple trials, from (1) starting to emit the complete behavior but terminating it once the mother responds to (2) simply intending to make a truncated motion that is sufficient to elicit the mother’s response (Fig. 2d). In short, the model shows how a novel ‘beckoning’ gesture is added to the child’s motor repertoire and the mother’s perceptual repertoire over several discrete episodes of interaction.
An integrative model of gestural learning
Before presenting our new model, a few observations may put it, and the goals it seeks to achieve, in perspective. For one, it is known that significant overlap exists in the gestural repertoires of geographically separated groups of apes, leading many researchers to claim that these gestures must be innately determined (Hobaiter and Byrne 2011). As discussed later, however, some grasping behaviors—universal for humans—can be understood as resulting from learning routines tuning a more general ‘reach-to-object’ behavior (Oztop et al. 2004) to adapt to different types of object affordance. Thus, the universality of a behavior need not directly imply that such a behavior is innately determined—perhaps instead, universal learning routines act on a more general behavior to yield what is observed throughout the whole population. In the example from our original model, which we reproduce here, a common interaction between child and mother “seeds” their simple learning routines and yields a ritualized gesture. Thus, similar gestures that are geographically widespread may still have been ritualized spontaneously.
Secondly, there is greater similarity in ape gestural usage for peers than non-peers (Schneider et al. 2012b). This reflects the fact that the behavioral needs of a juvenile (e.g., a request to be carried, or to initiate play with a peer) differ from those of an adult. Moreover, the repertoire of recorded gestures varies greatly between individuals, and we infer that this observed variation is large enough to conclude that the repertoire in actual use does as well. Thus, even for innate gestures, learning mechanisms are still required to explain why particular gestures are expressed or not within the repertoire of a specific individual.
With this, we can turn to our new integrative model of gestural learning (Fig. 3). The model can explain both OR (ritualizing gestures through dyadic interaction) and the varied pattern of usage for innate gestural schemas. (Schemas are just functional units within the model that correspond to some more complex representation. The motor schema for a gesture would contain the information required to reproduce the motor movements, for example.) We simulated a number of variations to the brain model to get a sense of learning processes under different circumstances, but the architecture of the model remained the same (see the ESM Appendix for details of the parametric variations, and all equations used). As in the earlier study of OR, the model is dyadic, with two agents interacting. Both agents have the same brain architecture, but each instantiation differs according to its respective role (e.g., child versus mother), motivational states, action/gestural repertoire, etc.
For each episode of interaction, the agents enter a perception–action loop wherein perceptual information is processed to yield updated estimates of the value of performing each available action, given the internal state of the individual. Action selection is determined by value: the selection mechanism executes the maximally active plan. As these updates modify the relative value of each action, new behavioral output is computed. Following this, the other agent cycles through the perception–action loop, generates motor output, and these new data are assessed by the former agent, continuing the cycle. In general, we simulate conditions wherein the child is initialized with a goal state to socially bond with the mother, while the mother is initialized in a neutral state. Each agent is initialized with a repertoire of available actions: walking, reaching, grasping, and in the cases where we simulate innate gestural pruning, the child has multiple gestural motor programs in its repertoire as well (see below). These actions are selected by each agent’s internal action selection mechanism according to its current goal state. Additionally, agents can switch goal states dynamically upon recognizing that the other’s behavior corresponds to one of those goal states: thus, the mother can switch goal states following the child’s actions, and select those actions in her repertoire that bring her (and the child) closer to achieving that goal. We detail the internal machinery below (and see Fig. 4 for a pictorial walk-through of the impact of changing activations and internal states over multiple episodes of interaction).
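As a minimal sketch of this cycle (the class, action names, and value tables below are our own illustrative stand-ins, not the actual simulation code), each agent holds a single goal state, values its actions under that goal, executes the maximally valued action, and may switch goals upon recognizing the other’s action:

```python
def select_action(values):
    """Pick the maximally valued action (ties broken by dict order)."""
    return max(values, key=values.get)

class Agent:
    def __init__(self, goal, repertoire):
        self.goal = goal                # single active goal state
        self.repertoire = repertoire    # {action: {goal: value}}

    def perceive(self, other_action, goal_links):
        # If the other's recognized action is linked to a goal, switch goals.
        if other_action in goal_links:
            self.goal = goal_links[other_action]

    def act(self):
        values = {a: v.get(self.goal, 0.0) for a, v in self.repertoire.items()}
        return select_action(values)

# Child seeks bonding; mother starts neutral but has (here, by assumption,
# pre-learned) a linkage from seeing a reach-to-grasp to the bonding goal.
child = Agent("bond", {"reach_to_grasp": {"bond": 1.0}, "walk": {"explore": 1.0}})
mother = Agent("neutral", {"hug": {"bond": 1.0}, "rest": {"neutral": 0.5}})

child_action = child.act()                                # child initiates
mother.perceive(child_action, {"reach_to_grasp": "bond"}) # mother flips goal
mother_action = mother.act()                              # mother responds
```

One turn of the dyadic cycle then alternates these `act`/`perceive` calls between the two agents, with each agent’s output becoming the other’s input.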
Visual, haptic and proprioceptive inputs in this model are processed by two streams: one assesses data on the other agent as a basis for recognition of the other’s action, and the other assesses data relevant for self-action. The latter determines those actions that are available to the agent given its internal state and the state of the environment. (For example, an action to reach out to grasp something is not available if no appropriate object is within reaching distance.) In the simulations, visual and proprioceptive information are maintained for the shoulder joints, elbow joints, wrist position and head position of each agent. Internally, each agent represents these values relative to itself—thus, neither agent has direct access to the other’s data. Instead, information about the other must be computed from the visual and/or haptic information made available to their perceptual systems. (Haptic information is estimated algorithmically since there is no model of touch sensation in the simulations reported here.)
The action recognition system consists of a recurrent neural network for analyzing reaching movements and an algorithmic module for assessing attentional states and other actions like walking. This system, which receives visual and haptic (when available) information, determines what the other agent appears to be doing. The details of the recurrent neural network can be found in the ESM Appendix, but in short, it consists of three output neurons, each of which corresponds to a unique action it can recognize (following appropriate training). As a time series of data is input to the network, the output neuron corresponding to those input data becomes more active. However, recognition of the other’s actions does not automatically change one’s own behavior. Rather, recognizing the other’s behavior may cause one to ‘flip’ one’s goal states, to align with the goals of the other, and so then cause a cascade in how one changes one’s behavior. In the case of OR, this linkage between recognition and goal switching must be learned (see the discussion of learning below).
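The evidence-accumulation behavior of the recognition network can be illustrated with a toy stand-in (not the trained recurrent network itself): three output units, one per recognizable action, each leakily integrating the similarity between incoming movement frames and a stored prototype trajectory. The prototypes, similarity measure, and leak rate are all illustrative assumptions:

```python
import math

# Toy prototypes for the three recognizable actions (illustrative 1-D
# trajectories standing in for joint-angle time series).
PROTOTYPES = {
    "reach_to_grasp": [0.1, 0.3, 0.6, 0.9],
    "wave":           [0.9, 0.1, 0.9, 0.1],
    "walk":           [0.5, 0.5, 0.5, 0.5],
}

def recognize(frames, leak=0.5):
    """Stream observed frames; each output unit leakily integrates its
    frame-by-frame match to the corresponding prototype."""
    act = {a: 0.0 for a in PROTOTYPES}
    for t, frame in enumerate(frames):
        for action, proto in PROTOTYPES.items():
            similarity = math.exp(-abs(frame - proto[t % len(proto)]))
            act[action] = (1 - leak) * act[action] + leak * similarity
    return act

observed = [0.1, 0.3, 0.6, 0.9]          # matches the reach-to-grasp prototype
activations = recognize(observed)
winner = max(activations, key=activations.get)
```

As more frames of a reach arrive, the matching unit’s activation rises above its competitors’, mirroring the rising activation described for the model’s output neurons.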
The socio-cognitive layer and motivational layer of the model manage how and whether these internal states are flipped. As we saw, the action recognition layer above may signal that reaching-to-grasp is being performed by the child, but the linkage establishing that this signals an attempt at bonding may not yet exist. Thus, this link to the socio-cognitive layer is modifiable by learning to associate another’s action with a change in one’s own goal state. As we discussed in our earlier paper, it is this learned linkage, which embodies the mother’s learning about the child’s behavior, that can yield ritualized gestures. In our simulations of conditions under which OR is possible, these associations must be learned. However, when we test variation of use of innate gestures, we hypothesize some innate linkages between visual recognition of a gestural form and changes in the recipient’s internal state—which then yield changes in their behavior. The motivational layer, then, maintains a single goal state. Each individual is initialized in a particular state, and their subsequent behavior is a function of that state. However, we have seen above that internal state changes elsewhere can flip these states, yielding new behaviors.
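The learned linkage can be sketched as a threshold integrator (weights, rates, and the threshold below are illustrative assumptions, not the model’s actual values): before learning, the visual link is too weak and haptic contact is needed to flip the goal; a simple Hebbian-style rule then strengthens the visual link until recognition alone suffices:

```python
THRESHOLD = 1.0  # goal flips when the integrator crosses this level

def goal_switches(w_visual, visual_evidence, haptic=0.0):
    """Integrator sums weighted visual recognition plus haptic input."""
    return w_visual * visual_evidence + haptic >= THRESHOLD

def hebbian_update(w_visual, visual_evidence, switched, rate=0.3):
    # Strengthen the visual->goal link whenever recognition co-occurs
    # with a successful goal switch.
    return w_visual + rate * visual_evidence if switched else w_visual

w = 0.2                                            # naive mother: weak visual link
early = goal_switches(w, visual_evidence=0.9)      # vision alone fails at first
with_touch = goal_switches(w, 0.9, haptic=1.0)     # haptic contact drives the flip
for _ in range(5):                                 # repeated episodes with contact
    if goal_switches(w, 0.9, haptic=1.0):
        w = hebbian_update(w, 0.9, switched=True)
late = goal_switches(w, visual_evidence=0.9)       # now vision alone suffices
```

This reproduces, in miniature, the transition from touch-dependent to vision-driven goal switching that underlies the mother’s side of ritualization.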
The action selection layer selects the most active motor schema in the agent’s repertoire. Actions are valued according to which goal state is currently active, and also as a function of the environment as assessed by the action recognition system: should the mother be looking elsewhere, a visual-form gesture would not be selected here, no matter how closely linked it may be to the current goal state of the child. Similarly, one cannot select a reach-to-grasp action if no object is within reaching distance. It is important to note that we allow for dynamic addition of actions into one’s motor repertoire as a result of learning. Following sufficient learning, a schema corresponding to the newly ritualized gesture may be added to the child’s action repertoire and be available for selection here. Additionally, when we discuss gestural pruning, we do not mean to imply schemas are lost: rather, the value they have as a function of the goal state may simply be reduced, and thus not often be selected by this module.
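A sketch of this value-plus-availability selection (the actions, values, and availability predicates are illustrative assumptions): each action’s value under the current goal is gated by whether the environment affords it, so a visual gesture drops out of contention when the recipient is not looking:

```python
def select(repertoire, goal, context):
    """Return the highest-valued action whose availability gate passes."""
    scored = {}
    for action, spec in repertoire.items():
        if spec["available"](context):            # affordance/attention gate
            scored[action] = spec["value"].get(goal, 0.0)
    return max(scored, key=scored.get) if scored else None

repertoire = {
    "beckon_gesture": {
        "value": {"bond": 0.9},
        "available": lambda c: c["mother_looking"],   # visual gesture needs gaze
    },
    "reach_to_grasp": {
        "value": {"bond": 0.6},
        "available": lambda c: c["mother_in_reach"],  # praxic act needs proximity
    },
}

looking = select(repertoire, "bond",
                 {"mother_looking": True, "mother_in_reach": True})
averted = select(repertoire, "bond",
                 {"mother_looking": False, "mother_in_reach": True})
```

With the mother attending, the higher-valued gesture wins; with her gaze averted, the child falls back on the praxic reach, despite the gesture’s stronger link to the goal.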
The motor control modules—including the internal model shown in Fig. 3—manage the motor performance of each agent. Arm motions are a function of affordance signals and postural signals. Affordance signals guide the arm toward explicit targets in the peripersonal space of the agent, such as the arm of the mother when the child reaches. Postural signals guide the arm in such a way as to reproduce past arm motions that have led to reward. In the case of OR, these arm motions are learned, and derive from the ‘seeded’ reach-to-grasp (affordance-driven) interactions. In the case of innate gestures, these signals encode the arm trajectory to be reproduced and are initialized at the onset of simulations.
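The interplay of the two signal types can be sketched as a weighted blend (the 1-D “trajectories” and weights are illustrative assumptions): the affordance component pulls the arm toward an explicit target, the postural component reproduces a previously rewarded trajectory, and ritualization corresponds to the postural weight growing at the affordance component’s expense:

```python
def blend(target_traj, postural_traj, w_postural):
    """Weighted mix of affordance-driven and postural arm trajectories."""
    w_afford = 1.0 - w_postural
    return [w_afford * a + w_postural * p
            for a, p in zip(target_traj, postural_traj)]

target = [0.0, 0.5, 1.0]   # full reach toward the mother's arm
ritual = [0.0, 0.3, 0.4]   # truncated, previously rewarded "beckoning" form

early = blend(target, ritual, w_postural=0.1)  # mostly object-directed
late  = blend(target, ritual, w_postural=0.9)  # mostly ritualized form
```

Early in learning, the output closely tracks the physical target; late in learning, it reproduces the truncated gestural form even though the original target is unchanged.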
The learning and value processing module manages all manner of learning in the model. Each agent can learn to re-value actions as a function of goal states (i.e., reinforcement learning), can learn to associate others’ behavior with particular goal states (facilitating rapid goal switching following recognition of others’ actions), and can learn properties of their own actions: for example, to gradually reproduce the posture of a reaching motion (at the expense of the physical target it was originally directed toward; i.e., ontogenetic ritualization). For each of these learning routines, there is a unique learning rate parameter that manages the rapidity with which learning-related changes occur. Further details for all modules and the learning systems of the model are provided in the ESM Appendix, but are not necessary for understanding the main text.
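The effect of per-routine learning rates can be illustrated with a single delta rule applied at three different rates (the rates and targets are illustrative assumptions, not the model’s parameter values): with a shared update form, the routine with the largest rate converges first, which is how the model stages fast action re-valuation against slower postural consolidation:

```python
def delta(current, target, rate):
    """Simple delta rule: move current toward target at the given rate."""
    return current + rate * (target - current)

# One state variable per learning routine, each with its own rate:
# (1) action re-valuation, (2) action-to-goal association, (3) postural form.
state = {"action_value": 0.2, "goal_link": 0.0, "postural_weight": 0.0}
rates = {"action_value": 0.5, "goal_link": 0.2, "postural_weight": 0.05}

for episode in range(10):          # ten rewarded episodes
    for key in state:
        state[key] = delta(state[key], 1.0, rates[key])
```

After the same number of episodes, the fast routine is near its target while the slow one has barely consolidated, giving the kind of staged learning the model exploits.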
Methods and simulation results
We performed a multitude of simulations under differing conditions, varying parameter values, the distribution of goal state activation, and the physical postures of the agents. For most of what we report here, our results are described qualitatively: the data needed to construct well-constrained models (e.g., of neural architecture) are lacking, and simplifications are necessary to keep the problem tractable. We report the patterns of behavior that result under varied conditions and assumptions, which can lead to future analysis and experimentation. These results show the model to be highly adaptive to circumstances, with the avatars reacting according to variation in their internal states. Moreover, the behavior of each individual is shown to influence how the other learns. We discuss how to interpret these results in terms of their implications for primatological and neuroscientific research.
In what follows, the names or descriptions of the gestures used in the simulations do not necessarily correspond to those in the primatological literature (e.g., Hobaiter and Byrne 2011). In time, it would be useful to establish a close correspondence between what can be simulated and what is actually meant by these terms. The Supplementary Material presents an account of the Gesture and Behavior Database (GBDB) which could be extended to provide a space in which diverse groups of field workers and experimentalists could negotiate shared data (or a thesaurus of synonyms and nuances); these data in turn could support more comprehensive modeling that addresses the details of diverse datasets.
Ritualizing a reach-to-grasp gesture through repeated interactions
Because the model described here updates the previous iteration (Arbib et al. 2014), we first reproduced the general result of successfully ritualizing a reach-to-grasp gesture through the mutual shaping of behavior. In this scenario, as described above, the child is initialized to seek social bonding with the mother, while the mother is in an indifferent state. It is also assumed, in this scenario, that no innate gestures are available to the child—and so the child must, at first, mechanically interact with the mother. We then performed similar parametric variation within both agents and recorded the number of discrete episodes until a gesture was spontaneously used by the child. (See ESM Appendix for details.) In particular, parameters that vary the responsiveness of an agent as a function of the other’s actions (as discussed above) greatly influence the emergence of a ritualized gesture, as do the learning rates involved.
As shown previously, there is a wide range in the progression of learning to ritualize, and this effect can be driven by either agent, mother or child, thus demonstrating that both are key actors in the process and that the learning involved cannot be attributed to one alone: gestural learning in this way is not driven solely by the gesturer. By varying these parameters, the progression of gestural learning changes and influences when, or even if, a gesture can emerge (see Table 1 in the ESM Appendix). It is also evident that under particular circumstances—such as when the mother is unresponsive to the child’s actions, as indicated by a higher threshold for goal switching—a gesture will not become ritualized, providing suggestions of how dyadic interactions may subtly influence gestural behavior.
Additionally, we can disrupt the mother’s recognition of the child’s gesturing behavior, to observe how the child responds to failed communicative acts. Following a variable number of bouts of unsuccessfully producing the ritualized gesture, the child can revert to the original action sequence. Failed communicative attempts are met with a learning signal to downgrade the new schema for the gesture, but the original action sequence is unaffected. Following repeated unsuccessful gesturing bouts, the learning system devalues that particular gesture in that particular context, until the original action sequence holds the highest value, leading to its execution. Thus, the model is robust to substantial variations in internal and external conditions. We will return to circumstances where ritualization fails below.
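This reversion dynamic can be sketched in a few lines (the action names, starting values, and devaluation rate are illustrative assumptions): each failed bout multiplicatively devalues only the gesture, leaving the praxic sequence untouched, until the latter regains the highest value and is selected again:

```python
# Context-specific action values: the ritualized gesture initially
# outvalues the original praxic sequence.
values = {"beckon_gesture": 0.9, "reach_to_grasp_sequence": 0.6}

def on_failure(values, gesture="beckon_gesture", rate=0.3):
    """Devalue the failed gesture, then return the next bout's choice."""
    values[gesture] *= (1.0 - rate)     # punish only the failed gesture
    return max(values, key=values.get)  # action selected on the next bout

first = on_failure(values)    # gesture still outvalues the praxic sequence
second = on_failure(values)   # gesture now devalued below it: reversion
```

After enough failures, selection flips back to the original action sequence, matching the reversion behavior reported for the model.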
We can also demonstrate successful recognition of the reach-to-grasp action via the simulated neurophysiological responses in the mother’s action recognition neural network (Fig. 5). Output unit responses from the recurrent neural network are correlated with the child’s actions (left), while the neurons in the socio-cognitive layer (right) integrate this activity over time. For both graphs on the left, the action recognition network is signaling recognition of the movement of the child’s arm, as indicated by the rising activation of the neuron. However, in this scenario the mother must learn about the relationship between the child’s action and changes in her own goal state. As discussed above, this is mediated by a socio-cognitive layer that causes goal switching. At first (bottom right), the integrator neuron cannot pass the threshold until substantial haptic input drives the neuron. As learning associates the output neuron (left) with this integrator neuron (right), rapid recognition and rapid goal switching eventually occur (top right). Again, it is this learning-related change that greatly contributes to ritualization. Of course, these single neurons stand in for whole neural populations in the real ape brain. However, the simplifications here reveal the logic essential to ground more subtle computational models, as well as fieldwork designed to test the implications reported here.
Ritualizing a variant reach-to-grasp gesture through repeated interactions
Having shown that reach-to-grasp gestures can be ritualized under at least some circumstances, we now show that the particular posture of the mother relative to the child, and the particular target of the child’s grasp, can be varied and still result in a ritualized form of a gesture. Indeed, the gestural form ritualized in these circumstances takes a different posture owing to its causal relation with the ‘seeded’ reach-to-grasp gesture of the child. Since the child’s praxic form—which is object-directed—varies as a function of the mother’s posture, the resultant gestural form inherits the varied reaching form from which it derives. In one simulation, the mother is initialized with her arm above the child’s head, thus forcing the child to reach higher to grasp her arm. Following a variable number of interactions as above, the child is capable of spontaneously using the gesture communicatively, but since his motor learning is dependent on the posture of his reaching movements, the form of this ritualized gesture differs from that in the previous scenario (see Fig. 6a, b). In fact, a wide range of ritualized gestural forms is possible in this way, and so the resulting form is not bound to any single initialized posture of either agent. Indeed, this is further evidence that such a simple form of learning may plausibly be involved in a wide range of interactive behaviors.
Ritualization fails when mother is too unresponsive
There are circumstances where ritualization does not occur, and analyzing these circumstances can be informative. Two parameters can be varied that effectively prevent the child from ritualizing the reach-to-grasp action, though the child remains capable of achieving his goal state through praxic means. For higher values of the mother’s threshold for switching goal states, the mother will never complete the action of the child, and can only be mechanically acted upon for her to bond with the child. Additionally, if the mother is a slow learner in this respect, she fails to adapt to the child, and so to anticipate his behavior, leading to a lack of responsiveness. For the child, increasing the parameter which controls consolidation of the learned postural form can effectively leave the child unable to ever ritualize in this way. Critics of OR might argue that such interactive learning is possible, but that gestures rarely become consolidated—and so a high parameter value may be hypothesized. Conversely, those supportive of OR claims may suggest a lower parameter value here, and so greater ease with which new motor programs of this sort can be consolidated. In all of these scenarios, though, the child is still capable of achieving his goal through purely praxic means.
Ritualization fails when the mother is too proactive in bonding with child
We have seen that an unresponsive mother may prevent ritualization from occurring. More interestingly, we can assess how varying the mother’s proactivity—by varying her initial goal state activation—influences the progression of gestural learning. When the mother is initialized in the appropriate motivational state—to achieve physical bonding with the child—she immediately moves to embrace the child, regardless of the child’s behavior, with the result that no learning about the reach-to-grasp occurs in the child. While this seems trivial, it is important to note that with limited numbers of interactions, there may be influences on mother–infant engagement that change how, or whether, the child has the opportunity to ritualize a gesture. There have been recent longitudinal studies on mother–infant dyads and how interactions influence future socio-emotional and gestural behaviors (Bard et al. 2013; Schneider et al. 2012a, b)—more on this in “Discussion”.
Pruning innate gestural repertoires
To this point, we have reaffirmed the results of our original model, and further shown the robustness of OR to variations across parameter values and physical characteristics. We now turn to demonstrating how the same model architecture responds when innate gestures are initialized in the action and perceptual repertoires of the child and mother. In these instances, three more motor schemas are initialized. For the child, we can systematically vary the baseline ‘action value’ of each gesture, to assess how varied internal biases affect future usage; for the mother, we can vary the linkages between these gestures and her set of goal states, or her past experience in observing them. In this way, we can assess how gestural performance for innate gestures varies over time as a result of learning processes and hypothetical internal differences. In addition, we can ask how, or even whether, OR can proceed when innate gestures are available to the child.
The three gestures we innately program are schemas for different visual, arm-based gestures: an arm raise (the arm is raised above the shoulder, near the head of the gesturer), an arm-to-ground (the arm moves downward and the hand makes contact with the ground), and an arm swing (the arm is rapidly moved back and forth at the side of the gesturer). (Again, we do not hypothesize that these names correspond to those in the literature.) Hobaiter and Byrne (2011) suggested that infants, whom they found to gesture in sequence bouts more frequently than individuals of other ages, at first produce a wide variation in gestures, only to prune their gestural selection to fit contexts appropriately, thus becoming more efficient. We can show the same effect.
In one batch of simulations, we initialized the child with a random distribution of values associated with each innate gesture. These values predicted the relative probability of selecting a particular gesture in the given context. Reinforcement learning then updated these values after each usage according to the feedback received: increasing the value when it leads to reward (here, satisfaction of the goal to socially bond with the mother) or else decreasing the value. A failed gesturing attempt does not complete an episode of interaction; instead, the child selects a new gesture, and so on, with the effect that the child gestures in a sequence, as described in the literature (Hobaiter and Byrne 2011). Over several episodes, this simple learning mechanism promotes one or another gesture enough so that in future contexts, communication is more efficient (see Fig. 6C, D). In this way, the observed individual variation in a population with a large set of putatively innate gestures can be reproduced as a result of small variation in internal activations of the individuals.
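The pruning dynamic just described can be sketched as a simple value-based selection loop. The gesture names, the learning rate, and the proportional selection rule below are our illustrative assumptions, not the published model’s exact parameters.

```python
import random

GESTURES = ["arm_raise", "arm_to_ground", "arm_swing"]

def select_gesture(values):
    """Choose a gesture with probability proportional to its value."""
    total = sum(values[g] for g in GESTURES)
    r = random.uniform(0, total)
    for g in GESTURES:
        r -= values[g]
        if r <= 0:
            return g
    return GESTURES[-1]

def episode(values, mother_responds, alpha=0.2):
    """One interaction: the child gestures in sequence until one succeeds;
    each failure is devalued and the eventual success reinforced."""
    bout = []
    while True:
        g = select_gesture(values)
        bout.append(g)
        if mother_responds(g):
            values[g] += alpha * (1.0 - values[g])      # reinforce success
            return bout
        values[g] = max(0.05, values[g] * (1 - alpha))  # devalue failure

# Example: suppose the mother reliably responds only to the arm swing.
random.seed(1)
values = {g: random.uniform(0.4, 0.6) for g in GESTURES}  # small initial variation
for _ in range(30):
    episode(values, lambda g: g == "arm_swing")
# arm_swing now dominates selection, so later bouts tend to be short.
```

Under this sketch, the transition from long infant sequence bouts to efficient single-gesture use falls out of the value updates alone.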
We next ran a batch of simulations similar to the above, but now with variation in the weights linking action recognition to changes in goal states in the mother (see above). As before, the child can differentially sample from this innate set of gestures, but the success or failure of a given gesture also depends on the mother’s varied behavior. In these simulations, the mother may be biased toward responding to a particular gesture, while being less responsive to others. It is again observed that, over time, the child tends to utilize the particular gesture the mother most responds to. Still, in situations where the mother does not begin with a strong bias toward one gesture, she can gradually learn to respond to the child’s gesturing.
To test how our model handles disruption to motor or action recognition performance in the child and mother, respectively, we performed two more batches of simulations: one where the child’s motor performance of these gestures is abnormal and the other where the mother’s action recognition network is not properly trained (see ESM Appendix). In the former case, the child can select a gesture, but his performance does not match what the mother expects: for example, in an arm swing, he may rotate his arm near his head rather than near his midline. In the latter case, the mother is unable to recognize the performance. Interestingly, if the scenario essentially prevents the dyad from agreeing on a gesture for that context, then, because of the dynamic changes in the valuation of actions, the child will eventually revert to mechanically interacting with the mother as described above—even going so far as to ritualize a new gesture. Thus, even in conditions where multiple gestures are initialized in the repertoire of the child, he may still learn to ritualize a new gesture when his performance of these innate gestures does not satisfy his goals.
Thus far, we have tested cases centered on the achievement of a single goal: social bonding. However, because developmental stages correlate with changes in salient goals (nurturing from the mother as an infant, to play behavior with peers, to foraging and sexual behaviors as an adult), we tested how learning affects the usage of gestures not just over time, but across different goals. To do this, instead of re-initializing the agents following multiple episodes of interaction, we maintained their learning-related changes and only re-initialized them in different goal states, before simulating their interactions once again. This way, we can test how the differential sampling occurring over time with respect to a given goal may affect the ‘pruning’ with respect to a new goal. In these simulations, the effects are what would be expected from a reinforcement-learning perspective: because the goal states are independent, the learning associated with previous goal states does not impact gestural performance in novel contexts. Thus, if an arm swing gesture was unsuccessful for achieving one particular goal, it can still be found to be successful for an alternate goal. In the literature, there is not just disagreement over how to categorize gestures; different studies find putatively the same gesture used in the service of different goals and in different contexts (Hobaiter and Byrne 2014). For example, Bard et al. (2017) observed hundreds of instances of a ‘touch’ gesture, carefully recording the variability in hand form, target location (i.e., body part of recipient), and the more than two dozen varying contexts. Hobaiter and Byrne (2017) thoroughly categorized and catalogued the gestures documented in ape species according to criteria like body part used, kind of movement, and use of object.
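The goal independence described above amounts to indexing action values by (goal, gesture) pairs, so that devaluation under one goal leaves the value of the same gesture under another goal untouched. The following minimal sketch, with assumed names and learning rate, illustrates the point.

```python
from collections import defaultdict

# Action values keyed by (goal, gesture): learning under one goal leaves
# the value of the same gesture under other goals untouched.
values = defaultdict(lambda: 0.5)

def update(goal, gesture, success, alpha=0.2):
    key = (goal, gesture)
    if success:
        values[key] += alpha * (1.0 - values[key])  # reward moves value up
    else:
        values[key] -= alpha * values[key]          # failure moves it down

# 'arm_swing' repeatedly fails for social bonding...
for _ in range(10):
    update("social_bonding", "arm_swing", success=False)
# ...yet can still be discovered as effective for play.
for _ in range(10):
    update("play", "arm_swing", success=True)
```

Because the goal states share no parameters in this scheme, the same gesture ends devalued under one goal and promoted under the other, matching the simulation results above.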
In both cases, it is apparent that the sources of meaningful context resolve ambiguities for the chimpanzees, although touches on one’s body or arm motions to a naïve observer may at first appear similar. Future modeling must incorporate a diversity of sources of context and so tease apart how these subtle differences impact learning and future gestural behavior.
Finally, we did preliminary testing of observational priming effects by introducing a third-party observer during certain simulations (see Fig. 6E, F). Imitation of gestures in wild populations of apes is debated (Arbib et al. 2008; Schneider et al. 2012b), though the notion of what constitutes ‘imitation’ versus other forms of social learning contributes to the confusion (Chang et al. 2013; Gariépy et al. 2014; Gasser and Arbib 2017a, b). The methodological details are explained in the ESM Appendix, but the notion is that information on the visual performance of the gesture and the feedback received as a function of the gesture—whether a success or not—are made available to a third instance of our model (the observer). We then allow for simple reinforcement of the activated motor schema: positive reinforcement for successful gestures, negative reinforcement for unsuccessful attempts. Following periods of interaction between the gesturer and the recipient, the observer was initialized to seek social bonding from the mother. Here, we observed that the patterns of gesture usage from the third-party observer followed the patterns of success or failure from the original dyad: successful gestures are more likely to be observed (and lead to goal achievement) than the gestures that were unsuccessful. In this way, under our model assumptions, priming effects may contribute to driving the differential sampling observed in populations of apes.
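A minimal sketch of this priming mechanism, assuming only that the observer receives the identity of the witnessed gesture and its outcome, is a vicarious version of the same value update; the names, rates, and replayed history below are illustrative.

```python
def observe(observer_values, gesture, succeeded, alpha=0.1):
    """Vicarious reinforcement: nudge the observer's own value for the
    witnessed motor schema toward 1 on success, toward 0 on failure."""
    v = observer_values.get(gesture, 0.5)
    if succeeded:
        v += alpha * (1.0 - v)
    else:
        v -= alpha * v
    observer_values[gesture] = v

observer = {}
# Replay a witnessed history of dyadic interactions: the arm swing kept
# working for the original dyad, the arm raise kept failing.
history = [("arm_swing", True)] * 8 + [("arm_raise", False)] * 8
for gesture, outcome in history:
    observe(observer, gesture, outcome)
# When later initialized to seek bonding itself, the observer samples
# the previously successful gesture preferentially.
```

Nothing in this sketch requires copying the form of the gesture—only its identity and outcome—which is why we describe the effect as priming rather than imitation.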
The effects that assuming a stock of innate gestures has on the behavior of the modeled dyad (and third-party observers) largely agree with the literature: variation over time and across individuals can be simulated, and behaviors that have been established previously are maintained (e.g., ritualized gestures). Still, to have a tractable problem, we must make simplifications and assumptions: few strong conclusions can be drawn from these particular simulations, since the data cannot greatly constrain our models. However, what can be said is that simple learning routines organized around well-researched sensorimotor systems that manage reach-to-grasp behaviors (Fagg and Arbib 1998), action recognition (Bonaiuto et al. 2007; Oztop and Arbib 2002) and decision making (Bonaiuto and Arbib 2010; Gasser and Arbib 2017a, b) can yield impressive adaptive behavior. It will take more research in this area to answer how these brain networks interact, how apes deploy visual attention to others’ actions, how they process feedback to inform future behaviors and how well defined the kinematic details of gestures must be: above, we assumed an arm swing at the midline was different from the same motion near the gesturer’s head. Future research can clarify these issues with, for example, empirical studies like those of Hobaiter and Byrne (2017) and Bard et al. (2017) better integrated with future modeling, extending the studies here to make precise their implications for understanding ape gesturing behavior.
We have presented an integrative model demonstrating gestural acquisition as a function of multiple learning processes across innate and ritualized gestures. The gross behavioral results of the model comport with data on ape gesturing behavior, though necessarily at a level abstracted away from certain details. We now summarize how our model relates to existing data below, and how the model may be extended to make further contact with available data.
Social influences on gestural learning in apes
Various researchers claim to observe group-specific or idiosyncratic gestures (Halina et al. 2013; Pika and Liebal 2006; Pika et al. 2003). Yet, based on analyses of the gestural repertoires of gorillas (Genty et al. 2009) and chimpanzees (Byrne et al. 2017; Hobaiter and Byrne 2011, 2017), Byrne and colleagues argue for a negligible role for social learning mechanisms in the acquisition of gestures, and instead argue that apes have access to family-typical and species-specific gestural repertoires from birth. As an example, Hobaiter and Byrne (2011) recorded 66 unique gestural types as operationally defined in their study. (Subsequent work has expanded this repertoire to 81; see Hobaiter and Byrne 2017.) Of the 66, none showed a pattern of usage across subjects that met their criteria to be considered idiosyncratic, nor—they claim—did the two ‘potentially ritualized’ gestures observed sufficiently match the physical acts OR would have predicted. (The conclusions of Genty et al. (2009) on gorilla gesturing are similar.) Nonetheless, other studies, employing differing methodological approaches, such as an emphasis on longitudinal designs (Halina et al. 2013; Rossano and Liebal 2014) (and see: Pika and Fröhlich 2018), make this picture more complicated. Moreover, our modeling suggests that widespread use of a gesture does not preclude that it emerged through OR. In any case, analysis of the role of learning is important even here.
Elsewhere, Hobaiter and Byrne (2011) argue that ‘pruning’ occurs over development as infants transition from gesturing rapidly in sequence to being more efficient in their sampling and use of gestures, as observed in non-infant chimpanzees, especially adults. (The notion of ‘pruning’ may be more properly understood as differential sampling, since it is not contended that the gestures not often expressed are ‘lost’.) For instance, they observed that as age increased, the number of sequence ‘bouts’ decreased, while the percentage of successful communicative bouts increased. Other data support the notion that these interactions are critical not just to gestural development, but more generally to motor, cognitive and socio-emotional development (Bard et al. 2013; Schneider et al. 2012a). Luef and Liebal (2012) presented data suggesting gorillas interacting with infants modify their expressions in infant-specific ways. Schneider et al. (2012b) confirmed that infants are more similar to one another in their gestural repertoires than adults are, while adults are more similar to each other than to the infants. Most interestingly, Bard et al. (2013) showed how infants first engage in, then initiate and finally request particular social interactions as gestures are learned specific to each context. Recently, Pika and Fröhlich (2018) presented an analysis of gestural acquisition over development, suggesting that social negotiation between individuals spontaneously leads to the emergence of gestures during development. Contrary to claims from OR, these periods of interaction between individuals need not be ‘seeded’ by functionally effective actions, but can be spontaneous occurrences by capable learners.
This Social Negotiation Hypothesis contends that myriad factors can lead to the development of gestural repertoires, and presents interesting challenges to future modeling studies: how may gestures spontaneously arise between individuals, and how may a gesture be understood and subsequently used by both individuals of an interacting pair?
The claim that motor pattern generators (MPGs), specific to each gesture, are acquired genetically, and that appropriate environmental ‘releasers’ are also encoded to give each gesture its meaning, obscures a possible role for social learning—underplaying the notion that observation may ‘prime’ the execution of one gesture rather than an alternative, semantically similar gesture (Tomasello et al. 1989). In what ways is the learning ‘social’—due to the necessarily social nature of the interaction—and in what ways is it non-social? For example, Oztop et al. (2004) showed that universal grasp types for humans need not be encoded genetically, and that simple learning mechanisms can explain the rise of such common behaviors as the precision pinch and power grasp, which are stable in the developing repertoire because of their proven utility in manipulating objects. Similarly, turning to speech, Oudeyer (2005) offered a model of phoneme learning in human infants based on self-organization of feature maps. Note that here the notion is that there is a continuous space of articulation and perception and that learning will carve the space up into categories in a way that may be culturally determined (as in the case of the lost distinction between /l/ and /r/ in Japanese). For a group of apes, there may or may not be a ‘culture’ of existing gestures to shape infant learning, but we advocate extensive investigation of the hypothesis of “carving up” a continuous space of brachio-manual forms rather than basing learning on a discrete set of genetically defined gestures—even though the model reported here was, as an opening strategy, based to a great extent on a discrete set of gestures.
Social learning of manual tasks in primates
Comparative behavioral task designs—comparing either apes and human children (Horner and Whiten 2005), or macaques, apes and human children (Dean et al. 2012)—have found species differences in the extent to which learning about action sequences from others is possible and in what learning strategies are employed. For example, Horner and Whiten tasked apes and human children to learn from an adult demonstrator how to open, through a series of manipulations of lever bars and small doors, an ‘artificial fruit’ box (attempting to simulate a naturalistic behavior involving extracting a food item, for example). Whereas human children appeared to imitate the actions of the demonstrator more directly, including imitation of the action ‘means,’ chimpanzees appeared to ‘emulate’ the ‘ends’ of the demonstrated action, while omitting actions that did not clearly achieve subgoals. Dean et al. found differences across all species, but with humans again achieving much closer correspondence to the demonstrated set of actions, and with children, uniquely among the tested subjects, being capable of pedagogical instruction: they would assist others in solving the complex task. Still, ape observational learning strategies did facilitate their performance above the expected baseline, and in some respects above that of the tested macaques as well.
Gorillas in the wild are able to learn hierarchical action procedures to process nettles in such a way as to avoid harsh stingers. Infant gorillas observed adults processing the leaves and eventually—over years of observation and trial-and-error practice—became capable of avoiding the stingers too. Importantly, it appears that the relevant features learned through observation by the young gorillas were not the actions themselves but the subgoals they achieved, as determined by “behavior parsing” (Byrne 2003). A combination of observation over long periods of time to identify subgoals and individual trial and error to find the means to achieve them resulted in the learning of a complex, hierarchically structured task. This is further supported by studies of monkey palm nut cracking (Fragaszy et al. 2013). Due to the use of stone anvils to assist in the cracking of hard nut shells, young capuchins can interact with existing anvils and learn from many attempts over a long period of time. These trial-and-error experiences yield the gradual accumulation of the skill required to extract the nuts. In both examples, constituent actions were learned by trial and error. What distinguishes these examples from OR is that the learning feedback for nettle folding relied primarily on the physical properties of the nettle and thumb, or the nuts and the stone anvil, whereas in OR the dyadic loop between mother and child is crucial. These studies show imitation need not follow directly from observation of others—at least in difficult tasks—but rather that social learning can combine a number of possible influences (Gariépy et al. 2014) with trial-and-error learning to reproduce desirable features of the observed behavior.
Our integrative model supports both mutual shaping (OR) and pruning of innate gestures. However, the model is preliminary and our aim is to point the way forward for increased attention to modeling (and, in particular, dyadic brain modeling) in assessing and integrating further work in the primatology of gesture and behavior.
Among the refinements, for example, would be to model more finely the motor learning of the gesturer. Contemporary research is unclear on the kinematic details of ape gesturing behavior, though in principle video analyses could yield insights. This also relates to perceptual discrimination on the part of the recipient in a dyad. In general, both the ape learning the gesture and the other member toward whom the gesture is directed need to be assessed, to see how much precision is needed for the success of the learning mechanisms we have posited. Additionally, studies examining the role of contextual information in contributing to the meaning of a gesture like ‘touch’ (Bard et al. 2017)—during play behaviors, or as an ‘attention getter’, for example—offer intriguing ideas for modeling the recognition of, and appropriate response to, superficially similar gesture ‘types’.
Moreover, as shown by Rossano and Liebal (2014), for example, dyadic interactions can be highly structured and complex. The simple scenarios explored here are useful first steps that require more structured data and model refinements to address them. The model should also be expanded in terms of how goal states are represented, creating a larger action space, bringing in emerging kinematic insights, and modeling ‘populations’ of individuals and possible observational effects on future gesturing behavior. More work is needed, too, to apply this unified framework to analyze other instances of social learning, for example the development of vocal communicative signals in vervets (Seyfarth and Cheney 1986). The tasks discussed above—artificial fruit tasks and nettle processing, each involving goal–subgoal structures—likewise require further modeling. Nut-smashing behaviors (Visalberghi et al. 2013) and other behaviors involving tool use are apparently influenced by social learning effects, though whether these can be said to be ‘imitative’ is unclear. In developing and assessing our models, it is important to recognize the emergence of neurophysiological techniques, including noninvasive imaging, examining the computation of social variables in interacting primates (Azzi et al. 2011; Klein and Platt 2013; Santos et al. 2011; Yoshida et al. 2011, 2012). Moreover, though the efforts detailed here are novel for brain modeling (see our work modeling observational learning in macaques: Gasser and Arbib 2017a, b), it is important to be aware of the contributions from dyadic modeling of, for example, robotic agents in the context of learning novel forms of communication (Spranger and Steels 2014; Steels 2003).
The modeling work presented here sought to go beyond the results of our previous work which detailed a dyadic brain modeling approach specific to ontogenetic ritualization (Arbib et al. 2014). Here, we offer an integrative model of gestural learning that unifies learning mechanisms for ontogenetic ritualization with mechanisms for ‘pruning’ an innate stock of gestures as a function of the successes or failures of previous gesturing attempts. Our model is capable of ritualizing gestures through the ‘mutual shaping of behavior’ in particular contexts, but then is also capable of discovering appropriate gestures within an innate repertoire for particular goals. According to leading hypotheses on gestural acquisition (Byrne et al. 2017; Halina et al. 2013), these two mechanisms may explain the observed patterns of gestural behavior in wild and captive apes, though rarely are attempts made at reconciling these two plausible routes for gestural acquisition within a single, unified account. We have sought to make a computationally specific account of flexible learning, production and recognition of gestures as both ‘ritualized’ and innate motor patterns. We have shown results consistent with these separate claims, and consistent with previous modeling work and known neurophysiological mechanisms. Recent work on spontaneous, online creation of gestures (Pika and Fröhlich 2018) adds further, exciting challenges to this modeling work, as does the persistent interest in imitation as a mechanism for gestural acquisition. These datasets (and see also Bard et al. 2017), datasets on social learning of manual actions discussed above, and datasets on neuroanatomical and neurofunctional data in apes (Hecht et al. 2013a, b), can all contribute to a continued refining of this model and of the relevant questions for understanding gestural acquisition in non-human primates.
This material is based in part on work supported by the National Science Foundation under Grant no. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (Michael A. Arbib, Principal Investigator).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflicts of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
- Hecht EE, Gutman DA, Preuss TM, Sanchez MM, Parr LA, Rilling JK (2013a) Process versus product in social learning: comparative diffusion tensor imaging of neural systems for action execution–observation matching in macaques, chimpanzees, and humans. Cereb Cortex 23(5):1014–1024. https://doi.org/10.1093/cercor/bhs097
- Pika S, Liebal K (2006) Differences and similarities between the natural gestural communication of the great apes and human children. In: Cangelosi A, Smith ADM, Smith K (eds) The evolution of language, Proceedings of the 6th international conference (Evolang6). World Scientific Publishing, London, pp 267–274
- Rossano F, Liebal K (2014) “Requests” and “offers” in orangutans and human infants. In: Drew P, Couper-Kuhlen E (eds) Requesting in social interaction. John Benjamins, Amsterdam, pp 335–363
- Spranger M, Steels L (2014) Discovering communication through ontogenetic ritualisation. Paper presented at the 2014 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)