Introduction

Whereas there are many approaches to the evolution of human cognition, we set our contribution to this special volume within the context of evolutionary cognitive archaeology. This discipline aims to identify, reconstruct, interpret, and explain development and change in the cognition of past societies, based on the material culture they left behind (e.g., Garofoli 2018; also see Renfrew 1993). Several approaches have contributed meaningfully to discussion about the cognition of Stone Age/Paleolithic Homo sapiens populations, none more so than the enhanced-working-memory model (see Coolidge 2019 for a recent synthesis). Other useful frameworks include expert cognition (e.g., Wynn and Coolidge 2004; Wynn et al. 2017), material engagement theory and meta-plasticity (e.g., Malafouris 2015; Roberts 2016), theory of mind (ToM) (e.g., Gärdenfors 2003; Cole 2019; Dere et al. 2019; Stade and Gamble 2019), mental time travel (e.g., Brinums et al. 2018), as well as cognitive task-structuring strategies (e.g., Fairlie and Barham 2016). Galway-Witham et al. (2019) recalled orders of intentionality, as an element of ToM, to differentiate levels of cognition based on the material culture of the last 1 million years (also see Dunbar 1998; Cole 2019).

Here we discuss why we see causal cognition as a useful general framework for cognitive archaeology, and explore relationships between the evolution of causal cognition and the evolution of ToM. Following previous work, we break down causal cognition into seven grades and ToM into several orders, and we provide examples of how the seven grades of causal cognition may play out in the archaeological record.

We put forward three theses:

  1. ToM is an integral element of causal cognition.

  2. Generally, the more advanced causal cognition is, the more it is dependent on ToM.

  3. The evolution of causal cognition depends more and more on mental representations of hidden variables. (A hidden variable is something that is not directly or physically perceivable, but only mentally constructed; within philosophy of science these variables are called theoretical entities.)

Causal Cognition as an Inclusive Way of Exploring Human Cognitive Evolution

As one of our approaches to the exploration of human cognitive evolution, we have previously presented a new analysis of causal cognition (Lombard and Gärdenfors 2017; Gärdenfors and Lombard 2018, 2020). We find this broad, yet nuanced approach to causal reasoning useful to cognitive archaeology because it incorporates almost all other types of thinking relevant to the topic of human cognitive evolution. For example, it includes aspects of working memory (Bauer and Booth 2019), episodic memory (Suddendorf 2017), mental time travel (Gärdenfors and Osvath 2010; Brinums et al. 2018), analogical reasoning (Krzemien et al. 2017), intentionality (Sloman et al. 2012), general ToM (Barrett 2012), relational complexity (Halford et al. 2010), and social cognition (Rochat et al. 2004). Causal understanding is also integral to tool use (Wolpert 2003; McCormack et al. 2011; Osiurak and Reynaud 2020), making it important in terms of later hominins evolving into obligatory users of stone tools (Shea 2017), and thus germane to the archaeological record.

A tool’s or object’s usefulness and application depend on several elements. Extrapolating from the list proposed by McCormack et al. (2011), we propose that these would include: (a) a tool or object’s physical traits, (b) the physical and mental traits of the tool user and those of his/her target or audience, (c) the causal (mechanical or perceptual) principles that connect these traits, and (d) how the tool user understands the underlying principles and relationships between these different aspects. As such, human techno-behaviors provide opportunities for participants to actively and knowingly intervene in their physical and social environments to reach specific goals based on their causal understanding of the relationships between tools and objects, or the effect of these on participants or circumstances.

Working from Woodward’s (2011) three-tier model, and based on the human ability to read tracks (Carruthers 2002; Liebenberg 2013; Shaw-Williams 2014, 2017; Stuart-Fox 2014), we initially established a framework for causal cognition that is fine-grained enough to accommodate a range of extended evolutionary trajectories (Lombard and Gärdenfors 2017). We then investigated causal cognition in terms of force dynamics and how it may play out in the development of some Stone Age hunting technologies (Gärdenfors and Lombard 2018, 2020). The resulting model includes seven grades of causal understanding, each operating increasingly detached in time and space. However, it is important not to think of the grades as unilinear. For example, although the range of causal understanding expands, its contextual evolution is not always linearly progressive, because aspects of its development are a systemic process involving the interplay between evolutionary-biological, historical-social, and ontogenetic-individual dimensions (see Haidle et al. 2015 for discussion). With such extension of the relation between cause and effect, we partly follow the distinction between cued and detached mental representations introduced by Gärdenfors (2003). A cued representation refers to something in the current or recent sensorial experience of the individual. For example, a warning call triggers the expectation of the presence of a predator. By contrast, detached representations refer to objects or events that are not present in the subject’s current or recent external context, and so could not directly trigger the representation.

With this contribution we present refined definitions for the different grades of causal cognition. For the first time, we make it explicit how these relate to forms of ToM, and integrate other types of cognition where relevant to demonstrate the inclusive scope of the framework. We also suggest how our categories for causal cognition may reflect in the archaeological record—thus making our model more accessible to cognitive archaeology in general, and more testable against aspects of the archaeological record.

The Role of a Theory of Mind in Causal Cognition

One form of cognition that is well developed in humans, compared to other species, is theory of mind (ToM), which in this context means the sharing and representing of one’s own and of others’ mentality (e.g., Premack and Woodruff 1978; Tomasello 1999). Having a ToM is not a unitary ability, but applies generally to understanding the emotions, attention, desires, intentions, and beliefs of the self and others (Gärdenfors 2003, 2007), and to recognizing that actions based on such understanding have causes and effects. In this sense, we argue that ToM is a fundamental part of the causal cognition package, instead of something separate or disconnected from it. When analyzing different forms of causal cognition it is therefore useful to separate orders of ToM. Dennett (1987) writes about different orders of intentionality, but here we extend the notion to other components of ToM.

  • Zero-order ToM ascribes no mentality to an individual, but assumes that behavior of the individual is governed by instincts, reflexes, or conditioning.

  • First-order ToM attributes emotions, attention, desires, intentions, or beliefs to the individual and assumes that some forms of behavior are governed by these entities. This level, however, presumes no understanding of the minds of other individuals.

  • Second-order ToM requires an individual to attribute a ToM to other individuals and to use this in their understanding of the behavior of others.

  • Third-order ToM requires an individual A to attribute to a second individual B an understanding of the ToM of A.

  • Higher orders of ToM require an individual to represent at least two mental states, their own and that of someone else.

The important point is that a great deal of cognitive and social complexity found in hominins presumes that a number of mental states are linked together in a web of causal learning and understanding.
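One way to make the orders listed above concrete is to treat an attribution of mentality as a nested structure and to count its depth. The following Python sketch is purely illustrative and is not part of the model itself: the class `Attribution` and the function `tom_order` are hypothetical names introduced here, and the sketch assumes that each embedded mental state adds exactly one order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribution:
    """A mental state held by `holder`, optionally about another mental state."""
    holder: str                             # who holds the mental state
    state: str                              # e.g. "sees", "believes", "intends"
    about: Optional["Attribution"] = None   # embedded mental state, if any


def tom_order(attribution: Optional[Attribution]) -> int:
    """Count nested mental states; None (no mentality ascribed) gives order 0."""
    order = 0
    while attribution is not None:
        order += 1
        attribution = attribution.about
    return order


# Assumed illustrative cases, using attention ("sees") as the mental state.
i_see = Attribution("A", "sees")                                   # "I see"
i_see_you_see = Attribution("A", "sees", Attribution("B", "sees"))  # "I see that you see"
joint_attention = Attribution(                                      # "I see that you see that I see"
    "A", "sees", Attribution("B", "sees", Attribution("A", "sees"))
)
```

Under these assumptions, `tom_order(None)` corresponds to zero-order ToM, and each further level of embedding raises the order by one. The sketch deliberately ignores the differences between emotions, attention, desires, intentions, and beliefs, which the orders are meant to cover jointly.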

ToM also includes cooperative forms, in particular joint attention and joint intention. Joint attention results when the agents have eye contact while sharing attention to a target. The prolonged eye contact signals mutual awareness and promotes communication about the target (Tomasello 1999). Joint attention involves third-order ToM since the individuals must ensure that they attend to the same thing (“I see that you see that I see”) (Gärdenfors and Warglien 2012). The ability to engage in joint attention has not, so far, been established conclusively in nonhuman primates (Carpenter and Call 2013; but see Leavens and Racine 2009; Tanner and Byrne 2010 for a different opinion). Joint intention requires that the agents share an intention to interact, react to each other’s intentions to act, and coordinate their intentions (Tomasello et al. 2005). For example, in setting up an ambush in hunting, the individuals involved understand that it is their common goal to kill a prey animal, and that they take different roles in the execution of the joint intention. Again, this is an example of a third-order ToM.

As discussed above, ToM comprises multiple components that probably evolve gradually in animals and humans. Dunbar (2007), for example, suggests that great apes are poised at the brink of second-order ToM, because some of them have a capacity for understanding false-belief states (Krupenye et al. 2016). A test for such understanding is deliberate deception with the intention to affect or manipulate the knowledge, beliefs, or emotions of others. In children this ability mostly emerges by about four years of age (Wimmer and Perner 1983; Gamble et al. 2014). Cole (2019) argues that the conscious apprehension of third-order ToM by H. sapiens and our immediate ancestors provides the necessary “springboard” towards subsequent higher orders.

Grades of Causal Cognition, Their Relation to Theory of Mind and Stone Tool Behaviors

In this section, we reintroduce the seven grades of causal cognition with slightly revised nomenclature and updated definitions. We unpack their relationship with ToM where relevant, and suggest how each grade may manifest in terms of stone tool behaviors observed in the archaeological record. It is, of course, rather hopeless to try to date when each of the seven grades we propose emerged during hominin evolution. Our arguments are instead based on comparisons between the capacities for causal cognition that are expressed in the different Stone Age technologies and techno-behaviors represented in the archaeological record that may serve as proxy for certain ways of reasoning. To do so, we rely on the methodological principle of cognitive parsimony, i.e.:

  • If the cognitive capacities required for an activity or technique A are a subset of those required for an activity or technique B, then (barring cases where the additional capacities required for B are evidenced synchronously with or earlier than A), A is evolutionarily prior to B.

Even though this principle does not say anything about dating, it makes it possible to argue that one type of activity is evolutionarily older than another. The principle entails that the grades of causal understanding do not necessarily follow a unilinear evolutionary trajectory. Aspects of each type of thinking may have evolved in parallel with one another, or may still be evolving within continuing coevolutionary feedback loops with other relevant fields such as social frameworks, human biology (e.g., brain and DNA), and ecology (Lombard and Högberg 2021). Within each grade of causal understanding there might also be several levels of complexity that developed at different times in different places and/or in different hominin populations. For example, basic, conspecific mindreading skills (grade 3 below) might have been acquired early on in our evolution. However, enhanced orders of human mindreading, or ToM, that enable us to cope with current complex societies, might only have evolved at a later stage, i.e., after we were able to understand and interpret the behaviors of non-conspecifics (grade 5 causal understanding). Thus, a newly identified grade of causal understanding does not automatically imply that all or some aspects of the previously identified grade stopped evolving.
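The principle of cognitive parsimony stated above can be read as a subset test over sets of required capacities. The Python sketch below is a minimal illustration under stated assumptions: the function name `evolutionarily_prior`, the flag `b_extras_attested_by_time_of_a`, and the capacity labels are hypothetical placeholders rather than data from the archaeological record, and the sketch assumes a proper subset so that two activities with identical requirements are not ranked against each other.

```python
def evolutionarily_prior(required_a, required_b, b_extras_attested_by_time_of_a=False):
    """Return True if the parsimony principle licenses 'A is prior to B'.

    required_a, required_b: iterables of capacity labels (hypothetical).
    b_extras_attested_by_time_of_a: set to True when the additional capacities
    required for B are evidenced synchronously with or earlier than A, which
    is the exception clause of the principle.
    """
    a, b = frozenset(required_a), frozenset(required_b)
    if not a < b:  # assumption: proper subset, so B needs something extra
        return False
    return not b_extras_attested_by_time_of_a


# Hypothetical capacity sets, using grade names only as placeholder labels.
technique_a = {"individual causal cognition", "cued dyadic-causal cognition"}
technique_b = technique_a | {"detached dyadic-causal cognition"}

a_before_b = evolutionarily_prior(technique_a, technique_b)
b_before_a = evolutionarily_prior(technique_b, technique_a)
blocked = evolutionarily_prior(technique_a, technique_b,
                               b_extras_attested_by_time_of_a=True)
```

As in the principle itself, the result expresses only a relative ordering and says nothing about dates. Where neither capacity set contains the other, the function returns False in both directions, which is consistent with grades evolving in parallel.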

We also need to keep in mind that technical practices may be simplified during the process of cultural evolution when groups shift their socioeconomic behaviors (e.g., Shennan 2001; Henrich 2004; Riede 2008), or when they find cognitively less demanding ways to produce and use a technology. Good examples of “cognitive simplification” include expert cognition (e.g., Wynn et al. 2017), and cognitive or technological modularization (e.g., Lombard and Haidle 2012; Lombard et al. 2019). Thus, despite a general trend towards “cumulative culture,” cognitive evolution and its products are not always subjected to a one-way process—the so-called ratchet effect—towards increasing complexity (see discussions in Lombard 2016; Haidle et al. 2015). Furthermore, novelty in the archaeological context might not correlate directly with novel traits in cognitive evolution, because some cognitive capacities might have been expressed behaviorally for some time in ways that are invisible through the material record. Yet, from a cognitive archaeological point of view—the focus of this article—the hominin technical record provides concrete, spatiotemporal proxies for some ways of thinking. By using parsimony these proxies represent “minimal-capacity inferences” (e.g., Wynn and Coolidge 2004, 2009; Pain 2019). They provide the simplest explanations, requiring the fewest possible assumptions to reach the best-fit interpretation of the data, safeguarding against the overestimation of cognitive capacities.

Grade 1: Individual Causal Cognition

This most basic type of causal understanding corresponds to Woodward’s (2011) egocentric causal learner. It involves a direct connection between a motor action that an individual exerts and the resulting effect. Both the cause and the effect are immediately perceived, with the result that the individual experiences their own difference-making agency. Individual causal information processing does not involve strong cognitive or social mechanisms. It can be learned through ordinary instrumental conditioning without any social transmission and with limited self-awareness, and was already well within the cognitive range of the last panin-hominin common ancestor during the final stage of the Miocene (Stuart-Fox 2014; Lombard and Gärdenfors 2017).

In terms of stone tool behavior, individual causal understanding simply requires that someone is aware that they can manipulate a stone to cause an effect—for example, the awareness that dropping a stone on another rock will make a noise, or perhaps damage or break one of the stones—without involving sharing and representing of others’ mentality. For example, Proffitt et al.’s (2016) observations illustrate the perception of capuchin monkeys that a stable, hard surface or rock anvil can be used as an aid to exert force on a handheld stone, that such force will result in sound and/or damage to the stone, and that the damage can be observed by sniffing or licking stones used in this manner. This behavior represents the individual causal cognition of the monkeys and zero-order ToM.

Grade 2: Cued Dyadic-Causal Cognition

This type of information processing involves at least two individuals performing a similar action. They are able to understand that the action of someone else causes an effect, because it gives the same result as the individual’s own action (Woodward 2011). Although the motor forces behind the other individual’s actions are not directly perceived, they are inferred via a mapping onto the forces involved in one’s own actions. Such understanding allows one individual to understand the difference-making agency of another, and that by imitating the actions of another they may achieve similar effects. Actions of one individual are therefore “cued” by those of another. Although it includes learning by imitation (e.g., Zentall 2004; Whiten et al. 2009; Kline 2015; Gärdenfors and Högberg 2017), it does not require either joint attention or joint intention.

The rock-pounding behavior of capuchin monkeys (e.g., Proffitt et al. 2016), mentioned above as an example of grade 1 causal cognition, also demonstrates how different types of causal cognition are scaffolded or nested within each other. Because several monkeys in the group display the pounding behavior, we may infer that they achieved cued dyadic-causal understanding and rudimentary social learning through mimicking each other’s actions (Lombard et al. 2019). The behavior reflects their belief that the same set of actions will have similar outcomes and the desire to replicate the outcomes—i.e., first-order ToM, which presumes no understanding of the mind of the other. Cued dyadic-causal understanding is also evident in the nut-cracking techno-behaviors of wild chimpanzees (e.g., Boesch 1991; Visalberghi et al. 2015), where young chimpanzees seem to understand that by mimicking expert nut crackers, they too might be able to access the nuts (Lombard et al. 2019).

Here one must distinguish between learning by emulation, where the learner observes the outcomes of the model’s actions and tries to reach the same outcome (goal oriented), and learning by imitation (Tomasello 1999), where the learner observes the sequence of the model’s actions and tries to perform the same actions (process-oriented learning). Early results (Whiten et al. 2005) indicated that chimpanzees emulate while children imitate. Later studies (Whiten et al. 2009) suggest that the situation is more complex—the apes are not confined to emulation but also imitate extensively. What is important is that emulation involves only first-order ToM (the intention to reach a goal), whereas imitation requires second-order: the imitator must understand that the model knows how to reach the result.

The causal cognition for passive hammer flake production, as described for some of the artefacts from Lomekwi 3 dating to ~ 3.3 Mya (Harmand et al. 2015; Lewis and Harmand 2016), can probably also be facilitated through the scaffolding of individual causal cognition and cued dyadic-causal cognition. Assembling nodules and anvils, however, indicates some planning capacities as well as autocuing, similar to that of some chimpanzee nut-cracking behaviors. Whereas social learning is implied by the fact that flake production became a pan-Homo techno-behavior, we do not know whether any form of intentional teaching was involved in such early passive hammer flaking. The technique is easy to imitate, and through trial and error a novice will eventually succeed. No strategic judgements about planned actions are necessary (see Stout et al. 2015).

Grade 3: Conspecific Theory of Mind

As humans, we have a highly developed ToM, that is, understanding of how our desires, intentions, and beliefs lead to different kinds of actions (Premack and Woodruff 1978; Tomasello 1999; Gärdenfors 2003, 2007). By observing and thinking about our actions and through various processes of social learning, we infer the state of mind of other humans under the hypothesis that their desires, intentions, beliefs, and subsequent actions are similar to our own. In this case, we do not perceive physically the cause of another’s actions, but use our understanding of their inner state as a hidden causal variable for their behaviors, that is, second-order ToM. This involves a detachment of perceptual similarity from causal similarities that are determined from desires, intentions, and beliefs. The mental phenomena thereby form the first class of hidden variables that we add to our perception in order to understand causal relations.

Nonhuman animals such as primates, some bird species, dogs, seals, and even goats share with us gaze following as a limited form of ToM (e.g., Emery et al. 1997; Tomasello et al. 2007; Shepherd 2010; Téglás et al. 2012). This represents the understanding that if a conspecific is looking firmly in a particular direction, there is something worthy of attention in that direction. Conspecific co-orientation through following gaze direction provides adaptive advantages regarding predator awareness, food detection, and the monitoring of social interactions (e.g., Schloegl et al. 2007). It is a behavior that develops early during human infancy (e.g., Meltzoff and Brooks 2007). This type of basic causal social cognition also presumes second-order attention of the form “I see that you see,” but not the third-order that is required for joint attention (e.g., Dennett 2009; also see Crockford et al. 2012).

A special case of conspecific ToM is self-awareness in the form of autocuing, which is self-triggered conscious retrieval, the kind of recall needed to practice a skill (Donald 2012). Self-awareness involves the ability to imagine oneself in the future and in the past. This type of thinking includes early forms of mental time travel (Suddendorf and Corballis 2007; Gärdenfors and Osvath 2010; Gamble et al. 2014), basic episodic memory (Tulving 1985; Osvath 2010), basic working memory (Coolidge and Wynn 2005), and priority scheduling, planning depth, or extended perception-and-action sequences (Haidle 2014; Lombard et al. 2019).

For the bipolar knapping approach recorded at Lomekwi 3 dating to ~ 3.3 Mya (Harmand et al. 2015; Lewis and Harmand 2016), moderate levels of self-awareness are necessary for the bimanual manipulation of objects and for assessing the correct amount of striking force (Lombard et al. 2019). Finley (2008) suggested that this knapping technique is difficult to imitate accurately, and Duke and Pargeter (2015) demonstrated that it is not possible to master skillfully without being taught by an experienced knapper. It is therefore reasonable to assume that at least non-intentional teaching in the form of facilitation, as well as a level of intentional evaluative feedback (e.g., Gärdenfors and Högberg 2017), was in play to transfer the technology among individuals or groups. Such basic forms of intentional teaching go beyond mere social learning by imitation, and imply a type of conspecific ToM during which at least some attention and intention are shared. Barrett (2012) suggested that the development of such shared attention enabled a sustained and mutual empathy between social agents in their understanding of the practical qualities of materiality.

Any form of early human social learning or teaching also feeds into current cumulative culture discourse. Thus far, authors working in the disciplines of both archaeology and primatology have suggested that limited forms of cumulative culture were present among early toolmaking hominins. For example, Whiten (2017) showed that living primates have the ability to imitate, and therefore they are able to sustain limited forms of cumulative culture. Earlier Stone Age Oldowan lithic assemblages show signs of predetermined knapping strategies and some differences in knapping “traditions” (e.g., de la Torre et al. 2003; Stout et al. 2010, 2019; Stout 2011). For example, in addition to the bipolar technology of Lomekwi 3, the early Oldowan lithics from Ledi-Geraru suggest that by ~ 2.58 Mya hominins had the ability to systematically produce smaller flakes with discrete platforms and fewer instances of percussive actions (Braun et al. 2019). This signals an increase in the hominin ability to effectively extract sharp edges from stone volumes (discrete platforms and fewer percussive marks compared to Lomekwi 3).

Some of these interpretations have been questioned by Tennie et al. (2009, 2016, 2017) who suggest that the Oldowan may represent a “latent solution” or external cause-and-effect processes, so that there is no evidence to suggest that early stone tool knapping required imitative learning. Their argument is supported by an experimental study that showed non-goal-directed knapping can produce forms that resemble products of predetermined knapping by chance alone (Moore and Perston 2016). They do admit though that the learning process can be facilitated by social contact where individuals focus their attention on the acts of others and thereby enhance the emulative process for transmitting the causal information. This interpretation is consistent with our grades 2 and 3 causal understanding as represented here, requiring rudimentary orders of ToM.

Conspecific ToM continues to develop throughout the Earlier Stone Age/Lower Palaeolithic, shifting towards limited forms of third-order intentionality by the end of this phase, not only because of their sociocultural significance beyond functionality as suggested by Cole (2019), but also because of the increasingly complex levels of intentional teaching associated with platform preparation in elaborately knapped Acheulean hand axes (Gärdenfors and Högberg 2017). For such technology to be transferred successfully, the teacher must understand that the learner does not know how to perform the knapping, and the teacher and the learner must achieve joint attention and intention to learn the knapping process. The production of pieces such as those recorded for the Konso Formation, Ethiopia, dated to ~ 850 ka associated with Homo erectus (Beyene et al. 2013), and from Boxgrove in the UK at ~ 500 ka associated with Homo heidelbergensis (Stout et al. 2014), probably required teaching by communicating abstract concepts via gestures and/or words. This implies that their makers were able to refer to non-present entities even though they might not yet have developed a full linguistic capacity (Gärdenfors and Högberg 2017).

The experimental results of Lycett et al. (2016), however, highlight the importance of imitative learning in terms of transmitting the morphological traits of artefacts in the context of a knapping tradition such as the Acheulean. For the Levallois, they argue that explicit instruction was probably involved—even without gestural or verbal communication. Mithen (1999), for example, also suggests that the spatiotemporal duration of the Acheulean implies an imitative learning system. In his model, however, learning knappers not only copied an artefact but also elements of the techniques and behavioral gestures of other knappers, which may ultimately result in communication through gesturing. In terms of biface production, Putt et al.’s (2014) experiments indicate no strong effect for verbal versus nonverbal communication, so that learning from gestures in combination with imitation and a perception of form is sufficient. Others (e.g., Morgan et al. 2015) have shown that the transmission of knapping skills improved with teaching, and particularly with language, but not with imitation or emulation. These results are interpreted as indicating that hominin reliance on stone toolmaking is intimately linked with selection for teaching and language, and that early low-fidelity social transmission in the forms of emulation and imitation may explain the long stasis in knapping traditions associated with the Oldowan (Morgan et al. 2015). The appearance of Acheulean hand axes may therefore signal the existence of a protolanguage (maybe based on gestures) or the origins of teaching in a long and gradual evolutionary process (Morgan et al. 2015; Gärdenfors and Högberg 2017).

A different take on ToM in association with Acheulean hand axes is presented by Wynn and Berlant (2019), who suggest that biface-producing hominins “used material displays in atypical situations, which in turn suggests that the knappers worked for the appraisal of some other individual or individuals […] in unusual circumstances.[…] This has implications for theory of mind (ToM). The knapper of one of these exceptional hand axes considered not just his or her own point of view but also what at least one other individual could see” (also see Wynn 2000). They acknowledge that it is not possible to know the specific circumstances for such consideration, but go on to argue that knappers who learned the Acheulean biface tool concept since infancy within a tool-oriented technology did not require for it to have additional meaning, apart from tools being an available expression for aesthetic perception, perhaps initially for personal pleasure, but later also to impress or inform someone else within a social context (also see Shipton 2010; Cole 2015 for arguments that ToM was essential to hand axe production).

Grade 4: Detached Dyadic-Causal Cognition

This type of causal thinking allows us to perceive someone else’s or something’s presence detached through time and across space. Such cognition could be achieved through the understanding that the traces they left in the past mean they were in a space we observe in the present, or an object associated with an activity in the past is understood to represent a similar activity in the future. For example, finding ash in a fireplace, but no other signs of burning, leads to the inference that someone made a fire there. Such thinking depends on the capacity to entertain two mental representations at the same time, that is, the current perceptual state of seeing a trace together with the imagination of who caused it in the past. This form of thinking also involves hidden variables: the observer does not perceive the person who made the fire, but represents him/her mentally as a cause for the perceived effect.

Detached dyadic-causal cognition seems to be the grade where humans start separating from other species. Being able to reason from inanimate effects to non-present causes seems to be unique to humans today, even though some observations suggest that great apes are at the brink of such cognition. For example, Völter and Call (2014) found that apes in captivity can make use of a trail left by a leaking yoghurt cup placed out of their sight, to locate the cup. On the other hand, they did not use the trail when it did not match the type of food that was displaced. In this example the apes reacted directly to scent and taste cues. Cheney and Seyfarth’s (1990) experiment with vervet monkeys similarly shows that when catching sight of a python or a leopard they emit warning cries, but do not react to detached visual signs (such as the track of a snake, or the carcass of leopard prey in a tree) of these dangers alone. Thus, terrestrial animals are dependent on direct physical effects such as scent, taste, sound, and direct sight cues, but it appears that the aptitude for causal understanding based on inanimate or indirect visual cues developed only in the hominin clade (see Calvin and Bickerton 2000; Shaw-Williams 2014; Stuart-Fox 2014). We further speculate that the difference in detached dyadic-causal understanding between extant humans and nonhuman animals is that animals understand causation only in terms of direct agency whilst humans are able to reason about causes also via force transmission across space (action at a distance or out of sight) and through time (detached representations of past experiences and future possibilities). This is another example of the detachment of perceptual similarity from causal similarities.

The 3.6 Mya tracks from Laetoli in northern Tanzania (Leakey and Harris 1987) are widely accepted to be those of australopithecines. The double trail of larger footprints has been interpreted to represent two individuals, one walking in front of the other, with the smaller follower stepping intentionally and exactly into the tracks of the larger one (White and Suwa 1987; Agnew and Demas 1998). We have suggested that if this interpretation is correct, it represents the earliest known indication of basic detached dyadic-causal understanding through “tracking” in the hominin lineage (Lombard and Gärdenfors 2017). This is because, even if the leader was in view of the follower, the follower had to focus on the leader’s footprints instead of on the person to be able to step perfectly into the leader’s prints. If there was no detachment from the leader, the footprints would have simply followed the same direction, but would not be so carefully placed within each other. Similar to Shaw-Williams (2014, 2017), our model suggests that early stages of tracking behavior evolved in the context of conspecific social behaviors. An increasing awareness of the rich body of information that can be gleaned from traces left by other creatures was then applied to improve chances of survival, for example, to avoid predators or enemies, and was subsequently extended into subsistence behaviors such as the scavenging and hunting of animals (also see Stuart-Fox 2014). Both the social and subsistence scenarios have strong selective advantages that would have encouraged ever-increasing levels of complexity and flexibility in our tracking behaviors and associated causal understanding.

Knapped stone tools indicate the use of a tool (a hammerstone) to make another tool (a flake). The secondary tool is an effect that becomes a cause in its later use within the context of a modular system (Lombard et al. 2019). Purposely knapped flakes with confirmed subsequent use are therefore good indicators of detachment in causal understanding, because the use of the hammerstone to knap is not linked directly to, for example, butchering a carcass for food. The detachment applies even if the butchery follows directly after the knapping, because the hammerstone is never directly involved in the subsequent flake application. Such modular techno-behavior is different from primate tool behaviors such as rock-pounding capuchins or nut-cracking chimpanzees who do not use tools to make tools (see discussions in Haidle et al. 2015; Lombard et al. 2019). Currently, the stone tools and associated cut-marked bones at Gona, Ethiopia, represent an early instance of direct evidence for hominin meat processing at ~ 2.6 Mya (Dominguez-Rodrigo and Pickering 2017). The detachment of cause from effect becomes more distinct in cases where there may be evidence of stone tools being transported away from knapping sites to butchering locations or curated in-between butchering events (Blumenschine et al. 2009; Zack et al. 2013). In such cases, the flakes could be seen as the silent reminders of absent carcasses butchered in the past and of future butchering events. One of the oldest examples of such behavior comes from the Middle Awash Valley, Ethiopia, dated to ~ 2.5 Mya (De Heinzelin et al. 1999; also see McPherron et al. 2010 for a possible older case, and Brantingham 2003; Holdaway and Douglass 2012; Haas and Kuhn 2019 for aspects of artefact transportation through time).

Splitting Woodward’s “agent causal learner” category into three different grades of causal understanding enabled us previously to conclude that nonhuman animals manage grade 2 (cued dyadic-causal cognition), perform less well than humans at grade 3 (conspecific ToM), and are very limited when it comes to grade 4 causal understanding (detached dyadic-causal cognition) (Lombard and Gärdenfors 2017).

Grade 5: Non-Conspecific Theory of Mind

This type of causal reasoning allows for the dyadic-causal understanding of the actions and intentions of species other than our own, although their motor actions and cognitive processes are different from ours. In terms of human evolution, it denotes the hominin ability to understand aspects of nonhuman animal mentality. The difference between conspecific ToM and non-conspecific ToM is a matter of degree rather than kind. Again, the mental states that we assign nonhuman animals function as hidden variables in our causal reasoning.

Throughout the Earlier Stone Age/Early Palaeolithic, opportunistic hominin scavenging probably matured into well-developed strategic scavenging, possibly assisted by object throwing to ward off other scavengers or predators, perhaps even killing naturally trapped or weakened animals through stoning or clubbing (Brain 1981; Blumenschine et al. 1987; Lieberman et al. 2009). Such scavenging or rudimentary hunting techniques would have benefitted from the cooperation and competitive strategies developed as a result of conspecific ToM, as well as the associated tracking skills. Extending to grade 5 causal cognition, the benefits, challenges, and dangers experienced during carcass scavenging would have provided the selective pressures for our ancestors to become proficient in the ToM of non-conspecifics too (Shaw-Williams 2014; Lombard and Gärdenfors 2017).

Between about 1 Mya and 500–300 ka hominin meat-getting strategies developed from advanced scavenging strategies into hunting with rudimentary spears. Early hunting was possibly practiced in ambush situations that would have placed prey animals at a disadvantage to the hunters (Liebenberg 2006; Lieberman et al. 2009). Bunn and Gurtov (2014) speculated that such hunting could have been practiced by early Homo with short-distance, wooden spears as far back as 1.8 Mya in the Olduvai Gorge, Tanzania. Ambush hunting has also been suggested for more recent contexts from Olorgesailie, Kenya, spanning ~ 1.2–< 0.5 Mya (e.g., Kübler et al. 2015). Here the authors are more cautious about weapon inference, rather building their case around the features of the landscape and associated hominin and animal behaviors, explaining that the exploitation of game at the site resulted from the predictable patterns of animal movement conducive to ambush hunting. They conclude that: “Homo exploited this part of the Kenya rift not because it was generally ‘good’ for herbivores, but because it was generally ‘bad’, and constrained their movements to predictable pathways which allowed them to be exploited by early hunters” (Kübler et al. 2015, p. 6). A reanalysis of remains from Elandsfontein, South Africa, dating to ~ 600 ka, also indicates that Homo heidelbergensis were capable ambush hunters of large ungulate prey (Bunn 2019). These are examples of grade 5 causal reasoning, and indicate the evolution of non-conspecific ToM in hominins during the African Earlier Stone Age before their split with the Neanderthal population.

Stone tool assemblages associated with ambush hunting are mostly of late Acheulean character or transitional into the Middle Stone Age/Middle Palaeolithic. The notion that Acheulean hand axes were used as throwing weapons has a long history (e.g., O'Brien 1981; Calvin 2002; Samson 2006), but remains difficult to confirm (McCall and Whittaker 2007). Rare use-trace evidence rather supports their use as cutting tools (Rots 2009). Thus, non-conspecific ToM can only be associated with these stone tools when they are found in direct association with additional evidence for ambush hunting.

The first firmly documented record of close-encounter ambush hunting of dangerous animals with wooden spears is from the European Middle Palaeolithic at Schöningen, Germany, dated to ~ 400–300 ka (e.g., Thieme 2005; Voormolen 2008). A wooden spear of similar age was also found at Clacton in the United Kingdom (Allington-Jones 2015). These artefacts are generally associated with Neanderthals, whom many researchers see as skilled ambush hunters. At Schöningen, the deposition of the Middle Pleistocene sediments within an Elsterian tunnel valley explains the unique preservation of the sedimentary succession of the site (Lang et al. 2012), and the interglacial lake supported a wide array of flora and fauna, serving as a prime ambush location for the hominin hunters (Turner et al. 2018). The spears were found with horse remains, and Voormolen (2008) argues that even though wooden spears could have been cast from a distance to wound the horses, a stalk-and-ambush approach would have been necessary to kill them. Ambushing at Schöningen is further indicated by the presence of multiple horse individuals including foals, which are normally only found when animal families are ambushed (Voormolen 2008). This hypothesis was experimentally tested for the Neumark-Nord 1 paleo-basin site in Germany, where it was established that the perforations on fallow deer remains were consistent with close-quarter wooden-spear ambush hunting (Gaudzinski-Windheuser et al. 2018).

In a broader perspective, Berger and Trinkaus (1995) suggested that instances of Neanderthal trauma reflected close-quarter ambush hunting with heavy thrusting spears, but later extended the interpretation to include longer-range spear hunting (Trinkaus 2012). Based on paleo-ecological evidence for a woodland environment in combination with the Neanderthal muscular power and sprint capacity, Stewart et al. (2019) came to the conclusion that Neanderthals were best adapted for encounter and ambush (rather than pursuit) hunting. White et al. (2016) describe how they were also accomplished ethologists, mindful of the behavioral eccentricities of different prey species, and selecting their hunting strategies accordingly. Understanding how the behavior of different animals varies with circumstances requires at least some ascribing of intentions to the animals, such as which paths they may follow when thirsty or hungry. Thus, it is our current interpretation that Neanderthal populations, such as those from Schöningen, had reached at least grade 5 causal cognition.

Grade 6: Inanimate Causal Cognition

This type of causal understanding allows for the attribution of causal roles to inanimate objects. To borrow an example from Tomasello and Call (1997), an individual who has reached this level observes the wind blowing on a tree so that fruit fall to the ground, can mentally represent the force the wind is exerting, and is able to conclude that if they act on the tree’s branches with similar force, the fruit will also fall. Unlike the previous types of causal cognition, there is no animate agent that performs an action. Instead, causation is seen as force transmission and, in this sense, as an extension of agency (Povinelli 2000; Wolff 2007; Gärdenfors and Warglien 2012). Such understanding could be a candidate for the new representational system suggested by Povinelli and Bering (2002) in which the observable world and what happened in it could be reinterpreted with hidden meaning, allowing humans to reflect on unobservable causes. Again, the abstract forces constitute another form of detachment of perceptual similarity from causal similarities. With grade 6 causal cognition, we thus see a further extension of the hidden variables involved, from the ToM components that function as causal social forces in humans, to those in nonhuman animals, and now to more abstract forces exerted by inanimate entities.

We have previously suggested that effective tool use over a distance, such as throwing spears forcefully and accurately, could represent an evolutionary selection mechanism behind the human capacity for inanimate causal reasoning (Lombard and Gärdenfors 2017). Also, the ability to infer the forces of twine or a sticky substance such as tree gum as binding agents for the construction of composite tools (e.g., stone-tipped spears) is facilitated through inanimate causal cognition (Gärdenfors and Lombard 2018, 2020). These techno-behaviors represent an understanding of abstract forces as hidden variables, and thereby the role of forces as causes.

Abstract thinking is the ability to recognize regularities in diversity (Reuland 2010). Cole (2019) writes that the production of composite tools reflects an ability for abstract thought, but whereas he interprets such abstraction only in terms of “cultural signaling,” and therefore as indicating third- to fourth-order intentionality, we propose that there is more to abstraction, also in technical and cognitive terms. For example, Zilhão (2007) suggested that the Königsaue pitch associated with Neanderthals at ~ 45 ka could not have been developed, transmitted, and maintained in the absence of abstract thinking and language (also see Niekus et al. 2019; but see Schmidt et al. 2019 for an alternative interpretation). Homo sapiens in southern Africa used complex adhesive recipes for the hafting of stone tools from at least 72 ka (Lombard 2006), and Wadley (2010, 2013) has shown that their manufacturing processes required multitasking and thinking in abstract terms about the qualities and necessary quantities of the ingredients that were manipulated. Whereas some have argued that ancient synthetic substances merely represent customary recipes, followed by unreflective tradition (Boyd 2017; Henrich 2017), Wadley’s (2010) experimental work on adhesive production, and recent ethnographic observations about adhesive and poison production amongst San hunters of Namibia (Wadley et al. 2015), reveal a different, real-life perspective.

For example, the collecting of all the different ingredients happens over an extended period, and the manufacturing of the compounds requires carefully monitored heat treatment. Such treatment represents a range of different techniques and requires high levels of attention to monitor time exposed to heat and changes to the compounds. During each production session, continuous adjustments are made to the amounts of each ingredient added, so that ultimately the right consistency is achieved, depending on an array of contextual conditions (Wadley 2010; Wadley et al. 2015). What is more, although a certain ingredient may be a constant amongst some groups, the recipes are not always the same, and different hunters prepare similar sets of ingredients differently (Wadley et al. 2015).

Thus, whilst cognizant of tradition, symbolism, and variation through time, these studies demonstrate that such techno-behaviors are far from being mechanistic, thoughtless processes that can be explained through, for example, expert cognition (Wynn et al. 2017). Instead, they imply relatively long attention spans, response inhibition, the capacity for novel, sustained multilevel operations, the use of abstract thought, and the ability to plan the assembly of ingredients as well as complex action sequences. In this context, Osiurak et al. (2020) recently also emphasized that humans are not just manipulators, but that we have evolved to solve and create physical problems, and that even though using tools may appear routine, most techno-behaviors are dependent on our ability to reason about the physical world. Evidence for composite technologies involving adhesives is therefore a good indicator of enhanced working memory and of inanimate causal cognition. We thus see these techno-behaviors as examples of grade 6 causal cognition and higher order ToM.

Although some decades ago it was debated whether Levallois points of the Eurasian Mousterian were hafted as spear points (e.g., Holdaway 1989; Shea 1990), isolated finds with such points in faunal remains provided indication of their hunting function in South Africa (Milo 1998) and the Levant (Boëda et al. 1999). Since 2004, Levallois-type or other prepared-core stone points and blade products in the southern African Middle Stone Age context have been consistently associated with early stone-tipped spear hunting and traces of hafting (Lombard 2004, 2005, 2007; Lenoir and Villa 2006; Lombard and Clark 2008; Villa and Soriano 2010; Wilkins et al. 2012; Wilkins and Schoville 2016); later on similar results were published for east Africa (e.g., Sahle et al. 2013) and Eurasia (e.g., Cârciumaru et al. 2012; Rots 2013; Goval et al. 2016).

The Levallois technique indicates a switch to the systematic predetermination of flake removal in terms of size and shape (Van Peer 1992; Boëda 1994), in the context of a stepwise, goal-driven knapping process (e.g., Brantingham 2010). Wynn et al. (2017) suggested that the number of routines and length of procedural chains required in prepared-core technologies would have required an increase in long-term procedural memory capacity well beyond the range of preceding stone tool technologies. The Levallois technique also requires deeper rapid problem assessment, and each problem requires an immediate solution, yet throughout the process the knapper has to hold in mind what he or she ultimately intends (a goal or subgoal, such as platform formation, that might still be several steps removed from the current situation). By refitting and following the work sequence of Marjorie’s core, for example, Schlanger (1996) demonstrated that the knapper did not simply perform a preset series of actions, nor did they respond instinctively to external constraints. Instead, he found that the knapper’s course of action was a structured and goal-oriented interplay between mental and material engagement. This was apparent in the way that the knapper attended to consequences that current knapping decisions had for future phases of reduction. Such knapping represents an increase in the depth of problem-solving capacity, which requires not only increases in size and number of informational chunks, but also an increase in working memory capacity, because there are more hidden variables to hold in attention to maximize a solution further along the procedure (Wynn et al. 2017).

Prepared-core reduction also suggests an important role for semantic long-term memory. In the Marjorie’s core reduction sequence, the knapper followed a kind of rule that dictates that after successfully striking off a large flake, the core must be rotated 90° so that a current lateral convexity becomes the distal convexity for the next phase (Wynn and Coolidge 2010). Such a conventional rule almost certainly existed in the mind of the knapper as a chunk of semantic information, in terms of inanimate causal cognition. However, it is not sufficient to understand the rock; the required action must also be mastered. Lycett et al. (2016) suggested that Levallois technology required active teaching, which involves understanding the attention and intention of the teacher. The teaching probably involved verbal instruction of abstract concepts (e.g., Högberg et al. 2015; Gärdenfors and Högberg 2017), which would be consistent with Cole’s (2019) interpretation of third- to fourth-order ToM for prepared-core technologies. In terms of inanimate causal cognition, Levallois knappers understood how a core would “behave” in the future, provided it was set up appropriately.

Grade 7: Causal Network Cognition

We have suggested that the most complex grade of causal cognition is the understanding of how domain-specific causal node sets connect or link to inter-domain causal networks or causal grammars (Tenenbaum and Niyogi 2003; Lombard and Gärdenfors 2017), the most advanced form of causal network thinking being that of “scientific” or hypothetical reasoning. Such understanding allows speculative thinking about how the world works, either physically or socially. Importantly, it also allows for seamless mapping between the physical and social domains. Thus, during this grade of causal understanding, we are able to integrate aspects of all the previous grades of causal understanding, mapping them onto each other into never-ending patterns of recursion and complexity—including higher-order ToM. Gopnik et al. (2004) describe this kind of thinking in terms of causal Bayes nets that provide humans with the type of reasoning necessary for inductive inference and discovery (theory formation). We do this by perceiving patterns of likelihood between a range of possible events—by thinking through, imagining, or examining (through experimentation) the consequences of interventions by combining multiple types of hidden variables and observed evidence (Gopnik and Schulz 2007).

Causal network thinking is a critical development, because it allows us to gain new knowledge or insight from what we already know through either individual discovery or socially transmitted knowledge about hidden variables. It means that not every individual needs to have the full causal understanding of a complex system (Boyd et al. 2011). Instead, causal network thinking allows for the division of conceptual structures (e.g., the parts that an individual understands causally), and their subsequent rearrangement into new contexts, so that novel structures can be conceptualized in their place—learning from reasoning (Barbey and Wolff 2007). Such a transformation of units in causal understanding, combined with an individual’s unique set of experiences and memories, may lead to new conclusions about how the world works and/or to technical improvement—occasionally in gigantic leaps of invention, but mostly in incremental steps of innovation (Högberg and Lombard 2020).

We have previously suggested that speculative tracking, as described by Liebenberg (1990, 2013) in the context of Kalahari bow hunters, demonstrates the ability to draw together domain-specific nodes into inter-domain networks of abstract causal understanding. For example, intimate knowledge of kin, non-kin, and animal behavior and their inanimate signs is incorporated with multifaceted knowledge about the landscape (geographic features, water sources, vegetation, etc.), abstract causal understanding, and the mental maps, thought processes, and social contexts of the tracker, and the tracker's technical understanding of how best to hunt with poisoned arrows. Speculative tracking therefore demonstrates how humans create meaningful causal networks of hidden variables to deal with complex, dynamic events. The bow hunters create multiple and continuously adapted imaginative reconstructions to interpret the actions and states of the animals they intend to hunt. Based on these reconstructions they create novel predictions in endlessly unique and changing circumstances (Liebenberg 1990). This allows them to plan ahead, no longer having to rely only on following visual cues, because some of the tracking now happens abstractly, in the mind of the hunter, in a continuous cognitive process of “conjecture and refutation” (i.e., scientific reasoning; Liebenberg 1990).

Such thinking is similar to Mithen’s (1994, 1996) concept of fully integrated domains of intuitive intelligence, which include linguistic, social, technical, and natural history intelligence. Advanced levels of cognitive reasoning, similar to how we think today, can only be reached once humans are able to generalize abstract knowledge from one domain to others into creative, innovative, and flexible solutions. Mithen saw evidence for such advanced levels in cognitive fluidity from about 60,000 years ago in the archaeological record (also see Haidle 2010 for further discussion), which is roughly contemporaneous with some of the earliest current archaeological evidence for bow hunting in southern Africa (Lombard and Phillipson 2010; Lombard 2011; Backwell et al. 2018).

From a technological perspective bow-and-arrow technology may also serve as one of potentially several proxies for high-level cognition as a result of its modularity and associated extended thought-and-action sequence or problem–solution distance (Lombard and Haidle 2012; Lombard 2016); other examples would include using needle-and-thread, knitting or weaving technology, and harpoons. These technologies represent symbiotic systems that rely on the simultaneous, focused bimanual manipulation of multiple technical components. Technological symbiosis (where neither part of the system is effective without the simultaneous manipulation of the other) enables a level of complexity and flexibility that is not possible with non-symbiotic, simple, or composite technologies (Haidle et al. 2015). Once the concept of symbiotic technologies is understood, different elements and series of elements can be adapted and grouped in multiple ways, and in sequences of various length and complexity, to achieve diverse results. For example, bows can be:

  • Grouped with drill bits (which are sometimes hunting arrows), weights, and handling pieces to use as bow drills.

  • Used with palm protectors, fire sticks, base-wood, and tinder as fire drills.

  • Used as simple, violin-like instruments, stroked with a stick or arrow; and applied to the mouth cavity or a gourd as a sound box, as is done by Kalahari hunter–gatherers in southern Africa.

  • Plucked (non-symbiotically) with the fingers like a one-string guitar, also demonstrated by the Kalahari San (Lombard 2016).

Thus, we found that a key evolutionary advantage of symbiotic technologies, such as a bow-and-arrow set, is that they enable almost endless combinations of single elements or chains of operations, in a variety of ways, to reach single or multiple goals. Such technologies offer instantaneous and spontaneous flexibility to effectively handle any one possibility or situation out of a suite of diverse foreseen (and unforeseen) scenarios (Lombard and Haidle 2012), allowing for a range of cognitive and cultural complexity and variability associated with grade 7 causal cognition.

The earliest known evidence for bow hunting, with arrow tips made from backed bladelet pieces (e.g., Lombard 2011; also see Cole 2019 on intentionality of Mode 4 bladelet production), represents an archaeological example of causal network cognition in southern Africa at more than 64 ka (Lombard and Gärdenfors 2017). In bow hunting behavior we see how the causal understanding of the advantages of hunting with a sharp projectile is coupled with the abstract causal understanding that the power of stored mechanical energy can overcome physical challenges, such as the limited reach of a spear, to brace subsistence or conflict strategies. Poisoned bone arrow tips were in use in southern Africa at least since the Later Stone Age starting ~ 40 ka (d’Errico et al. 2012; Robbins et al. 2012), and it now seems that this tradition could have started more than 60 ka (Bradfield et al. 2020; Lombard 2020).

In our most recent exploration of the link between causal cognition and technical force dynamics (Gärdenfors and Lombard 2020), we suggest that evidence of poisoned arrow use reflects a complex form of causal reasoning about a force operating over an extended period, sometimes across a long distance, and often out of sight, and that it adds the chemical domain to the physical and technical principles represented in the bow-hunting system (also see Bradfield et al. 2015). In the case of such techno-behaviors, it is no longer the knapping of the stone tools themselves that informs on cognition, but the ways in which they were used. What is more, ultimately humans understood on abstract levels that the energy of a strung bow can be harnessed in multiple ways as listed above. Taken together, examples of such technologies amount to a causal grammar or concepts of fluid intelligence (Mithen 1996; Tenenbaum and Niyogi 2003; Lombard and Gärdenfors 2017). Bow hunting also demonstrates other types of complex cognition, for example, episodic or autobiographical memory that allows the reactivation of all the technical modules and their possible applications over extensive temporal and spatial gaps (Coolidge et al. 2016; Lombard 2019).

Concluding Discussion

We started our article by highlighting that the aim of evolutionary cognitive archaeology is to identify, reconstruct, interpret, and explain development and change in human cognition through time, based on the material culture that past societies left behind. Here we provided a theoretical framework (the seven-grade causal cognition model) that allows predictions about how material culture may manifest as a result of the development of certain cognitive ranges, with potential lines of evidence and examples summarized in Table 1. The model involves a successive addition of “hidden variables” related to ToM and to abstract “forces” that lead to an increasing decoupling of causal similarities from perceptual similarities. It is an inclusive framework for human cognitive evolution that differs from some other models (e.g., working memory), because it does not start from the perspective of the “modern mind,” instead attempting to help explain its development through time, although not necessarily in a unilinear fashion.

Table 1 The seven-grade causal cognition model with potential lines of evidence and some examples in the archaeological and stone tool records

Our first thesis proposes that ToM is an integral social element of causal cognition, and we provided a new analysis of ToM and the orders it represents to make our argument explicit. Boyd et al. (2011) created a divide between the “cognitive niche hypothesis” (e.g., Barrett et al. 2007; Pinker 2010) and the “cultural niche hypothesis.” They define the cognitive niche in terms of evolutionary psychology as studied by Cosmides and Tooby (2001) and their followers. In our view, this delimitation amounts to a very restricted account of cognition. In line with the divide, Derex et al. (2019) argue that causal understanding is “unnecessary” for culturally evolving technology (but see McCormack et al. [eds] 2011 on how causal cognition underpins human tool use today). We find this divide artificial. In our opinion, any cultural niche also contains causal cognitive elements (see also Osiurak and Reynaud 2020), both technical and social. For example, the form of technology that we find along hominin lines cannot be maintained without intentional teaching (Gärdenfors and Högberg 2017)—not just by imitating others—and thus it is part of a cultural niche. At the same time, we have argued that there is a coevolution between these forms of technology and more and more advanced forms of causal cognition (Gärdenfors and Lombard 2020). In brief, the cultural niche cannot be maintained without involving the cognitive niche, and vice versa. By understanding ToM as the social cognition part of the causal cognition system, as presented in this article, it becomes evident how the niches intertwine.

Exactly because humans are exceptionally good at causal cognition, particularly so in social causal cognition, Barrett et al. (2010, p. 523) argue that we invest considerable amounts of “mental resources” in trying to understand why others do things in certain ways, continuously assessing their skills and underlying motivational structures. Causal cognition is therefore a key factor in how we learn from others—not only about social behavior and motivation, but also about technical skill, know-how and the motivational structures behind the use of technologies. Bender and Beller (2019, p. 923) perhaps say it best: “Causal cognition emerges early in development and confers an important advantage for survival. […] The multiple ways in which both content and the key mechanisms of cultural transmission generate cultural diversity suggest that causal cognition in humans is not only colored by their specific cultural background but also shaped more fundamentally by the very fact that humans are a cultural species.”

With our second thesis, we suggest that the more advanced causal cognition is, the more it is dependent on ToM. Our seven grades of causal cognition extend Woodward’s (2011) division between egocentric causal learners, agent causal learners, and observation/action causal learners. In particular, our grades 2–5 form an expansion of his agent causal learners. The main reason for making these distinctions is that they involve increasing use of detached representations and of ToM. We have shown that from grade 3 causal cognition onwards different forms of ToM are required for the kind of reasoning represented. Being able to share mental states by using words leads to a better understanding of mental states in other people and how they understand the world both technically and socially. The development of a language would have strengthened this tendency (Bender and Beller 2019, pp. 927–928), to such an extent that causal relationships can be mapped onto linguistic constructions in various ways (Gärdenfors 2020).

The third thesis is that the evolution of causal cognition depends more and more on mental representations of hidden variables. Again, the seven-grade model provides support for this thesis. The first hidden variables are the mental entities (relating to emotion, attention, desire, intention, and belief) that are involved in ToM. These can be seen as mental “forces” that cause the agent to behave in a certain way. Then other forces (e.g., physical and chemical forces in the form of hafting bindings and adhesives) are added as hidden variables to the causal reasoning on grade 6. The final seventh grade involves reasoning with a network of hidden variables, and is, as far as we know, unique to Homo sapiens. Today, causal network cognition is a panhuman trait in that all living human populations are able to think in this manner, despite possible variation in individual cognition (e.g., Mistry et al. 2018).

In brief, our analysis shows that the seven-grade model of causal cognition can function as a productive, inclusive theoretical tool for evolutionary cognitive archaeology. It allows for a seamless integration between aspects of technical and social cognition—both being central to the Homo lineage and almost certainly part of a long and complex coevolutionary feedback loop.