1 Introduction

Computers, cars, and robots are merely inanimate objects. However, people assign human attributes to these objects. Objects can be perceived as having beliefs, consciousness and intentions [21, 47, 53, 54, 86]. This behavior is called anthropomorphizing. Intriguingly, we can exploit anthropomorphizing to enhance human–robot interaction in general [5, 35, 40] and robot motion planning in particular [10, 11, 13]. This is achieved by adding an additional attribute to the robot: human likeness.

We aim to devise a robot motion planner that generates human-like motions. Additionally, the motion planner should address the challenges of a populated, yet not crowded, environment. However, enabling robots to behave in a thoroughly human manner is one of the most demanding goals of artificial intelligence. Tackling this challenge is becoming more pressing because of remarkable strides in several research areas. First, computer graphics, display resolutions and virtual reality techniques have evolved considerably. Soon, realistic human-like behavior of virtual agents will be the bottleneck for a lifelike experience in computer games. Computer animations and virtual teaching exercises also depend on human-like navigation of their characters. Moreover, artificial agents are starting to share their workspaces with humans: robots navigate autonomously through pedestrian areas (see Fig. 1) or highways [9, 82, 87, 89]; they guide people at museums or fairs [64]; and they even have physical contact with humans and work together with them on an assembly or manipulation task [19, 22] or assist the elderly [23]. All these examples have in common that their performance (e.g., readability, trustworthiness, fault tolerance, and work pace) could be improved if the navigation is human-like [10, 11, 13]. This is because a human may perceive an agent as being more mindful if it appears similar to oneself [21, 35, 47, 53, 54]. The agent is then treated as an entity that deserves respect and that possesses apparent competence [21, 35, 86]. If this occurs, the acceptance and collaboration between humans and robots can be enhanced [35, 40, 86]. Remarkably, two aspects appear to be particularly important for the degree of anthropomorphism: similarity in appearance and similarity in motion [21]. This paper focuses on generating the latter: human-like motions.

Fig. 1
figure 1

Robot motion planning in a populated environment [8, 87]

In [81], we showed that human decision making during navigation can be approximated with the Nash equilibrium from game theory. Building upon this result, we devise a motion planner that reproduces human-like motion behavior by using game theory. Game theory extends optimal control theory to a decentralized multi-agent decision problem [3, p.3]. It models the decision-making process of individuals rather than merely reactive behavior. Its major advantage is that it incorporates reasoning about possible actions of others and interdependencies, which is called interaction awareness [81]. Interaction awareness is a key factor in human-like navigation and leads to conditionally cooperative behavior and mutual avoidance maneuvers. During navigation, humans reason about the consequences of their possible actions on others and expect similar anticipation from everyone else [60, 63, 83]. However, many human-like motion planners neglect interaction awareness, even though it is crucial. Ignoring interdependencies leads to inaccurate motion prediction and results in detours, stop-and-go motions, or even in collisions. Trautman et al. [78] argue that these problems can only be resolved by anticipating humans’ mutual collision avoidance. Hence, our main focus is to incorporate interaction awareness into our motion planner to generate human-like behavior.

Another focus of this work is to properly evaluate the human likeness of our planner. A popular method to assess human likeness is to define a set of social rules and then determine the degree of human likeness based on how accurately the motion planner adheres to these rules. Another approach is to visually compare the calculated trajectories to human solutions. However, both approaches only partially evaluate human likeness because they make assumptions about which behavior is human-like. This paper presents an evaluation that is applicable without characterizing human likeness itself and that is based on a variation of the Turing test. Therefore, two studies were conducted in which human volunteers rated the human likeness of motions. The motions were either based on our motion planner, on state-of-the-art motion planners, or on real human motions. In the first study—a questionnaire based on simulated videos—the volunteers acted as passive observers. In the second study, the participants could interact with an agent within virtual reality.

In summary, this paper addresses two major challenges within the field of motion planning: enabling robots to move in a human-like way and to navigate fluently within a dynamic environment by considering interaction awareness. Game theoretic tools are used for these challenges, and two standalone studies are presented that extensively evaluate the approach. Note that due to our Turing test setup, evaluating the presented motion planner on a real robotic platform falls outside the scope of this paper. Nevertheless, the presented algorithm would be suitable for wheeled robots, as shown in Fig. 1.

The remainder of this paper is organized as follows. Section 2 surveys the work related to human-like motion planning and game theory. Section 3 defines the problem of human-like motion planning. Section 4 explains the game theoretic background that is necessary for the implementation. The two experimental setups, their corresponding evaluation methods, and their results are discussed in Sects. 5 and 6.

2 Related Work

The related work for human-like navigation combines methods from psychology, robot motion planning and mathematics. This section is structured in exactly this sequence: first, an overview of psychological studies surveys the importance of motions for the occurrence of anthropomorphism; then, human-like motion planning is discussed—a mixture of psychology and traditional motion planning; and the section concludes with applications of game theory for motion planning and a discussion of the extent to which our approach differs from the related work.

Several psychological studies target anthropomorphism in combination with motions. A pilot study by Heider and Simmel [26] showed that humans ascribe intentions to moving shapes, such as circles and triangles, if their movements resemble social interactions. This was confirmed in [1, 12, 32]. In addition to intentions, motions also convey emotions: humans read emotions from the gait of humanoid robots [17], from a Roomba household robot [72], and even from a simplistic moving stick [25]. A survey by Karg et al. [33] goes deeper into how to reproduce movements that express emotions. Epley et al. [21] explain in which situations humans are likely to anthropomorphize and note that—in addition to a human-like appearance—similarity in motion is of particular importance. This result is consistent with Morewedge et al. [54], who state that humans anthropomorphize agents if these agents move at speeds similar to human walking speeds. These findings substantiate that a robot that moves in a human-like manner is more likely to be anthropomorphized. This in turn raises its acceptance and enhances performance within human–robot interactions [35, 86]. This is one of the reasons why researchers aim to generate human-like motions for artificial agents.

Within the field of robotics, human-like motion planning is often used together with attributes such as socially aware, human aware, human inspired, or socially compliant [39, 67]. However, in our understanding, the term human likeness differs from these attributes, although they may share common features. This work builds upon the definition of human likeness explained later in Sect. 3. Thus, this section mainly concentrates on works that specifically mention the term human-like motion. Importantly, we focus on how these works evaluate human likeness.

The most common method to evaluate human likeness is to first define a set of problems, rules, or measurements that are considered to address human likeness. Then, it is determined whether the motion planner fulfills the requirements. Kirby et al. [37] generated human-like motions by modeling social conventions—such as preferred avoidance on the right side—as constraints of an optimization problem. They evaluated their approach by counting how often the social rule of passing on the appropriate side was fulfilled in simulated scenarios. Khambhaita and Alami [34] analyzed whether their approach results in a mutual avoidance with an unequal amount of shared effort by plotting the trajectories recorded during different avoidance scenarios. They implemented the social convention that the robot takes “most of the load” during the avoidance by combining a time elastic band approach with graph optimization that can manage kinodynamic and social constraints. Interestingly, they emphasized the need to view human navigation as a cooperative activity. Müller et al. [56] presented a robot that moves with a group by following those persons that move toward the goal of the robot. They assessed whether the robot behaved according to a social norm. Moreover, they focused on smooth robot motions, which are frequently considered as an attribute of human likeness. For example, Best et al. [6] analyzed the smoothness of trajectories calculated by their crowd simulation algorithm that is based on density-dependent filters. Additionally, they counted the number of collisions. Similar measures were used by Pradeep et al. [65]. In simulations, they evaluated the number of collisions, path irregularity, and the average ‘safety threat’ (a value dependent on the relative position of the nearest object). In contrast, Shiomi et al. [73] concentrated on path irregularities caused by the robot; hence, they evaluated whether the robot caused sudden motions for the surrounding pedestrians. They developed a robot that navigates within a shopping mall. This robot relies on an extended social force model that includes the time to collision as a parameter to compute repulsion fields around agents [90].

Note that the mentioned evaluation techniques merely assess predefined assumptions about human likeness. Another approach is relying on human discrimination: the assessment is made by or against humans through observations or questionnaires [28]. Thus, Best et al. [6] additionally analyzed whether the speed-to-crowd-density ratio of their algorithm is similar to that of human crowd recordings. Tamura et al. [75] calculated the difference between observed human trajectories and trajectories that were calculated by their motion planner, which is based on social forces. Similarly, Kim and Pineau [36] compared the trajectories of their approach to the trajectories of a wheelchair that was controlled remotely by a human. Specifically, they compared the closest distance to pedestrians, the avoidance distance, and the average times to reach the goal. They proposed using the population density and velocity of the surrounding agents as input for inverse reinforcement learning. Apart from comparing trajectories, i.e., assessment against humans, human likeness is also validated by humans through questionnaires or interviews. Shiomi et al. [73] conducted a study wherein participants had to avoid a robot and vice versa. Afterward, the participants rated how comfortable they felt. Althaus et al. [2] presented a robot that joins a group of standing participants. According to the interviewed participants, the robot appeared natural and was perceived as intelligent. Note that the mentioned questionnaires focused on attributes such as comfort, naturalness and intelligence. Even fewer assumptions are made if human likeness itself is rated, for example, by using a Likert scale. This was shown by Minato and Ishiguro [51], who evaluated the hand motions of a humanoid. Another technique is using a variation of a Turing test. Kretzschmar et al. [38] applied inverse reinforcement learning to reproduce human navigation behavior. They showed animated trajectories to human participants and asked them whether the trajectories were based on human recordings or generated by their algorithm. In this paper, a similar evaluation is used; however, we go beyond showing animated trajectories by using virtual reality.

Further work related to human-like motion planning could be listed that does not explicitly mention human likeness but rather uses terms similar to interaction awareness. Since we consider interaction awareness to be a key attribute of human-like navigation, research conducted in this area is highly relevant. However, such research is already summarized in our previous work [81] and is not repeated here. Rather, the application of game theory for motion and coordination tasks is outlined in the following.

A new approach to model the navigation of humans is game theory, which has already found applications in motion planning and coordination. LaValle and Hutchinson [43] were among the first to propose game theory for the high-level planning of multiple robot coordination. Specific applications are a multi-robot search for several targets [49], the shared exploration of structured workspaces such as building floors [74], or coalition formation [24]. Closely related to these coordination tasks is the family of pursuit-evasion problems. These problems can be formulated as a zero-sum or differential game [3, 52, 85]. Zhang et al. [91] introduced a control policy for a motion planner that enables a robot to avoid static objects and to coordinate its motion with other robots. Their policy is based on zero-sum games and assigning priorities to the different robots. Thus, it eludes possible mutual avoidance maneuvers by treating robots with a higher priority as static objects. The motions of multiple robots with the same priority are coordinated within the work of Roozbehani et al. [69]. They focused on crossings and developed cooperative strategies to resolve conflicts among autonomous vehicles. Recently, Zhu et al. [93] discussed a game theoretic controller synthesis for multi-robot motion planning.

Unfortunately, the works mentioned thus far focused on groups of robotic agents. In contrast, Gabler et al. [22] regarded a two-agent team consisting of a human and a robot. They presented a method to predict the actions of a human during a collaborative pick and place task based on game theory and Nash equilibria. The works of Dragan [20] and Nikolaidis et al. [58] also involved a human. They both formulated several interactive activities between a human and a robot generally as a two-player game. Then, they highlighted how different approximations and assumptions of this game result in different robot behaviors. Both mostly concentrated on pick and place [59] or handover tasks. An exception is the approximation for navigation presented by Sadigh et al. [71]. They modeled interactions between an autonomous car and a human driver for merging or crossing scenarios. They simplified solving the dynamic game by assuming that the human merely computes the best response to the car’s action (rather than calculating all Nash equilibria) and showed that the autonomous car and the human driver act as desired. A similar merging scenario was also formulated as a dynamic game by Bahram et al. [4]. However, they concentrated on navigation for cars.

Moreover, the works mentioned in the last paragraph focus mainly on two-player games, whereas we aim for human-like navigation in populated environments. Hoogendoorn and Bovy [30] were among the first to connect game theory with models for human motion. They focused on simulating crowd movements and generated pedestrian flows by formulating the walking behavior as a differential game, where every pedestrian maximizes a utility function. Another technique to simulate pedestrian flows is to combine game theory with cellular automata [76, 92]. In this technique, the crowd is interpreted as a finite number of agents that are spread on a grid, where each agent jumps to a bordering cell with a fixed probability. In contrast, Mesmer and Bloebaum [50] investigated a combination of game theory with velocity obstacles. Their cost mirrors the energy consumption during walking, waiting and colliding. Apart from that, mean field game theory has become increasingly popular for modeling crowds. This theory explores decision making in large populations of small interacting individuals, wherein one agent has only a negligible impact on the entire crowd. Among others, Dogbé [18] and Lachapelle and Wolfram [42] presented mean-field-based formulations for crowd dynamics and adequate solvers. In particular, this latter method highlights the main application interest of the mentioned techniques: crowd modeling to simulate evacuation scenarios, i.e., scenarios with a vast number of agents. However, we are interested in sparsely populated environments. To date, there have been almost no attempts to evaluate the application of game theory for these types of crowds, with the exception of [46]. Ma et al. [46] first estimated person-specific behavior parameters with a learning-based visual analysis. Then, they predicted the motions of humans recorded in pedestrian areas by encoding the coupling during multi-agent interactions with game theory.

We go one step further and use game theory not only for prediction but also for planning human-like motions. This paper differentiates itself from previous human-like motion planners by focusing on the decision-making process of individuals and taking interdependencies into account. All agents are modeled as interaction-aware individuals that anticipate possible avoidance maneuvers of other moving agents, which goes beyond the popular constant velocity assumption. We further refrain from modeling the motion behavior directly but rather formulate navigation as a mathematical decision between different movements. We use game theory to formulate this decision process and to determine human movements in populated environments. Thus, the term populated refers to busy, yet not crowded, areas. At present, game theoretic pedestrian models mainly focus on larger crowds to simulate evacuations, or they focus only on two-player setups. For the former approaches, a human-like, macroscopic behavior is relevant (e.g., behavior at bottlenecks, line formation, and flocking). In contrast, we are interested in evaluating the microscopic behavior concerning the human likeness. We extensively evaluate the human likeness in several experiments. Thus, we refrain from relying on prior assumptions and use a variation of the Turing test. In our opinion, this is the most unbiased way to assess human likeness. Additionally, we compare several properties of human trajectories to the trajectories computed by our motion planner. The trajectories were gathered during a collision avoidance study in virtual reality between a human participant and an artificial agent. To the best of our knowledge, this work is the first time that a motion planner for populated environments based on game theory is tested in an online fashion.

3 Problem Formulation

Driven by the motivation discussed at the beginning, we summarize our goal as follows:

Goal: a motion planner that generates human-like motion behavior for robots acting in populated environments.

This goal yields the problem of defining human likeness. It is nontrivial to define criteria that validate whether a behavior is human-like or to what extent it is perceived as human. This is why this paper builds upon the fundamental definition that an artificially generated motion is a human-like motion if a human perceives no difference between a ‘real’ human motion and an artificially generated motion. No further assumptions are made.

Human-like motion planning: consists of planning collision-free motions for one or more agents such that they behave equivalently to, or indistinguishably from, a human.

As mentioned, the robot is supposed to act in populated environments. This means solving a kinodynamic planning problem within a dynamic workspace, as shown in Fig. 2: a state space \(\mathcal {X}\) is occupied by static objects and dynamic agents. Let us denote the unified occupancy of all static objects as \(\mathcal {X}^{\mathrm {obj}}\) and the admissible workspace as \(\mathcal {X}^{\mathrm {adm}}= \mathcal {X}\setminus \mathcal {X}^{\mathrm {obj}}\). Note that this definition deliberately disregards the space occupied by dynamic agents. A subtask is to find a trajectory \({\varvec{\tau }}_{n}\) for each agent \(A_{n} \in \mathcal {A}\) that at first only satisfies static object constraints and local differential constraints. The differential constraints are expressed in implicit form:

$$\begin{aligned} \dot{\mathbf {x}}_{n} = f(\mathbf {x}_{n}, \mathbf {u}_{n}), \end{aligned}$$
(1)

in which \(\mathbf {x}_{n} \in \mathcal {X}\) and \(\mathbf {u}_{n} \in \mathcal {U}_{n}\) are the agent’s state and control input, respectively, with \(\mathcal {U}_{n}\) denoting the set of control inputs of agent \(A_{n}\). The kinodynamic planning problem [44] is to find a trajectory that leads from an initial state \(\mathbf {x}^{\mathrm {init}}_{n} \in \mathcal {X}\) to a goal region \(\mathcal {X}^{\mathrm {goal}}_{n} \subset \mathcal {X}\). A trajectory is defined as a time-parameterized continuous path \({\varvec{\tau }}_{n} : [0,T] \rightarrow \mathcal {X}^{\mathrm {adm}}\) that fulfills the constraints given by (1) and avoids static objects. A segment of a trajectory is defined by a time interval and denoted by \({\varvec{\tau }}_{n}([t^{\mathrm {0}},t^{\mathrm {1}}])\).

Due to the dynamic environment, an additional constraint is to find a combination of trajectories that ensures that none of the agents will collide. To guarantee collision-free navigation, the following has to hold at any time t:

$$\begin{aligned} \mathcal {X}^{\mathrm {dyn}}_{n}(t)\ \cap \ \mathcal {X}^{\mathrm {dyn}}_{n'}(t) =\ \emptyset \ \forall \ A_{n}, A_{n'} \in \mathcal {A}, \end{aligned}$$
(2)

with \(\mathcal {X}^{\mathrm {dyn}}_{n}(t)\) being the subset of the state space that is occupied by a dynamic agent \(A_{n}\) at a certain time t.
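For illustration, the following minimal Python sketch shows one way to check constraint (2) in discrete time. It relies on two assumptions that the text does not make explicit: trajectories are sampled at common time steps, and each occupied set \(\mathcal {X}^{\mathrm {dyn}}_{n}(t)\) is approximated by a disc of fixed radius around the agent's planar position. The function name and the `footprint` parameter are illustrative choices, not part of the paper.

```python
import itertools
import math

def collision_free(trajectories, footprint=0.3):
    """Discrete-time check of constraint (2): no two agents' occupied discs overlap.

    trajectories: dict mapping an agent id to a list of (x, y, theta) states,
    all sampled at the same time steps.
    """
    horizon = min(len(traj) for traj in trajectories.values())
    for a, b in itertools.combinations(trajectories, 2):
        for t in range(horizon):
            xa, ya, _ = trajectories[a][t]
            xb, yb, _ = trajectories[b][t]
            if math.hypot(xa - xb, ya - yb) < 2 * footprint:
                return False  # the occupied sets intersect at time step t
    return True
```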

Fig. 2
figure 2

Example of interaction-aware navigation of agents on a sidewalk with a static object. Interaction may be a mutual avoidance maneuver. Agents \(A_{n}\) plan trajectories from an initial state \(\mathbf {x}^{\mathrm {init}}_{n}\) to a goal region \(\mathcal {X}^{\mathrm {goal}}_{n}\) that avoid static objects \(\mathcal {X}^{\mathrm {obj}}\)

To further specialize the motion planning problem toward human likeness, each agent is assumed to have the following properties:

  • An agent can be either a human or a controllable agent (e.g., robots, characters in a game, or simulated particles).

  • All agents are interaction-aware.

Thus, the term interaction-aware is seen from a navigational perspective. The individuals reason about possible motions of others and interdependencies.

Interaction-aware motion planning: is defined as the planning of collision-free trajectories in dynamic environments that additionally considers possible reciprocal actions and influences between all other dynamic agents.

The problem can be further specialized. In the case that all dynamic agents are robots (i.e., controlled agents), the problem can be formalized as a centralized multi-agent motion planning problem. In the case of robot(s) navigating among humans, the challenge is to reason about the possible motions of the humans and interdependencies. Based on this, the robot has to decide which trajectory it should take. The presented motion planning approach can address both challenges and is evaluated accordingly.

4 Human-Like, Interaction-Aware Motion Planning Based on Game Theory

We aim to reproduce human behavior by creating a motion planner that takes the interaction awareness of all agents into account and generates human-like trajectories. In our previous works [79, 81], we already showed that human interaction-aware decision making during navigation can be mathematically formulated as searching for Nash equilibria in a static game. We build upon these findings and base our approach on game theory.

4.1 Modeling Navigation as a Game

This paper focuses on the branch of non-cooperative games. This can briefly be summarized by the following.

Non-cooperative game theory: handles how rational individuals make decisions when they are interdependent.

Mutual interdependence exists if the utility of any individual is dependent on the decision of others. The term non-cooperative is meant in contrast to cooperative/coalitional games, where the focus is on the benefit that groups of agents, rather than individuals, achieve by entering into binding agreements [68, p. 1f]. This distinction does not imply that non-cooperative game theory neglects cooperation. However, cooperation only occurs if it is beneficial for the individuals. This is also referred to as rational behavior. Individuals behave rationally if they maximize their expected utility or minimize expected cost [15]. In this paper, we assume that navigating humans act rationally (Footnote 1) and model navigation as the following static game.

Definition 1

(Static Game) A static, non-cooperative, finite, nonzero-sum game is defined by the following [45]:

  1. Finite set of \(N\) agents \(\mathcal {A}= \{A_1,A_2,\dots ,A_N\}\), \(N=|\mathcal {A}|\).

  2. Finite set of action sets \(\mathcal {T}= \mathcal {T}_1\cup \mathcal {T}_2\cup \dots \cup \mathcal {T}_N\), where a set \(\mathcal {T}_{n}\) is defined for each agent \(A_{n} \in \mathcal {A}\). Each \({\varvec{\tau }}_{n}^{m} \in \mathcal {T}_{n}\) is referred to as an action of \(A_{n}\), with \(m \in \{1,2,\dots ,M_{n}\}\) and \(M_{n} = |\mathcal {T}_{n}|\).

  3. Cost function \({J}_{n}\): \(\mathcal {T}_1 \times \mathcal {T}_2 \times \dots \times \mathcal {T}_N \rightarrow \mathbb {R} \cup \{\infty \}\) for each agent \(A_{n} \in \mathcal {A}\).

The subscript \(n\) always refers to the addressed agent. Each agent \(A_{n}\) has different actions \({\varvec{\tau }}_{n}^{m} \in \mathcal {T}_{n}\) and a cost function \({J}_{n}\). The superscript \(m\) refers to an action, and \({\varvec{\tau }}_{n}^{m}\) is the \(m\)th action out of \(M_{n}\) actions of agent \(A_{n}\). A game is finite if the number of actions is bounded for all agents. One speaks of a nonzero-sum game if the sum of all agents’ costs can differ from zero. Mapping these terms to robotic motion planning, an action \({\varvec{\tau }}_{n}^{m}\) is defined here as a trajectory leading \(A_{n}\) from its starting state \(\mathbf {x}^{\mathrm {init}}_{n}\) to its goal region \(\mathcal {X}^{\mathrm {goal}}_{n}\). An example is given in Fig. 2 where two agents, \(A_1\) and \(A_2\), are walking on a sidewalk that is occupied by a static object. Each agent can choose between different trajectories, i.e., the action sets \(\mathcal {T}_1 = \{{\varvec{\tau }}_1^1, {\varvec{\tau }}_1^2,{\varvec{\tau }}_1^3, {\varvec{\tau }}_1^4\}\) and \(\mathcal {T}_2 = \{{\varvec{\tau }}_2^1, {\varvec{\tau }}_2^2, {\varvec{\tau }}_2^3,{\varvec{\tau }}_2^4, {\varvec{\tau }}_2^5\}\).

The cost function \({J}_{n}\) models the interaction awareness by being dependent on the actions of all agents. In this work, it consists of an independent component \(\hat{{J}}\) and an interactive component \(\tilde{{J}}_{n}\):

$$\begin{aligned}&{J}_{n}({\varvec{\tau }}_1^{m},\dots , {\varvec{\tau }}_{n}^{m'}, \dots , {\varvec{\tau }}_N^{m''}) \nonumber \\&\quad = \hat{{J}}_{n}({\varvec{\tau }}_{n}^{m'}) +\tilde{{J}}_{n}({\varvec{\tau }}_1^{m}, \dots , {\varvec{\tau }}_{n}^{m'}, \dots , {\varvec{\tau }}_N^{m''}). \end{aligned}$$
(3)

Note that this distinction clarifies that the game theoretic formulation results in a set of independent optimal control problems if no interaction occurs. The independent component \(\hat{{J}}_{n}\) depends only on the action \({\varvec{\tau }}_{n}^{m'}\) of \(A_{n}\) (it is defined later). The interactive component \(\tilde{{J}}_{n}\) contains the interdependency cost: it depends not only on the agent's own choice of action but also on the actions of the other agents. The interactive component \(\tilde{{J}}_{n}\) is set to

$$\begin{aligned} \tilde{{J}}_{n}({\varvec{\tau }}_1^{m}, \dots , {\varvec{\tau }}_N^{m''}):= {\left\{ \begin{array}{ll} \infty &{} \text {if a collision occurs,}\\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}$$
(4)

With this definition, \(\tilde{{J}}_{n}\) becomes infinite if action \({\varvec{\tau }}_{n}^{m'}\) leads to a collision with the action of another agent; otherwise, it is zero.
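As a concrete illustration, the cost structure of Eqs. (3) and (4) can be written as a short Python sketch. It reuses the hypothetical `collision_free` check sketched in Sect. 3 and leaves the independent component as a caller-supplied function, since the paper specifies it only later in Sect. 4.3.2.

```python
import math

def interactive_cost(allocation, footprint=0.3):
    """Interdependency term of Eq. (4): infinite if the chosen trajectories
    collide, zero otherwise. `allocation` maps each agent to one trajectory."""
    return 0.0 if collision_free(allocation, footprint) else math.inf

def total_cost(n, allocation, independent_cost):
    """Cost J_n of Eq. (3): agent n's own independent cost plus the shared
    interaction term. `independent_cost` is any function of one trajectory."""
    return independent_cost(allocation[n]) + interactive_cost(allocation)
```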

4.2 Solving Games: Solution Techniques

This section focuses on what would be the most promising combination of actions for the sidewalk scenario in Fig. 2. Game theory offers diverse definitions of equilibrium points. We interpret them as recommendations or as a prediction of what is likely to occur. A combination of actions of the agents is denoted as an allocation \(({\varvec{\tau }}_1^{m},\dots , {\varvec{\tau }}_{n}^{m'}, \dots , {\varvec{\tau }}_N^{m''})\). Note that such an allocation was used for the cost function in Eq. (3). Keeping this in mind, a popular solution technique is presented: the Nash equilibrium.

Nash equilibrium: a Nash equilibrium is an allocation where no agent can reduce its own cost by changing its action if the other agents stick to their actions. In a Nash equilibrium, each agent's action is thus a best response to the actions of the others.

A game can have several Nash equilibria. Let us denote the set of Nash equilibria as \(\mathcal {E}= \{\epsilon ^1, \dots , \epsilon ^{k}, \dots , \epsilon ^K\}\), with \(K=|\mathcal {E}|\). The actions of an equilibrium allocation \(\epsilon ^{k} = ({\varvec{\tau }}_1^{*}, \dots , {\varvec{\tau }}_N^{*})\) are marked with an asterisk. It is mathematically defined by the following:

Definition 2

(Nash equilibrium) The \(N\)-tuple allocation of actions \(({\varvec{\tau }}_1^{*}, \dots , {\varvec{\tau }}_{n}^{*}, \dots , {\varvec{\tau }}_N^{*})\), with \({\varvec{\tau }}_{n}^{*} \in \mathcal {T}_{n}\), constitutes a non-cooperative Nash equilibrium for an \(N\)-agent game if the following \(N\) inequalities are satisfied for all actions \({\varvec{\tau }}_{n}^{m} \in \mathcal {T}_{n}\):

$$\begin{aligned} \begin{array}{c} {J}_1({\varvec{\tau }}_1^{*}, {\varvec{\tau }}_2^{*},\dots , {\varvec{\tau }}_N^{*}) \le {J}_1({\varvec{\tau }}_1^{m}, {\varvec{\tau }}_2^{*}, \dots , {\varvec{\tau }}_N^{*})\\ {J}_2({\varvec{\tau }}_1^{*}, {\varvec{\tau }}_2^{*},\dots , {\varvec{\tau }}_N^{*}) \le {J}_2({\varvec{\tau }}_1^{*}, {\varvec{\tau }}_2^{m}, \dots , {\varvec{\tau }}_N^{*})\\ \vdots \\ {J}_N({\varvec{\tau }}_1^{*}, {\varvec{\tau }}_2^{*},\dots , {\varvec{\tau }}_N^{*}) \le {J}_N({\varvec{\tau }}_1^{*}, \dots , {\varvec{\tau }}_{N-1}^{*}, {\varvec{\tau }}_N^{m})\\ \end{array} \end{aligned}$$
(5)

At least one solution \(\epsilon ^{k}\) for these inequalities exists if a cost function in the form of (3)–(4) is used (Footnote 2).
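Since the game is finite, the inequalities of Definition 2 can be checked by exhaustive enumeration. The following sketch does exactly that; it is intended only for the small action sets used here (the paper does not prescribe a particular solver) and builds on the cost interface sketched above. Function names are illustrative.

```python
import itertools

def nash_equilibria(action_sets, cost):
    """Enumerate all pure Nash equilibria of a finite static game (Definition 2).

    action_sets: dict mapping each agent to its list of actions (trajectories).
    cost(n, allocation): cost J_n of agent n for a full allocation
    (dict agent -> chosen action).
    """
    agents = list(action_sets)
    equilibria = []
    for combo in itertools.product(*(action_sets[a] for a in agents)):
        allocation = dict(zip(agents, combo))
        # Definition 2: no agent can lower its cost by unilaterally switching.
        stable = all(
            cost(n, allocation) <= min(cost(n, {**allocation, n: alt})
                                       for alt in action_sets[n])
            for n in agents)
        if stable:
            equilibria.append(allocation)
    return equilibria
```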

Fig. 3
figure 3

Illustration of Fig. 2 as a static game. The actions \({\varvec{\tau }}_{n}^{m}\) of the agents \(A_{n}\) and the independent cost of the trajectories \(\hat{{J}}_{n}({\varvec{\tau }}_{n}^{m})\) are shown. The corresponding cost matrix that considers collisions is Table 1

Let us use the sidewalk scenario in Fig. 2 as an example to illustrate the meaning of a Nash equilibrium. First, an (arbitrarily chosen) independent cost \(\hat{{J}}_{n}\) is assigned to each trajectory drawn on the sidewalk (compare Fig. 3). A corresponding cost matrix that considers interdependence (i.e., collisions) is shown in Table 1. Then, the Nash equilibria are calculated by solving (5). Accordingly, the game has four Nash equilibria: \(\mathcal {E}= \{\epsilon ^1, \epsilon ^2, \epsilon ^3, \epsilon ^4\} = \{({\varvec{\tau }}_1^{2^*},{\varvec{\tau }}_2^{2^*}),({\varvec{\tau }}_1^{1^*},{\varvec{\tau }}_2^{3^*}), ({\varvec{\tau }}_1^{3^*},{\varvec{\tau }}_2^{5^*}),({\varvec{\tau }}_1^{4^*},{\varvec{\tau }}_2^{4^*})\}\). They are marked in bold in Table 1. At these allocations, neither of the two agents can lower their own cost any further by changing only their own action.

This example raises the question of which equilibrium an agent should choose. Comparing the cost of the equilibria reveals that the cost pair (4|4) is dominated by the alternatives (2|2) and (1|3). To further reduce the set \(\mathcal {E}\), only Pareto-optimal Nash allocations are kept. These are all equilibria that adhere to the following condition.

Pareto optimality: a Pareto optimal outcome is an allocation in which it is impossible to reduce the cost of any player without raising the cost of at least one other player.

In our example, this condition holds for three dominating Nash equilibria. They will be denoted as elements of the set \(\mathcal {E}_{\mathrm {pareto}}= \{({\varvec{\tau }}_1^{1^*},{\varvec{\tau }}_2^{3^*}), ({\varvec{\tau }}_1^{3^*},{\varvec{\tau }}_2^{5^*}), ({\varvec{\tau }}_1^{4^*}, {\varvec{\tau }}_2^{4^*})\}\), with \(\mathcal {E}_{\mathrm {pareto}}\subseteq \mathcal {E}\). The three remaining equilibria can be interpreted as different avoidance maneuvers: either the agents avoid each other equally or one agent gives way to the other. Which of these equilibria our motion planner should choose is dependent on the application and is discussed later.
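A sketch of this Pareto filter, continuing the enumeration example above, is given below. Applied to the sidewalk game of Table 1, it would discard the allocation with cost pair (4|4) and keep the three remaining equilibria. Again, the function name is an illustrative choice.

```python
def pareto_optimal(equilibria, agents, cost):
    """Keep only equilibria that are not Pareto dominated by another one:
    an allocation is dropped if another equilibrium is no worse for every
    agent and strictly better for at least one."""
    vectors = [[cost(n, e) for n in agents] for e in equilibria]
    kept = []
    for i, vi in enumerate(vectors):
        dominated = any(
            all(vj[k] <= vi[k] for k in range(len(agents))) and
            any(vj[k] < vi[k] for k in range(len(agents)))
            for j, vj in enumerate(vectors) if j != i)
        if not dominated:
            kept.append(equilibria[i])
    return kept
```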

Table 1 Static game. The cells depict cost pairs \({J}_1|{J}_2\) dependent on actions \({\varvec{\tau }}_{n}^{m}\). Actions and corresponding cost are shown in Fig. 3. In case of a collision the cost is infinite. Nash equilibria are in bold
Fig. 4
figure 4

Applying game theory for motion planning in dynamic environments; the problem is decoupled into solving a kinodynamic planning problem and repeatedly playing a static, non-cooperative game. The dashed lines are only used if the group of agents is a mixture of controllable agents and humans

4.3 Implementing a Game Theoretic Motion Planner

The presented static game lays the foundation for the motion planner. According to the problem definition in Sect. 3, the planner should find a trajectory for each agent that fulfills differential constraints, as well as avoids static objects and dynamic agents. In this paper, this problem is decoupled by first solving the kinodynamic planning problem independently for each agent. However, it is solved repeatedly such that a set of various trajectories is calculated for each agent (i.e. the action sets). In the second step, game theoretic reasoning decides on a combination of trajectories (see Fig. 4). In the reasoning step, we will further differentiate between the case where all agents are controllable and the case where the dynamic agents are a mixture of humans and robots.

4.3.1 Trajectory Planning with Differential Constraints

To calculate the trajectories, a control-based version of the rapidly exploring random tree (RRT) [44] is used. It is chosen because it considers differential constraints and finds multiple solutions. For this task, other planners are also suitable. It is also conceivable to combine the solutions of different planners. For example, one could additionally calculate a trajectory with an RRT*, or one could further optimize the trajectories to generate (locally) optimal solutions. However, it is crucial that several diverse trajectories to the goal are found.

As with the original RRT, the control-based version repeatedly samples a state at random and finds the nearest neighbor in the tree. The two versions differ in the extension step: the control-based version selects a control input \(\mathbf {u}\in \mathcal {U}_{n}\) that extends the nearest vertex toward the sampled state. The control input \(\mathbf {u}\) is applied for a certain time interval \(\delta t\), \(\{\mathbf {u}(t') | t \le t' \le t + \delta t\}\), and the new state is calculated through numerical integration. Thus, the output of the control-based RRT is not only a collision-free trajectory \({\varvec{\tau }}_{n}\) but also a series of controls \({\varvec{\mu }}_{n} : [0, T] \rightarrow \mathcal {U}_{n}\). For the numerical integration, a discrete-time approximation of (1) is used:

$$\begin{aligned} \mathbf {x}_{n}[t+1] = f(\mathbf {x}_{n}[t], \mathbf {u}_{n}[t]). \end{aligned}$$
(6)

The state and control vectors are given by

$$\begin{aligned} \mathbf {x}_{n}[t] = \begin{pmatrix} x_{n}[t] \\ y_{n}[t] \\ \theta _{n}[t] \end{pmatrix}, \quad \mathbf {u}_{n}[t] = \begin{pmatrix}v_{n}[t] \\ w_{n}[t] \end{pmatrix}, \end{aligned}$$

where the state \((x_{n}, y_{n}, \theta _{n})^\intercal \) describes the global position and orientation of the center of mass, and the control inputs denoted by \((v_{n}, w_{n})^\intercal \) are linear and angular velocities. The subscript \(n\) matches the state and control to the corresponding agent \(A_{n}\). The output of the control-based RRT is consequently a discrete trajectory \({\varvec{\tau }}_{n} = (\mathbf {x}_{n}[0], \mathbf {x}_{n}[1], \dots , \mathbf {x}_{n}[T])\), and a discrete control series \({\varvec{\mu }}_{n} = (\mathbf {u}_{n}[0], \mathbf {u}_{n}[1], \dots , \mathbf {u}_{n}[T])\). To approximate an agent’s motion, a discrete time unicycle model is used:

$$\begin{aligned} x_{n}[t+1]&= x_{n}[t] + dt\ v_{n}[t]\cos (\theta _{n}[t]),\nonumber \\ y_{n}[t+1]&= y_{n}[t] + dt\ v_{n}[t]\sin (\theta _{n}[t]),\nonumber \\ \theta _{n}[t+1]&= \theta _{n}[t] + dt\ w_{n}[t]. \end{aligned}$$
(7)

Here, \(dt\) is the magnitude of the numerical integration time step. Note that this differs from the RRT time step \(\delta t\), which defines how long a control is applied to extend an edge. \(\delta t\) is the propagation duration and can be larger. We further use a finite set of control inputs

$$\begin{aligned} \mathcal {U}_{n} = \Biggl \{ \begin{bmatrix} v_{n} \\ 0 \end{bmatrix}, \begin{bmatrix} v_{n} \\ w_{n} \end{bmatrix}, \begin{bmatrix} v_{n} \\ -w_{n} \end{bmatrix}, \begin{bmatrix} v_{n} \\ cw_{n} \end{bmatrix}, \begin{bmatrix} v_{n} \\ -cw_{n} \end{bmatrix}\Biggr \}, \end{aligned}$$
(8)

with \(w_{n} \in [w_{n}^{\mathrm {min}}, w_{n}^{\mathrm {max}}]\) being randomly chosen every time before a new trajectory from start to goal is planned. Additionally, a factor \(c\) is introduced to allow for different curvatures within the resulting path. In this setup, \(c\) is set to \(\frac{1}{2}\). The linear velocity \(v_{n}\) is an agent-specific value.
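To make the extension primitive concrete, the following sketch draws one control from the set in Eq. (8) and propagates the discrete unicycle model of Eq. (7) for the propagation duration. It is a simplified stand-in for the control-based RRT extension step (the actual implementation builds on the Open Motion Planning Library); the function names are illustrative.

```python
import math
import random

def sample_control(v_n, w_n, c=0.5):
    """One element of the finite control set (8): fixed linear velocity and
    one of five angular velocities (straight, +/- w, +/- c*w)."""
    return (v_n, random.choice([0.0, w_n, -w_n, c * w_n, -c * w_n]))

def propagate(state, control, dt, delta_t):
    """Apply one control for the propagation duration delta_t using the
    discrete unicycle model (7) with integration step dt."""
    x, y, theta = state
    v, w = control
    for _ in range(max(1, round(delta_t / dt))):
        x += dt * v * math.cos(theta)
        y += dt * v * math.sin(theta)
        theta += dt * w
    return (x, y, theta)
```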

Table 2 Parameters used by the game theoretic planner (GT) for the online video study (video) and the virtual reality study (virtual)
Fig. 5
figure 5

Different replanning steps of the multi-agent motion planning. Static objects are shown as gray rectangles, and agents are colored circles. The Nash equilibrium trajectories of each step are in bold. The final trajectories are shown in Fig. 6. a \(t=0\) s. b \(t=2\) s. c \(t=4\) s. (Color figure online)

To create diverse trajectories for each agent, the parameters of the control-based RRT are constantly varied. The angular velocity is randomly chosen as mentioned above. Additionally, the propagation duration \(\delta t\) is not fixed but lies in the interval \([\delta t^{\mathrm {min}}, \delta t^{\mathrm {max}}]\) and varies at each state extension step. The bounds of this propagation interval also change: before a new trajectory is planned, the lower and upper bounds are randomly drawn from the intervals \(\varGamma ^{\mathrm {min}}\) and \(\varGamma ^{\mathrm {max}}\), respectively. The values used for each of the aforementioned parameters are listed in Table 2. Examples of resulting trajectories are shown in Fig. 5. For these trajectories, the values in the column ‘video’ were used. The Open Motion Planning Library (Footnote 3) served as the basis for our implementation.

4.3.2 Choosing the Cost Function

After calculating the action sets for each agent, the independent cost component \(\hat{{J}}_{n}\) of each action needs to be specified [see Eq. (3)]. A prior human motion analysis [81] evaluated how accurately different cost functions could reproduce human decision making during navigation. A cost that considers only the length of a trajectory and possible collisions worked best. Consequently, the independent component \(\hat{{J}}_{n}\) is defined to be the length of the path:

$$\begin{aligned} \hat{{J}}_{n}({\varvec{\tau }}_{n}^{m}):= \mathrm {Length}({\varvec{\tau }}_{n}^{m}) \end{aligned}$$
(9)

This is corroborated by researchers stating that humans execute their motions by following a minimization principle. For example, they minimize the global length of their paths [7]. Additionally, psychologists suggest that even infants expect a moving agent to reach its goal by taking the shortest path [16]. However, we acknowledge that using only the length does not perfectly capture the true cost for navigation. The cost function of a static game could easily be individualized for each agent to incorporate preferences or physical properties. However, simply using a more complex function does not necessarily improve the performance [81]. In this paper, we concentrate on length because minimizing length appears to be a prevalent aim of humans [7, 16, 81].
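A direct implementation of Eq. (9) on a discrete trajectory is short; the sketch below sums the Euclidean distances between consecutive planar positions of the (x, y, theta) states used in the sketches above.

```python
import math

def path_length(traj):
    """Independent cost of Eq. (9): accumulated Euclidean distance between
    consecutive planar positions of a discrete trajectory of (x, y, theta) states."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0, _), (x1, y1, _) in zip(traj, traj[1:]))
```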

4.3.3 Multi-agent Motion Planning with Game Theory

The game theoretic reasoning decides on a combination of trajectories (Fig. 4). This section describes the reasoning for the case that all agents are controllable. As mentioned, the navigation problem is modeled as a static game (Definition 1). To create a motion planner that adapts to changes, a static game is replayed every \(\varDelta t\) seconds. The agents \(A_{n}\), their action sets \(\mathcal {T}_{n}\), and the corresponding costs \({J}_{n}\) change at every time step. Several RRT planners generate new sets of trajectories for all agents. Additionally, the default action \({\varvec{\tau }}_{n}^0\) “stand still for \(\varDelta t\) seconds” is added to each set \(\mathcal {T}_{n}\) such that an agent can stop immediately. This action is tagged with an independent cost \(\hat{{J}}\) that is higher than each trajectory cost in \(\mathcal {T}_{n}\) but lower than the cost for a collision. Thus, we prevent “stand still for \(\varDelta t\) seconds” from always remaining the best option for an agent.

After a static game is set up, its set of Nash equilibria \(\mathcal {E}\) is calculated and processed in the coordination step (Fig. 4). For multi-agent motion planning, the Nash equilibrium that Pareto dominates the other equilibria is chosen. If several Pareto-optimal equilibria exist, one of them is selected at random. This allocation is denoted as \(\epsilon ^{*}\). In addition to each equilibrium trajectory \({\varvec{\tau }}_{n}^*\), a corresponding control series \({\varvec{\mu }}_{n}^{*}\) exists. For the duration of \(\varDelta t\) seconds, the control inputs of the respective trajectories are transferred to each agent. The agents advance, the environment changes, and the next planning loop can begin. However, the chosen equilibrium trajectories \(\epsilon ^{*}\) are memorized and used as actions in the static game of the following time step (see Fig. 4). They lead to the goal region and are promising because they were already the ‘winning’ combination in the last loop.
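One replanning cycle can then be summarized as in the following sketch. It combines the helpers sketched earlier (`nash_equilibria`, `pareto_optimal`) with two illustrative placeholders that are not defined in the paper: `plan_trajectories(n)` stands for the repeated RRT queries of Sect. 4.3.1, and `stand_still_action(n, dt)` produces the default action with its artificially raised independent cost.

```python
import random

def plan_step(agents, plan_trajectories, stand_still_action, cost, previous, dt_replan):
    """One replanning cycle of Sect. 4.3.3: rebuild the action sets, solve the
    static game, and return the allocation to execute for the next dt_replan
    seconds. `previous` is the memorized winning allocation of the last cycle
    (None in the first cycle)."""
    action_sets = {}
    for n in agents:
        actions = list(plan_trajectories(n))              # fresh RRT trajectories
        actions.append(stand_still_action(n, dt_replan))  # default "stand still" action
        if previous is not None:
            actions.append(previous[n])                   # re-use last winning trajectory
        action_sets[n] = actions
    candidates = pareto_optimal(nash_equilibria(action_sets, cost), agents, cost)
    return random.choice(candidates)                      # random tie-break among Pareto optima
```

The caller executes the corresponding control series \({\varvec{\mu }}_{n}^{*}\) for \(\varDelta t\) seconds and passes the returned allocation back as `previous` in the next cycle.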

An example for the multi-agent motion planning is given in Fig. 5. An environment is occupied by static objects (gray rectangles) and five agents (colored circles). The scene is a reproduction of the environment in the BIWI Walking Pedestrians dataset [62] (Fig. 6). It shows two agents walking side-by-side (dark blue and dark green), an agent passing by on the right-hand side with a higher velocity in front of these two agents (light green), and two agents facing and crossing each other (black and light blue). Figure 5 shows the planned trajectory sets at different time steps, and the chosen Nash equilibrium trajectories are drawn in bold. The example of the dark green agent shows that the trajectory is constantly improved. At \(t=0\,\hbox {s}\), the game only found a relatively long and curvy solution. In general, the more often the game is played, the shorter the path becomes. The final solution is drawn in Fig. 6.

4.3.4 Motion Planning Among Humans

In the case where the set of agents is a mixture of controlled agents and humans, the game theoretic reasoning is adapted. An application for this setup would be a robot that is navigating in a human-populated environment (see Fig. 1). Similar to the procedure before, a static game is repeatedly played, and the set of Nash equilibria \(\mathcal {E}\) is calculated at each time step. The difference lies in how the ‘winning’ allocation \(\epsilon ^{*}\) is chosen from the Nash equilibria. Only at the first planning loop (\(t=0\,\hbox {s}\)) is the Pareto-optimal Nash equilibrium chosen. For the subsequent time steps, the following are considered: the set of Nash equilibria from the current time step \(\mathcal {E}[t]\), the set of Nash equilibria from the previous time step \(\mathcal {E}[t-\varDelta t]\), and the set of observed trajectories that the agents walked in the previous time step, denoted as \(\mathcal {T}^{\mathrm {obs}}\) with the elements \({\varvec{\tau }}^{\mathrm {obs}}_{n}([t-\varDelta t, t])\). The additional components for the reasoning step are marked in Fig. 4 with dashed lines. First, we infer which of the previous equilibrium allocations \(\epsilon ^{k} \in \mathcal {E}[t-\varDelta t]\) is most similar to the observed behavior of all agents, i.e., the observed trajectories \(\mathcal {T}^{\mathrm {obs}}\) from the previous time step. Then, the most similar allocation in \(\mathcal {E}[t-\varDelta t]\) is in turn compared to the allocations in the new set of Nash equilibria \(\mathcal {E}[t]\). The Nash allocation in \(\mathcal {E}[t]\) with the highest resemblance is chosen to be the ‘winning’ allocation \(\epsilon ^{*}\) of time step t. By using this approach, knowledge gained through observation is included in the reasoning step. How to calculate the similarity between two trajectories is discussed in [80]. For our case, it is sufficient to compute the average Euclidean distance because only the length of the trajectory is considered in the cost function. To obtain the similarity between two allocations, we calculate the mean of all trajectory comparisons. Examples of the resulting trajectories of the presented motion planner for navigating among humans are drawn in Fig. 14b. The entire approach is evaluated with the experimental setup described in Sect. 6.
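The observation-based selection can be sketched as follows. The average Euclidean distance serves as the similarity measure, as stated above; the sketch assumes that the compared trajectory segments have been resampled to a common time base (the paper refers to [80] for a broader discussion of trajectory similarity). Names are again illustrative.

```python
import math

def average_distance(traj_a, traj_b):
    """Mean Euclidean distance between two discrete trajectories with a common sampling."""
    pairs = list(zip(traj_a, traj_b))
    return sum(math.hypot(xa - xb, ya - yb)
               for (xa, ya, _), (xb, yb, _) in pairs) / len(pairs)

def allocation_distance(alloc_a, alloc_b, agents):
    """Similarity of two allocations: mean over all per-agent trajectory comparisons."""
    return sum(average_distance(alloc_a[n], alloc_b[n]) for n in agents) / len(agents)

def select_equilibrium(prev_equilibria, curr_equilibria, observed, agents):
    """Reasoning step of Sect. 4.3.4: pick the previous equilibrium closest to
    the observed trajectories, then the current equilibrium closest to that one."""
    best_prev = min(prev_equilibria,
                    key=lambda e: allocation_distance(e, observed, agents))
    return min(curr_equilibria,
               key=lambda e: allocation_distance(e, best_prev, agents))
```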

Fig. 6
figure 6

Environment of the BIWI Walking Pedestrians dataset [62] that served as the basis for the example shown in Fig. 5. Final trajectories of the motion planning in Fig. 5 at \(t=11\,\hbox {s}\) are drawn in the scene

Fig. 7
figure 7

Screenshots of the videos that were shown to the participants. Additionally, the paths of the agents are plotted. They were generated by recording humans (HU) or using different planning methods (GT, RVO, SF). The visualization is a replication of the scene shown in Fig. 6. a Human motions HU. b Game theoretic planner GT. c Reciprocal velocity obstacles RVO. d Social forces SF

5 Evaluation: Multi-agent Motion Planning and Coordination

In this section, we validate the human likeness of our game theoretic motion planner for multiple controlled agents (see Sect. 4.3.3). To this end, a variation of the Turing test was conducted in the form of an online video study. We presumed that agents controlled by our motion planner behave equivalently to humans and state our hypothesis as follows.

Hypothesis 1

While watching a video that shows walking pedestrians, humans cannot distinguish between motions that are based on our game theoretic motion planner and motions that are based on human motions. They are perceived as equally human-like motions.

To allow for a better comparison, we also evaluated the human likeness of two additional motion planners: the reciprocal velocity obstacles [84] and the social forces [27, 31]. These two algorithms were chosen because they are interaction-aware and are often used for comparison. For example, Kretzschmar et al. [38] also compared the performance of their planner with these two algorithms.

The following subsections discuss the setup of the video study, its evaluation and its statistical results. The four compared motion planning methods are abbreviated as human motions HU, game theoretic planner GT, reciprocal velocity obstacles RVO, and social forces SF.

5.1 Experimental Setup: Online Video Study

The term online study refers to a questionnaire that was posted on the Internet. Within our study, participants were asked to watch several videos. The videos showed visualizations of pedestrians walking in an urban environment, as depicted in Fig. 7 (the paths were drawn into the pictures afterward; they were not visible to the participants). The motions in the videos were generated using two methods. One method was to reproduce trajectories from previously video-taped motions such that the simulated trajectories are based on real human behavior (HU). The other method was to generate artificial walking motions with the same start and goal as in the recordings by using one of the three motion planners (GT, RVO, or SF). After watching a video, the participants were asked to decide whether the watched walking motions were based on human recordings or artificially generated. This method is inspired by the Turing test and follows from our definition of human likeness in Sect. 3.

The human trajectories were taken from the hotel sequence of the BIWI Walking Pedestrians dataset [62], which shows walking pedestrians on a sidewalk (Fig. 6). Overall, six sequences were selected (Footnote 4). They were chosen such that at least four pedestrians were moving. Moreover, some pedestrians had to walk in different directions such that interaction occurred. The resulting sequences contained four to seven moving pedestrians and lasted up to seven seconds. From each sequence, the agents’ average speeds \(\hat{v}_{n}\), their initial states \(\mathbf {x}^{\mathrm {init}}_{n}\), and their goal regions \(\mathcal {X}^{\mathrm {goal}}_{n}\) were extracted and given as input to the three motion planners. An exemplary output of the trajectories created by the motion planners is shown in Fig. 7. For the game theoretic motion planner, we used the parameters summarized in Table 2 in the column ‘video’. The implementation details of the reciprocal velocity obstacles and the social forces approach are listed in the “Appendix”. The sequences were visualized with the robot simulator V-REP (Footnote 5). This simulator is compatible with the ROS framework and allows the agents to exactly follow the trajectories by setting the desired poses at a certain time.

5.2 Statistical Data Analysis

Altogether, 227 persons finished the study. In addition to age and gender, the participants were asked for their level of experience with robotics on a scale from 1 (no experience) to 5 (a lot of experience). The majority of the participants were male, in their early thirties, and had minor experience with robots (Table 3 ‘video’).

Table 3 Empirical data of the video study (Sect. 5) and the virtual reality study (Sect. 6)
Fig. 8
figure 8

Human likeness of motion planning methods, result of the video study; HU human recordings, GT game theoretic motion planner, RVO reciprocal velocity obstacles, SF social forces

The study took approximately 10 minutes to complete. Each of the participants watched \(6 \times 4 = 24\) videos in a random order (6 sequences, 4 motion planning methods). After each video, the participants had to decide whether the watched movements were based on human recordings or artificial. Thus, each participant rated the human likeness of the motion planners. For example, if a participant perceived the motions based on the reciprocal velocity obstacles planner as based on human motions in four out of six cases, they rated its human likeness to be \(\frac{4}{6} \approx 67\%\).

The results of the study are illustrated in Fig. 8, where bar graphs depict the average human likeness of the motion planning methods. The bar graphs show that the motions based on human recordings (HU) reach a human likeness of \(71\%\) and are most often perceived as human. This result was expected. More interestingly, with a rating of \(69\%\), the game theoretic planner is perceived as almost as human-like as the human recordings. Clearly lower is the average human likeness of the reciprocal velocity obstacles (\(43\%\)) and the social forces (\(30\%\)).

Fig. 9
figure 9

Box plots of the difference of the online video study; the p values of the post hoc Friedman test are printed within the corresponding box plots, and significant differences are marked in gray and are asterisked \({*}\); HU human recordings, GT game theoretic motion planner, RVO reciprocal velocity obstacles, SF social forces

A nonparametric Friedman test was conducted to check whether any of the motion planners were rated consistently more or less human-like than the others. This test was chosen because our independent variable, the motion planning method, has more than two levels (HU, GT, RVO, and SF) and our dependent variable, the rating of the human likeness, is ordinal. Moreover, the Friedman test takes within-subject data into account. The resulting p value of our test is \(\ll 0.001{*}\) with a 5% significance level. Hence, at least one group differs significantly from another one. To decide which motion planners are rated significantly different, a post hoc analysis was performed by conducting the Wilcoxon-Nemenyi-McDonald-Thompson test [29, p. 295]. Figure 9 shows the p values of the group comparisons and box plots of the differences of the ratings. Significant differences are marked in gray. Notably, there is a significant difference between all group comparisons but one: the comparison between our game theoretic planner and the human motions (GT-HU). This result means that the participants could not distinguish between the two of them. The game theoretic motion planner succeeds in generating human-like motions. Hypothesis 1 is confirmed for a multi-agent motion planning task. In contrast, the participants perceived the reciprocal velocity obstacles and the social forces as being significantly less human-like than the human recordings and the game theoretic planner. The difference is depicted in Fig. 9: the greater the distance of a box plot from zero, the greater the difference in perceived human likeness between the two compared methods; consider, for example, the box plot of the difference between human recordings and social forces (HU-SF). Here, the mean of the difference is the highest, meaning that the human likeness of the human recordings was significantly higher than that of the social forces. In comparison, the difference of perceived human likeness between the reciprocal velocity obstacles and social forces (RVO-SF) is smaller, yet still significant, whereas there is statistically no difference between the game theoretic motion planner and humans (GT-HU).
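For readers who want to reproduce this kind of analysis, the Friedman test itself is available, for example, in SciPy; the post hoc Wilcoxon-Nemenyi-McDonald-Thompson comparisons are not part of SciPy and are omitted from the sketch below. The ratings array is a randomly generated placeholder with the same shape as the study data (227 participants, four methods), not the actual study results.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Placeholder ratings: rows = participants, columns = methods (HU, GT, RVO, SF).
# Each entry is the fraction of that method's six videos judged to be "human".
# These numbers are randomly generated for illustration, NOT the study data.
rng = np.random.default_rng(0)
ratings = rng.integers(0, 7, size=(227, 4)) / 6.0

# Friedman test across the four related samples (within-subject design).
stat, p_value = friedmanchisquare(ratings[:, 0], ratings[:, 1],
                                  ratings[:, 2], ratings[:, 3])
print(f"Friedman chi-square = {stat:.2f}, p = {p_value:.3g}")
```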

6 Evaluation: Motion Planning Among Humans

This section evaluates whether our game theoretic planner generates human-like motions for an artificial agent that is moving in the same environment as a human. To this end, a collision avoidance study within virtual reality was set up. We considered a scenario where a human and an artificial agent had to avoid a collision while passing each other. Then, the human likeness was rated with a Turing-like test based on the following hypothesis.

Hypothesis 2

While walking within virtual reality with another agent, humans cannot distinguish whether the other agent's motions are based on our game theoretic planner or on human motions. They are perceived as equally human-like motions.

In addition to the game theoretic planner, the reciprocal velocity obstacles planner was implemented, and its human likeness was validated accordingly. The social forces planner was omitted since its human likeness was the lowest in the previous evaluation. Apart from that, the procedure from the video study (Sect. 5) was maintained: asking participants whether the observed motions were based on human motions or artificially generated. However, in the second study, the participants could move within the same environment as the agent whose motions they had to judge. The participants walked actively (no remote control was used) and could react to the behavior of the other agent and vice versa. This was realized by using a head-mounted display and transferring the participants into virtual reality.

6.1 Experimental Setup: Walking Within Virtual Reality

To set up the collision avoidance study, a robot simulator, a head-mounted display, and a motion capture system were combined. The virtual reality was created with the simulator V-REP. The environment and its components are displayed in Fig. 10. It is a reproduction of the laboratory shown in Fig. 11a and contains a carpet, colored start and goal markers, and two agents. Its dimensions and the starting positions of the agents are illustrated in Fig. 11b. The participants were asked to wear an Oculus DK2 (Fig. 11c), through which they could see the virtual reality. An example of the participant's view is shown in Fig. 11d. To adapt the participant's view to the respective position and orientation of her/his head, the Oculus was equipped with reflective markers, as shown in Fig. 11c. Their positions were tracked with the vision-based motion capture system QualisysFootnote 6 (update frequency \(250\ \hbox {Hz}\)) and passed on to the simulator. Thus, the participants could not only see the virtual reality but also walk freely within it. With the Oculus and V-REP combined, a frame rate between 17 and \(25\ \hbox {fps}\) was achieved.

Fig. 10
figure 10

Components within the virtual reality. The environment is a reproduction of the laboratory shown in Fig. 11a

Fig. 11
figure 11

Experimental setup to validate the human likeness of different motion planners in comparison to a human. Through a head-mounted display, the participant was transferred into virtual reality (Fig. 10). The participant was asked to walk from a yellow marked starting point to a yellow marked goal point. At the same time, the participant had to pay attention to another agent who was walking within the environment, called Bill. Bill was either controlled by a human, i.e., the experimenter’s position, or by a motion planner. a Testing laboratory. b Dimensions of the setup. c Head-mounted display. d Participant’s point of view. (Color figure online)

For the second study, the participants' task was to repeatedly walk from a fixed start to a fixed goal position. While doing so, the participant should pay attention to the behavior of the other walking agent in the room, called Bill. The start and goal positions were chosen such that the agents would most likely collide if both chose the trajectories leading straight to the goal. Hence, this setup requires the agents to interact to avoid a collision. The participant's start and goal positions were marked as yellow fields on the floor. The goal was equipped with a bordering yellow plane at the wall (compare Fig. 10) such that the participant could avoid looking down. Bill's start and goal positions were similarly marked in blue. Before a round started, the participant was asked to put on the head-mounted display and to position himself on the yellow starting field. Another plane on the wall indicated whether the participant was within the desired region by turning green (Fig. 10, positioning check). Subsequently, a traffic light countdown, together with a sound signal, told the participant when to start (change from red, to red/yellow, to green). When the light turned green, the participant started walking toward the goal while paying attention to Bill, who also started to move toward his goal. Bill's trajectories were generated by different methods. One method was to project the motions of a real person into the virtual reality (HU). To this end, Bill's pose was matched with the pose of the experimenter in the laboratory. The experimenter wore a plate covered with reflective markers that were constantly tracked with the motion capture system (compare Fig. 11a). While the experimenter moved, her pose was transferred to the simulator. The other method was to control Bill with a motion planner, either the game theoretic planner (GT) or the reciprocal velocity obstacles planner (RVO). For this study, the game theoretic planner used the parameters listed in the 'virtual' column of Table 2. The parameters of the reciprocal velocity obstacles planner are summarized in Table 7 in the "Appendix". Additionally, the average velocity of each participant was determined in a test run and monitored during the experiment. The result was forwarded to both motion planners, which used it to model the desired velocity of the human.

After each round, the participant was asked to fill out a questionnaire. The following questions were asked:

  • Question 1: In your opinion, how was Bill controlled in the simulation, through a real person or through a computer program?

  • Question 2: How cooperative did Bill behave on a scale from 1 (very cooperative) to 9 (not cooperative at all)?

  • Question 3: How comfortable did you feel during this round on a scale from 1 (very comfortable) to 9 (not comfortable at all)?

Each participant walked thirty rounds, resulting from ten repetitions of each motion planning method (HU, GT, and RVO). The order of the planning methods was randomized, and the experiment took approximately one hour to complete. Note that irrespective of whether Bill was controlled by the experimenter or by a motion planner, the experimenter always moved from the start to the goal. In the case where Bill was controlled by a motion planner, the experimenter could see Bill's virtual position on a screen and adjusted her position accordingly. Thus, the participants were unable to tell by the presence or absence of an air draft whether Bill was controlled by the experimenter. Moreover, to conceal the sound of footsteps, the participants were asked to wear earplugs, and elevator music was played.
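
As a small illustration of the experimental protocol, the sketch below generates a randomized order of the thirty rounds (ten repetitions per method). How the randomization was actually implemented is not specified above, so this is only an assumed, minimal realization.

# Sketch: randomized schedule of 30 rounds, 10 repetitions per method (assumed realization).
import random

methods = ["HU", "GT", "RVO"]
schedule = [m for m in methods for _ in range(10)]
random.shuffle(schedule)
print(schedule)  # e.g., ['GT', 'HU', 'HU', 'RVO', ...]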

6.2 Statistical Data Analysis

Fig. 12
figure 12

Average human likeness of motion planning methods as rated by participants of the virtual reality study; HU human recordings, GT game theoretic motion planner, RVO reciprocal velocity obstacles

Fig. 13
figure 13

Box plots of the differences; p values of the post hoc tests are printed within the box plots; significant differences are marked in gray and asterisked \({*}\); a the reciprocal velocity obstacles planner is rated significantly less human-like than the human and the game theoretic planner, b the participants felt more comfortable during rounds with the human and the reciprocal velocity obstacles planner than during rounds with the game theoretic planner, and c the human and the reciprocal velocity obstacles planner are perceived as more cooperative than the game theoretic planner

Altogether, 27 volunteers participated in our experiment. They were asked for their age, gender, and level of experience with robotics and computer games. The participants were mostly male, in their late twenties, and had some experience with robots and PC games (Table 3 ‘virtual’).

The human likeness of each motion planning method was rated with Question 1. The results are shown in the bar graph in Fig. 12. Again, the human motions were perceived as human most often, with \(66\%\), closely followed by our game theoretic planner (\(60\%\)). With a human likeness of \(30\%\), the motions of the reciprocal velocity obstacles planner were mostly judged as being artificial. Note that this order is identical to the one in our previous video study. Moreover, the percentages resemble each other, although they are lower here. To test Hypothesis 2, we need to statistically check whether the participants could differentiate between the planning methods. A Friedman test evaluated whether any of the motion planning methods were rated consistently more or less human-like than the others. The resulting p value of the test is \(p = 0.001{*}\) at a 5% significance level; hence, at least one group differs significantly from another. The corresponding output of the Wilcoxon-Nemenyi-McDonald-Thompson post hoc analysis is depicted in Fig. 13a. The results are consistent with those of the video study. There is a significant difference between all group comparisons but one: the comparison between the human motions and our game theoretic planner (GT-HU). Hence, the participants again could not distinguish between our planner and a real human but did detect the difference for the reciprocal velocity obstacles planner. This result further demonstrates that our motion planner generates human-like behavior.
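
For concreteness, the percentages in Fig. 12 presumably correspond to the fraction of rounds in which a method was judged to be controlled by a real person (Question 1). The sketch below computes such percentages from a hypothetical binary answer array; the data layout is an assumption, not the actual study data.

# Sketch: human-likeness percentage per method from binary Question 1 answers.
# answers[i, m, r] = 1 if participant i judged round r of method m as human-controlled.
import numpy as np

methods = ["HU", "GT", "RVO"]
rng = np.random.default_rng(1)
answers = rng.integers(0, 2, size=(27, 3, 10))  # placeholder: 27 participants, 10 rounds each

human_likeness = answers.mean(axis=(0, 2)) * 100.0
for m, pct in zip(methods, human_likeness):
    print(f"{m}: judged human in {pct:.0f}% of rounds")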

Table 4 Means and standard deviations of different variables of the virtual reality study
Fig. 14
figure 14

All walked paths of one participant of the virtual reality study together with Bill’s paths. Bill’s motions were either based on a the motions of the human experimenter, b the game theoretic motion planner, or c the reciprocal velocity obstacles planner

However, the picture is reversed when examining the answers to Question 2 and Question 3 regarding the levels of comfort and cooperation. The mean values and standard deviations of the participants' ratings are listed in Table 4. Although the levels of comfort and cooperation of the game theoretic planner are both comparatively high, they are not as high as the respective levels of the human and the reciprocal velocity obstacles planner. This is confirmed by two Friedman tests, one for comfort and one for cooperation, which were both significant with \(p \ll 0.001{*}\). The post hoc test for comfort (illustrated in Fig. 13b) revealed that the participants felt similarly comfortable while walking with a human or an agent controlled by reciprocal velocity obstacles, whereas they felt significantly less comfortable with the game theoretic planner. This might be explained by the perceived level of cooperation: similar to the level of comfort, the participants rated the human and the reciprocal velocity obstacles planner as significantly more cooperative than the game theoretic planner (see Fig. 13c).

To elucidate a possible reason for these results, we further analyzed the trajectories of the participants. The paths of one participant are plotted as an example in Fig. 14 for all three planning methods. Additionally, the paths of the human experimenter and the paths calculated by the motion planners are drawn. Notably, the human and particularly the reciprocal velocity obstacles planner start the avoidance maneuver earlier than the game theoretic planner. We therefore calculated the average minimum distance between the two agents for each planner and compared it across planners. Additionally, the participants' average velocities and the average absolute curvatures of the paths were computed. Table 4 shows the results. To test whether the differences are statistically significant, the samples of the distance, the velocity, and the curvature were first tested for normality with a Shapiro-Wilk test. The p values of all tests are larger than 0.05; hence, normality cannot be rejected for any sample. Moreover, our dependent variables are continuous in this case. Consequently, we can use a repeated measures ANOVA to further analyze our data. The results are shown in Table 5. Note that for all three variables, Mauchly's test for sphericity was not significant (\(p > 0.05\)), so the sphericity assumption of the repeated measures ANOVA is met. Furthermore, for the velocity and the curvature, the p values of the ANOVA are greater than 0.05, revealing no significant difference. Hence, the participants adapted neither their velocity nor their path curvature to the planning method. This is, however, different for the minimal distance. Here, the p value is \(\ll 0.001{*}\), and pairwise comparisons using paired t tests (Table 6) revealed that the distance during the rounds with the game theoretic planner is significantly smaller than that with the other two planning methods. There is no significant difference between the human and the reciprocal velocity obstacles planner. Consequently, we assume that the human and the reciprocal velocity obstacles planner were perceived as more cooperative and comfortable because both maintained a greater distance.
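
The analysis pipeline just described can be sketched in Python as follows. The long-format table df with columns 'participant', 'method', and 'distance' is a hypothetical placeholder for the measured minimum distances; the repeated measures ANOVA is taken from statsmodels, and Mauchly's sphericity test is omitted from this sketch.

# Sketch of the trajectory-metric analysis (hypothetical data layout).
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import shapiro, ttest_rel
from statsmodels.stats.anova import AnovaRM

methods = ["HU", "GT", "RVO"]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(27), 3),
    "method": np.tile(methods, 27),
    "distance": rng.normal(loc=1.0, scale=0.2, size=27 * 3),  # placeholder values
})

# 1) Normality check per method; p > 0.05 means normality is not rejected.
for m in methods:
    _, p = shapiro(df.loc[df["method"] == m, "distance"])
    print(f"Shapiro-Wilk {m}: p = {p:.3f}")

# 2) Repeated measures ANOVA with the within-subject factor 'method'.
print(AnovaRM(df, depvar="distance", subject="participant", within=["method"]).fit())

# 3) Post hoc pairwise paired t tests with Bonferroni correction.
pairs = list(combinations(methods, 2))
for a, b in pairs:
    da = df.loc[df["method"] == a, "distance"].to_numpy()
    db = df.loc[df["method"] == b, "distance"].to_numpy()
    _, p = ttest_rel(da, db)
    print(f"{a}-{b}: adjusted p = {min(1.0, p * len(pairs)):.3f}")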

Table 5 Output of the repeated measures ANOVA
Table 6 Pairwise comparisons of distances using paired t tests, p value adjustment method: Bonferroni

6.3 Remarks

Few researchers will be surprised by the final result that distance and comfort are related. What is notable, though, is that a human-like motion does not necessarily result in an increased feeling of comfort. This finding is in line with the observations in [61] during a robot avoidance study: human participants preferred a robot keeping larger distances, but at the same time, they judged this behavior to be unnatural on some occasions [39]. Another reason for this result may be the uncanny valley problem [55]. However, none of the participants stated anything pointing in this direction. It is more likely that the participants' level of comfort depends on the level of cooperation and the distance. Additionally, the plotted paths in Fig. 14 show that there is still a difference between the behavior of the human experimenter HU and the game theoretic motion planner GT (compare the black paths on the left and in the middle). However, the differences appear to be too small to be noticeable by humans. The fact that some differences were missed could also be due to imperfections in the virtual reality. An alternative experimental setup within the 'real' world would be to use a robotic platform that is either remote controlled (i.e., human) or controlled by a motion planner (i.e., artificial). However, in this case, we can neither rule out the possibility that the participants behave differently when confronted with a robot (e.g., as with the large robot in Fig. 1) nor the possibility that the experimenter who controls the robot is impaired by the viewpoint. These uncontrolled variables are ruled out by conducting a study within virtual reality. Nevertheless, such a comparative study would still be worthwhile to determine whether our results can be transferred to robots. The psychological studies mentioned in Sect. 2 indicate that the chances are high because motions considerably contribute to the occurrence of anthropomorphism [21, 26, 54].

Another aspect that we want to address is the replanning time \(\varDelta t\) of the game theoretic motion planner. The algorithm runs smoothly at frequencies of up to \(20\,\hbox {Hz}\) for the experimental setup described above, with two agents that can each choose from 31 actions. The bottleneck for this setup was generating the different trajectories with the RRT path planner. We refrained from optimizing our code because the algorithm was sufficiently fast for our purposes; however, the performance can be improved by using more efficient code or a different path planner. In our final experiment, we even reduced the frequency to \(10\,\hbox {Hz}\) (see Table 2). This was necessary to remain comparable to the reciprocal velocity obstacles planner, which showed oscillating behavior if a replanning time of \(\varDelta t< 0.10\,\hbox {s}\) was used.
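
To make the role of the replanning time \(\varDelta t\) concrete, the following is a minimal sketch of a fixed-rate replanning loop; the planner interface (plan, execute, goal_reached) is a hypothetical placeholder rather than our actual implementation.

# Sketch of a fixed-rate replanning loop (hypothetical planner interface).
import time

def run_replanning(planner, delta_t=0.1):
    """Replan every delta_t seconds; delta_t = 0.1 s corresponds to 10 Hz."""
    while not planner.goal_reached():
        t_start = time.perf_counter()
        action = planner.plan()      # e.g., solve the static game once
        planner.execute(action)
        # Sleep for the remainder of the replanning period, if any.
        elapsed = time.perf_counter() - t_start
        time.sleep(max(0.0, delta_t - elapsed))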

With an increasing number of agents, the calculation of the Nash equilibria will become the main bottleneck since the number of possible allocations increases exponentially. Our implementation uses a brute force search. The calculation time decreases significantly if an efficient search method for Nash equilibria is implemented [66, 88]. In the case that one aims to simulate a large population, other techniques such as mean field game theory [18, 42] may be more suitable. Further examples are mentioned in Sect. 2. However, in contrast to most of these approaches, our method models interdependencies between all agents.
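
The brute force search can be illustrated with the sketch below: for two agents with cost tables over joint actions, it enumerates all joint action profiles and keeps those from which no agent can reduce its own cost by deviating unilaterally. The random cost arrays are placeholders and do not correspond to the cost function of our planner; the point is only to show why the enumeration grows exponentially with the number of agents.

# Sketch: brute force search for pure-strategy Nash equilibria of a two-agent game.
# cost[i][a0, a1] is the (placeholder) cost of agent i for the joint action (a0, a1).
import itertools

import numpy as np

n_actions = 31  # actions per agent, as in the experiments
rng = np.random.default_rng(0)
cost = [rng.random((n_actions, n_actions)) for _ in range(2)]

equilibria = []
for a0, a1 in itertools.product(range(n_actions), repeat=2):
    # (a0, a1) is an equilibrium if neither agent can lower its own cost alone.
    best_response_0 = cost[0][a0, a1] <= cost[0][:, a1].min() + 1e-12
    best_response_1 = cost[1][a0, a1] <= cost[1][a0, :].min() + 1e-12
    if best_response_0 and best_response_1:
        equilibria.append((a0, a1))

print(f"{len(equilibria)} pure-strategy Nash equilibria found")
# The number of joint profiles is n_actions ** n_agents, which is why brute force
# quickly becomes the bottleneck as more agents are added.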

7 Potential Bias in the Study Design

It is plausible that our results are biased by our study design in two ways. As the focus of our studies was set on a setup similar to a Turing test, we deliberately decided to ask whether the observed motions are based on human motion. Consequently, we used a human rather than a robot-like avatar to avoid confusion and to make the task as clear as possible for the participants. We further named the controlled human in the second study Bill to arouse the participants' interest and to encourage them to closely observe their counterpart. By choosing a human image, there is a possibility that the participants were biased toward considering the motions as being generated by a human. We believe that this does not affect the performance of the different methods relative to each other. However, it is conceivable that similar studies with a robot-like agent could result in a lower percentage of participants who believe that the agent is human. Subsequent studies should consider this bias. For a virtual reality setup, it is possible to add a group that faces a robot-like avatar. A video-based study could even include videos showing simplistic, moving shapes such as triangles, as previously proposed in [12, 26].

We further note that the majority of the participants were male in both studies. There is a chance of bias because men may perceive motions or social norms differently than women do.

8 Conclusions and Future Work

We succeeded in devising a (multi-agent) motion planner that generates human-like trajectories in the sense that the motions of the artificial agent(s) are indistinguishable from human motions. The motion planner is based on repeatedly playing a non-cooperative, static game and searching for Nash equilibria, which approximates human decision making during navigation in populated environments [81]. Two self-contained studies provided additional and consistent support for the human likeness of our motion planner: participants of our online video study and of our virtual reality study could not distinguish between human motions and motions generated by our game theoretic planner. In contrast, they could tell human motions apart from motions based on reciprocal velocity obstacles or social forces. Our technique shows high potential for robots that navigate in the vicinity of humans and share their workspaces, for example, museum guides or delivery robots. We are confident that in these cases, a human-like motion behavior enhances the acceptance of and collaboration between robots and humans. Further promising applications for our technique are computer animations that rely on a realistic motion behavior of simulated humans, for example, in computer games or virtual reality training.

Regrettably, the results from our second study indicated that humans feel slightly, but noticeably, less comfortable when moving toward an agent controlled by our motion planner than when moving toward a human agent. This motivates us to further investigate the cost function and the solution concept used for the static game. Recently, Kuleshov and Schrijvers [41] presented an exciting method to combine learning and game theory: inverse game theory determines cost functions that are consistent with a given equilibrium allocation. Learning-based approaches in general are very promising, which suggests a comparison of our algorithm with the recent approaches mentioned in part in Sect. 2. Highly interesting results are published in [14, 38, 48, 77]. Apart from this, future work will mainly concentrate on further experimental studies with a real robotic platform, as in Fig. 1. Subsequent experimental investigations are needed to evaluate the efficiency and safety of the presented motion planner. They should further clarify to what extent humans judge motions differently when being in virtual reality compared to facing a robot in the real world.