1 Introduction

Games are intensively used in computer science to model interactions in computerised systems. Two-player antagonistic games have been successfully used for the synthesis of reactive systems. In this context, the opponent acts as a hostile environment, and winning strategies provide controllers that ensure correctness of the system under any scenario. In order to model complex systems in which several rational entities interact, multiplayer concurrent games come into the picture. Correctness of the strategies can be specified with different solution concepts, which describe formally what a “good” strategy is. In game theory, the fundamental solution concept is Nash equilibrium [15], where no player can benefit from changing its own strategy. The notion of robust equilibria refines Nash equilibria in two ways: 1. a robust equilibrium is resilient, i.e. when a “small” coalition of players changes its strategy, it cannot improve the payoff of one of its participants; 2. it is immune, i.e. when a “small” coalition changes its strategy, it does not lower the payoff of the non-deviating players. The size of what is considered a small coalition is determined by a bound k for resilience and another bound t for immunity. When a strategy profile is both k-resilient and t-immune, it is called a (k, t)-robust equilibrium. We also generalise this concept to the notion of (k, t, r)-robust equilibrium, where if at most t players deviate, the others should not have their payoff decreased by more than r.

Example. In the design of network protocols, when many users are interacting, coalitions can easily be formed and resilient strategies are necessary to avoid deviation. It is also likely that some clients are faulty and begin to behave unexpectedly, hence the need for immune strategies.

As an example, consider a program for a server that distributes files, of which a part is represented in Fig. 1. The functions listen and send_files will be run in parallel by the server. Some choices in the design of these functions have not been fixed yet and we wish to analyse the robustness of the different alternatives.

This program uses a table clients to keep track of the clients that are connected. Notice that the table has fixed size 2, which means that if 3 clients try to connect at the same time, one of them may have its socket overwritten in the table and will have to reconnect later to get the file. We want to know what strategy the clients should use and how robust the protocol will be: can clients exploit the protocol to get their files faster than through normal usage, and how will the performance of the other clients be affected?

We consider different strategies to choose between the possible alternatives in the program of Fig. 1. The strategy that chooses alternatives 1 and 3 does not give 1-resilient equilibria even for just two clients: Player 1 can always reconnect just after its socket was closed, so that clients[0] points to player 1 once again. In this way, it can deviate from any profile so as to never have to wait for the file. Since the second player could do the same thing, no profile is 1-resilient (nor 1-immune). For the same reasons, the combination of alternatives 2 and 3 does not give 1-resilient equilibria. The combination of alternatives 1 and 4 does not give 1-resilient equilibria either, since player 1 can launch a new connection after player 2 to overwrite clients[1].

The combination of alternatives 2 and 4 is the one that may give the best solution. We modelled the interaction of this program as a concurrent game for a situation with 2 potential clients in Fig. 2. The labels represent the content of the table clients (0 means no connection, 1 means connected with player 1, and 2 connected with player 2) and the instruction that the function send_files is executing. Because the game is quite big, we represent only the part where player 1 connects before player 2; the rest of the graph can be deduced by symmetry. The actions of the players are either to wait (action w) or to connect (action ch or ct). Symbol \(*\) means any possible action. Note that in the case where both players try to connect at the same time, we simulate a matching pennies game in order to determine which one will be treated first; this is the reason why there are two different possible actions to connect (ch for “head” and ct for “tail”). Clients get a positive reward when they are sent the file they requested; for Player i, this corresponds to states labelled with send(i).

If both clients try to connect in the same slot, we use a matching pennies game to decide which request was the first to arrive. For a general method to transform a game with continuous time into a concurrent game, see [6, Chap. 6].

Fig. 1. Example of a server program.

Fig. 2. Example of a concurrent game generated from the program of Fig. 1.

Related Work and Comparison with Nash Equilibria and Secure Equilibria. Other solution concepts have been proposed for the synthesis of distributed systems, in particular Nash equilibria [4, 18, 19], subgame perfect equilibria [10, 17], and secure equilibria [12]. A subgame perfect equilibrium is a particular kind of Nash equilibrium in which, at any point in the game, even if we forget the history, the players are still playing a Nash equilibrium. In a secure equilibrium, no player can, by changing its own strategy, benefit or keep the same reward while reducing the payoff of other players. However these concepts present two weaknesses: 1. There is no guarantee when two (or more) users deviate together. It can happen on a network that the same person controls several devices (a laptop and a phone for instance) and can then coordinate their behaviour. In that case, the devices would be considered as different agents and Nash equilibria offer no guarantee. 2. When a deviation occurs, the strategies of the equilibrium can punish the deviating user without any regard for the payoffs of the others. This can result in a situation where, because of a faulty device, the protocol is totally blocked. By comparison, finding resilient equilibria with k greater than 1 ensures that clients have no interest in forming coalitions (up to size k), and finding immune equilibria with t greater than 0 ensures that the other clients will not suffer from some agents (up to t) behaving differently from what was expected.

Note that the concept of robust equilibria for games with LTL objectives is expressible in logics such as strategy logic [13] or ATL\(^*\) [2]. However, satisfiability in these logics is hard: it is 2EXPTIME-complete for ATL\(^*\) and undecidable for strategy logic in general (2EXPTIME-complete fragments exist [14]). Moreover, these logics cannot express equilibria in quantitative games such as mean-payoff games.

Contributions. In this paper, we study the problem of finding robust equilibria in multiplayer concurrent games. This problem is undecidable in general (see Sect. 2.3). In Sect. 3, we show that if we look for stationary (but randomised) strategies, then the problem can be decided using the theory of reals. We then turn to the case of pure (but memoryful) strategies. In Sect. 4, we describe a generic transformation from multiplayer games to two-player games. The resulting two-player game is called the deviator game. We show that pure equilibria in the original game correspond to winning strategies in the second one. In Sect. 5, we study quantitative games with mean-payoff objectives. We show that the game obtained by our transformation is equivalent to a multidimensional mean-payoff game, and then that our problem reduces to a value problem with linear constraints in multidimensional mean-payoff games. This can be solved in polynomial space by making use of the structure of the deviator game. In Sect. 7, we prove the matching lower bound, which shows that the robustness problem is PSPACE-complete. Due to space constraints, most proofs have been omitted from this paper; they can be found in the long version [7].

2 Definitions

2.1 Weighted Concurrent Games

We study concurrent games as defined in [2], with the addition of weights on the states.

Concurrent Games. A weighted concurrent game (or simply a game) \({\mathcal {G}} \) is a tuple \(\langle \textsf {Stat}, s_0,\) \(\textsf {Agt}, \textsf {Act}, \textsf {Tab}, (w_A)_{A\in \textsf {Agt}} \rangle \), where: \(\textsf {Stat}\) is a finite set of states and \(s_0 \in \textsf {Stat}\) is the initial state; \(\textsf {Agt}\) is a finite set of players; \(\textsf {Act}\) is a finite set of actions; a tuple \((a_A)_{A\in \textsf {Agt}}\) containing one action \(a_A\) for each player A is called a move; \(\textsf {Tab}: \textsf {Stat}\times \textsf {Act}^\textsf {Agt}\rightarrow \textsf {Stat}\) is the transition function, which associates with a given state and a given move the resulting state; for each player \(A \in \textsf {Agt}\), \(w_A :\textsf {Stat}\mapsto \mathbb {Z}\) is a weight function which assigns an integer weight to each state.

In a game \({\mathcal {G}} \), whenever we arrive at a state \(s\), the players simultaneously select an action. This results in a move \(a_\textsf {Agt}\); the next state of the game is then \(\textsf {Tab}(s,a_\textsf {Agt})\). This process starts from \(s_0\) and is repeated to form an infinite sequence of states.

An example of a game is given in Fig. 2. It models the interaction of two clients \(A_1\) and \(A_2\) with the program presented in the introduction. The weight functions for this game are given by \(w_{A_1} = 1\) in states labelled by send(1) and \(w_{A_1} = 0\) elsewhere; similarly, \(w_{A_2} = 1\) in states labelled by send(2) and \(w_{A_2} = 0\) elsewhere.

History and Plays. A history of the game \({\mathcal {G}} \) is a finite sequence of states and moves ending with a state, i.e. a word in \((\textsf {Stat}\cdot \textsf {Act}^\textsf {Agt})^*\cdot \textsf {Stat}\). We write \(h_i\) for the i-th state of h, starting from 0, and \(\textsf {move}_i(h)\) for its i-th move, thus \(h= h_0 \cdot \textsf {move}_0(h) \cdot h_1 \cdots \textsf {move}_{n-1}(h) \cdot h_n\). The length |h| of such a history is \(n+1\). We write \(\texttt {last}(h)\) for the last state of h, i.e. \(h_{|h|-1}\). A play \(\rho \) is an infinite sequence of states and moves, i.e. an element of \((\textsf {Stat}\cdot \textsf {Act}^\textsf {Agt})^\omega \). We write \(\rho _{\le n}\) for the prefix of \(\rho \) of length \(n+1\), i.e. the history \(\rho _0 \cdot \textsf {move}_0(\rho ) \cdots \textsf {move}_{n-1}(\rho ) \cdot \rho _n\).

The mean-payoff of weight w along a play \(\rho \) is the average of the weights along the play: \({\textsf {MP}}_w(\rho ) = \liminf _{n \rightarrow \infty } \frac{1}{n} \sum _{0 \le k\le n} w(\rho _{k}). \) The payoff for agent \(A\in \textsf {Agt}\) of a play \(\rho \) is the mean-payoff of the corresponding weight: \(\textsf {payoff} _A(\rho ) = {\textsf {MP}}_{w_A}(\rho )\). Note that it only depends on the sequence of states, and not on the sequence of moves. The payoff vector of the run \(\rho \) is the vector \(p\in \mathbb {R}^\textsf {Agt}\) such that for all players \(A \in \textsf {Agt}\), \(p_A = \textsf {payoff} _A(\rho )\); we simply write \(\textsf {payoff} (\rho )\) for this vector.
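For an ultimately periodic play (a lasso), the limit inferior and superior coincide and the mean-payoff is simply the average weight over one repetition of the cycle. The following minimal sketch (in Python, with hypothetical state names; not part of the formal development) illustrates the computation:

    from fractions import Fraction

    def mean_payoff_lasso(prefix, cycle, w):
        # Mean-payoff of the ultimately periodic play prefix . cycle^omega.
        # For such plays liminf and limsup coincide and equal the average
        # weight of the cycle; the finite prefix does not contribute.
        assert cycle, "the repeated part must be non-empty"
        return Fraction(sum(w[s] for s in cycle), len(cycle))

    # Hypothetical states inspired by the game of Fig. 2: Player 1's
    # weight is 1 on the send(1) state and 0 elsewhere.
    w1 = {"idle": 0, "send1": 1, "send2": 0}
    print(mean_payoff_lasso(["idle"], ["send1", "send2"], w1))  # 1/2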

Strategies. Let \({\mathcal {G}} \) be a game, and \(A\in \textsf {Agt}\). A strategy for player A maps histories to probability distributions over actions. Formally, a strategy is a function \(\sigma _A:(\textsf {Stat}\cdot \textsf {Act}^\textsf {Agt})^* \cdot \textsf {Stat}\rightarrow \mathcal {D}(\textsf {Act})\), where \(\mathcal {D}(\textsf {Act})\) is the set of probability distributions over \(\textsf {Act}\). For an action \(a \in \textsf {Act}\), we write \(\sigma _A(a \mid h)\) for the probability assigned to a by the distribution \(\sigma _A(h)\). A coalition \(C\subseteq \textsf {Agt}\) is a set of players; its size is the number of players it contains, written |C|. A strategy \(\sigma _C = (\sigma _A)_{A\in C}\) for a coalition \(C\subseteq \textsf {Agt}\) is a tuple of strategies, one for each player in C. We write \(\sigma _{-C}\) for a strategy of coalition \(\textsf {Agt}\setminus C\). A strategy profile is a strategy for \(\textsf {Agt}\). We write \((\sigma _{- C},\sigma '_C)\) for the strategy profile \(\sigma ''_\textsf {Agt}\) such that \(\sigma ''_A = \sigma '_A\) if \(A \in C\), and \(\sigma ''_A = \sigma _A\) otherwise. We write \(\textsf {Strat}_{\mathcal {G}} (C)\) for the set of strategies of coalition C. A strategy \(\sigma _A\) for player A is deterministic (or pure) if it does not use randomisation: for all histories h there is an action a such that \(\sigma _A(a \mid h) = 1\). A strategy \(\sigma _A\) for player A is stationary if it depends only on the last state of the history: for all histories h, \(\sigma _A(h) = \sigma _A(\texttt {last}(h))\).

Outcomes. Let C be a coalition, and \(\sigma _C\) a strategy for C. A history h is compatible with the strategy \(\sigma _C\) if, for all \(k<|h| - 1\), \((\textsf {move}_k(h))_A = \sigma _A(h_{\le k})\) for all \(A\in C\), and \(\textsf {Tab}(h_{k}, \textsf {move}_k(h)) = h_{k+1}\). A play \(\rho \) is compatible with the strategy \(\sigma _C\) if all its prefixes are. We write \(\textsf {Out}_{{\mathcal {G}}}(s,\sigma _C)\) for the set of plays in \({\mathcal {G}} \) that are compatible with strategy \(\sigma _C\) of C and have initial state \(s\); these plays are called outcomes of \(\sigma _C\) from \(s\). We simply write \(\textsf {Out}_{{\mathcal {G}}}(\sigma _C)\) when \(s= s_0\), and \(\textsf {Out}_{{\mathcal {G}}}\) is the set of plays that are compatible with some strategy. Note that when the coalition C is composed of all the players and the strategies are deterministic, the outcome is unique.

An objective \(\varOmega \) is a set of plays and a strategy \(\sigma _C\) is said winning for objective \(\varOmega \) if all its outcomes belong to \(\varOmega \).

Probability Measure Induced by a Strategy Profile. Given a strategy profile \(\sigma _\textsf {Agt}\), the conditional probability of \(a_\textsf {Agt}\) given history \(h\in (\textsf {Stat}\cdot \textsf {Act}^\textsf {Agt})^* \cdot \textsf {Stat}\) is \(\sigma _\textsf {Agt}(a_\textsf {Agt}\mid h) = \prod _{A \in \textsf {Agt}} \sigma _A(a_A\mid h)\). The probabilities \(\sigma _\textsf {Agt}(a_\textsf {Agt}\mid h)\) induce a probability measure on the Borel \(\sigma \)-algebra over \((\textsf {Stat}\cdot \textsf {Act}^\textsf {Agt})^\omega \) as follows: the probability of a basic open set \(h \cdot (\textsf {Stat}\cdot \textsf {Act}^\textsf {Agt})^\omega \) equals the product \(\prod _{j=0}^{n-1} \sigma _\textsf {Agt}(\textsf {move}_j(h) \mid h_{\le j})\) if \(h_0 = s_0\) and \(\textsf {Tab}(h_j,\textsf {move}_j(h)) = h_{j+1}\) for all \(0 \le j < n\); in all other cases, this probability is 0. By Carathéodory’s extension theorem, this extends to a unique probability measure assigning a probability to every Borel subset of \((\textsf {Stat}\cdot \textsf {Act}^\textsf {Agt})^\omega \), which we denote by \(\Pr ^{\sigma _\textsf {Agt}}\). We denote by \(\text {E}^{\sigma _\textsf {Agt}}\) the expectation operator that corresponds to \(\Pr ^{\sigma _\textsf {Agt}}\), that is \(\text {E}^{\sigma _\textsf {Agt}}(f) = \int f\ \text {d}\Pr ^{\sigma _\textsf {Agt}}\) for all Borel measurable functions \(f :(\textsf {Stat}\cdot \textsf {Act}^\textsf {Agt})^\omega \mapsto \mathbb {R} \cup \{ \pm \infty \}\). The expected payoff for player A of a strategy profile \(\sigma _\textsf {Agt}\) is \(\textsf {payoff} _A(\sigma _\textsf {Agt}) = \text {E}^{\sigma _\textsf {Agt}}({\textsf {MP}}_{w_A})\).
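As an illustration of this definition, the following sketch computes the probability of a basic open set under a stationary profile. The dictionary-based representation (profile[A][s][a] for \(\sigma _A(a \mid s)\), tab as a function) is an assumption made for the example, not notation from the paper:

    def cylinder_probability(history, moves, profile, tab, s0):
        # Probability of the basic open set h . (Stat x Act^Agt)^omega:
        # the product of the probabilities of the moves along h, or 0 if
        # h does not start in s0 or is inconsistent with Tab.
        if history[0] != s0:
            return 0.0
        prob = 1.0
        for j, move in enumerate(moves):
            if tab(history[j], move) != history[j + 1]:
                return 0.0
            for agent, action in move.items():
                prob *= profile[agent][history[j]].get(action, 0.0)
        return prob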

2.2 Equilibria Notions

We now present the different solution concepts we will study. Solution concepts are formal descriptions of “good” strategy profiles. The most famous of them is Nash equilibrium [15], in which no single player can improve the outcome for its own preference relation by changing only its strategy. This notion can be generalised to coalitions of players; the corresponding profiles are called resilient. Nash equilibria correspond to the special case of 1-resilient strategy profiles.

Resilience [3]. Given a coalition \(C \subseteq \textsf {Agt}\), a strategy profile \(\sigma _{\textsf {Agt}}\) is C-resilient if no agent A in C can improve her payoff, even when all agents in C change their strategies; formally, \(\sigma _{\textsf {Agt}}\) is C-resilient when:

$$ \forall \sigma '_C \in \textsf {Strat}_{\mathcal {G}} (C).\ \forall A\in C.\ \textsf {payoff} _{A} (\sigma _{-C}, \sigma '_C) \le \textsf {payoff} _A(\sigma _\textsf {Agt}) $$

Given an integer k, we say that a strategy profile is k-resilient if it is C-resilient for every coalition C of size at most k.

Immunity [1]. Immune strategies ensure that the players who do not deviate are not affected too much by a deviation. Formally, a strategy profile \(\sigma _{\textsf {Agt}}\) is (C, r)-immune if no player outside C is worse off by more than r when the players in C deviate, i.e. when:

$$ \forall \sigma '_C \in \textsf {Strat}_{\mathcal {G}} (C).\ \forall A\in \textsf {Agt}\setminus C.\ \textsf {payoff} _A(\sigma _\textsf {Agt}) - r \le \textsf {payoff} _A(\sigma _{-C}, \sigma '_C) $$

Given an integer t, a strategy profile is (t, r)-immune if it is (C, r)-immune for every coalition C of size at most t. Note that t-immunity as defined in [1] corresponds to (t, 0)-immunity.

Robust Equilibrium [1]. Combining resilience and immunity gives the notion of robust equilibrium: a strategy profile is a (k, t, r)-robust equilibrium if it is both k-resilient and (t, r)-immune.
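To make the three definitions concrete, here is a brute-force check of (k, t, r)-robustness for a pure profile of a one-shot game, where a profile directly yields a payoff vector. It only illustrates the solution concept, not the infinite-play setting of this paper:

    from itertools import combinations, product

    def is_robust(profile, payoff, actions, agents, k, t, r):
        # profile: dict agent -> action; payoff(p): dict agent -> value;
        # actions[A]: list of actions available to agent A.
        base = payoff(profile)
        for size in range(1, max(k, t) + 1):
            for coalition in combinations(agents, size):
                for deviation in product(*(actions[A] for A in coalition)):
                    dev = dict(profile)
                    dev.update(zip(coalition, deviation))
                    new = payoff(dev)
                    # k-resilience: no deviator improves its payoff
                    if size <= k and any(new[A] > base[A] for A in coalition):
                        return False
                    # (t, r)-immunity: no bystander loses more than r
                    if size <= t and any(base[A] - new[A] > r
                                         for A in agents if A not in coalition):
                        return False
        return True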

The aim of this article is to characterise robust equilibria in order to construct the corresponding strategies, and precisely describe the complexity of the following decision problem for mean-payoff games.

Robustness Decision Problem. Given a game \({\mathcal {G}} \), integers k and t, and a rational r, does there exist a strategy profile \(\sigma _\textsf {Agt}\) that is a (k, t, r)-robust equilibrium in \({\mathcal {G}} \)?

2.3 Undecidability

We show that allowing strategies that use both randomisation and memory leads to undecidability. The existence of Nash equilibria was shown in [19] to be undecidable when constraints are put on the payoffs. The proof was later improved to involve only 3 players and no constraint on the payoffs, for games with terminal rewards (the weights are non-zero only in terminal vertices of the game) [5]. This corresponds to the particular case of the robustness decision problem where \(k=1\) and \(t=0\). Therefore, the following is a corollary of [5, Theorem 13].

Theorem 1

For randomised strategies, the robustness problem is undecidable even for 3 players, \(k=1\) and \(t=0\).

To recover decidability, two restrictions are natural. In the next section, we will show that for randomised strategies with no memory, the problem is decidable. The rest of the article is devoted to the second restriction, which concerns pure strategies.

3 Stationary Strategies

We use the existential theory of the reals, i.e. the set of all existential first-order sentences that hold in the ordered field \(\mathfrak {R} := (\mathbb {R},+,\cdot ,0,1,\le )\), to show that the robustness decision problem is decidable for stationary strategies. The associated decision problem is in PSPACE [11]. However, since the system of equations we produce is of exponential size, we only obtain an EXPSPACE upper bound for the robustness problem restricted to (randomised) stationary strategies.

Encoding of Strategies. We encode a stationary strategy \(\sigma _A\) by a tuple of real variables \((\varsigma _{s,a})_{s\in \textsf {Stat},a\in \textsf {Act}}\in \mathbb {R}^{\textsf {Stat}\times \textsf {Act}}\) such that \(\varsigma _{s,a} = \sigma _A(a \mid s)\) for all \(s\in \textsf {Stat}\) and \(a\in \textsf {Act}\). We then write a formula stating that these variables describe a correct stationary strategy. The following lemma is a direct consequence of the definition of a strategy.

Lemma 1

Let \((\varsigma _{s,a})_{s\in S, a \in \textsf {Act}}\) be a tuple of real variables. The mapping \(\sigma :s,a \mapsto \varsigma _{s,a}\) is a stationary strategy if, and only if, \(\varsigma \) is a solution of the following equation:

$$\mu (\varsigma ) := \bigwedge _{s\in S} \left( \sum _{a \in \textsf {Act}} \varsigma _{s,a} = 1 \wedge \bigwedge _{a\in \textsf {Act}} \varsigma _{s,a} \ge 0 \right) $$
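The constraint \(\mu (\varsigma )\) can be generated mechanically. The sketch below builds it with the z3 solver over a toy state and action space (the names are placeholders); the constraints \(\psi \) and \(\phi \) of Theorems 2 and 3 below would be added to the same solver, at the price of nonlinear polynomial constraints, which z3 also handles over the reals:

    from z3 import Real, Solver, Sum, sat

    states = ["s0", "s1"]    # hypothetical state space
    actions = ["w", "ch"]    # hypothetical action set

    # One real variable per (state, action), encoding sigma_A(a | s).
    var = {(s, a): Real(f"sigma_{s}_{a}") for s in states for a in actions}

    solver = Solver()
    for s in states:
        # mu: in each state the distribution sums to 1 ...
        solver.add(Sum([var[(s, a)] for a in actions]) == 1)
        # ... and every probability is non-negative.
        for a in actions:
            solver.add(var[(s, a)] >= 0)

    if solver.check() == sat:
        print(solver.model())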

Payoff of a Profile. We now give an equation which links a stationary strategy profile and its payoff. For this, we notice that a concurrent game in which a stationary strategy profile has been fixed corresponds to a Markov reward process [16]. We recall that a Markov reward process is a tuple \(\langle S, P, r \rangle \) where: 1. S is a set of states; 2. \(P \in \mathbb {R}^{S\times S}\) is a transition matrix; 3. \(r :S \mapsto \mathbb {R}\) is a reward function. The expected value is then the expectation of the average reward of r. This value for each state is described by a system of equations in [16, Theorem 8.2.6]. We reuse these equations to obtain Theorem 2. Details of the proof are in the appendix.

Theorem 2

Let \(\sigma _\textsf {Agt}\) be a stationary strategy profile. The expectation for \({\textsf {MP}}_w\) of \(\sigma _\textsf {Agt}\) from s is the component \(\gamma _s\) of the solution \(\gamma \) of the equation:

\(\psi (\gamma ,\sigma _\textsf {Agt},w):= \exists \beta \in \mathbb {R}^S.\bigwedge _{s \in S} \left( \! \gamma _s = \sum _{a_\textsf {Agt}\in \textsf {Act}^\textsf {Agt}} \gamma _{\textsf {Tab}(s,a_\textsf {Agt})} \cdot \prod _{A \in \textsf {Agt}} \sigma _A(a_A\mid s)\! \right) \!\)

\( \wedge \bigwedge _{s \in S} \left( \gamma _s + \beta _s = w(s) + \sum _{a_\textsf {Agt}\in \textsf {Act}^\textsf {Agt}} \beta _{\textsf {Tab}(s,a_\textsf {Agt})} \cdot \prod _{A \in \textsf {Agt}} \sigma _A(a_A \mid s) \right) \)

Optimal Payoff of a Deviation. We now keep the strategy profile fixed but allow deviations, and investigate the maximum payoff that a deviating coalition can achieve. The system obtained is then a Markov decision process. We recall that a Markov decision process (MDP) is a tuple \(\langle S,A,P, r\rangle \), where: 1. S is a non-empty, countable set of states. 2. A is a set of actions. 3. \(P :S\times A \times S \mapsto [0,1]\) is the transition relation, such that for each \(s \in S\) and \(a \in A\), \(\sum _{s' \in S} P(s,a,s') = 1\). 4. \(r :S \mapsto \mathbb {R}\) is the reward function. The optimal value is then the maximal expectation that can be obtained by a strategy. The optimal values in such systems can be described by a linear program [16, Sect. 9.3]. We reuse this linear program to characterise the optimal values of deviations against a fixed strategy \(\sigma _C\) of coalition C. Details of the proof can be found in the appendix.

Theorem 3

Let \(\sigma _\textsf {Agt}\) be a stationary strategy profile, C a coalition, s a state and w a weight function. The highest expectation that coalition \(\textsf {Agt}\setminus C\) can obtain for w in \({\mathcal {G}} \) from s, namely \(\sup _{\sigma _{-C}} E^{\sigma _C, \sigma _{-C}}({\textsf {MP}}_w,s)\), is the smallest \(\gamma _s\) component of a solution of the following system of inequations:

\(\phi (\gamma ,\sigma _C,w) := \exists \beta \in \mathbb {R}^S.\ \bigwedge _{a_{-C}\in \textsf {Act}^{\textsf {Agt}\setminus C}} \bigwedge _{s\in S} \gamma _s \ge \sum _{a_C \in \textsf {Act}^C} \gamma _{\textsf {Tab}(s,a_C,a_{-C})} \cdot \prod _{A\in C} \sigma _A(a_A \mid s)\)

\( \wedge \bigwedge _{a_{-C} \in \textsf {Act}^{\textsf {Agt}\setminus C}} \bigwedge _{s\in S} \gamma _s\ge w(s) - \beta _{s} + \sum _{a_C \in \textsf {Act}^C} \left( \beta _{\textsf {Tab}(s,a_C,a_{-C})} \cdot \prod _{A\in C} \sigma _A(a_A \mid s)\right) \)

Expressing the Existence of Robust Equilibria. Putting these results together, we obtain the following characterisation. Intuitively, \(\psi (\gamma ,\sigma _\textsf {Agt},w_A)\) ensures that \(\gamma \) corresponds to the payoff of A under \(\sigma _\textsf {Agt}\), and \(\phi (\gamma ',\sigma _{-C},w_A)\) makes \(\gamma '\) correspond to the best payoff A can obtain through a deviation of coalition C.

Theorem 4

Let \(\sigma _\textsf {Agt}\) be a strategy profile.

  • It is C-resilient if, and only if, it satisfies the formula: \( \rho (C,\sigma _\textsf {Agt}) := \bigwedge _{A\in C} \exists \gamma ,\gamma '.\ \left( \psi (\gamma ,\sigma _\textsf {Agt},w_A) \wedge \phi (\gamma ',\sigma _{-C},w_A) \wedge (\gamma '_{s_0} \le \gamma _{s_0})\right) \)

  • It is (C, r)-immune if, and only if, it satisfies the formula: \( \iota (C,\sigma _\textsf {Agt}) := \bigwedge _{A\not \in C} \exists \gamma ,\gamma '.\ \left( \psi (\gamma ,\sigma _\textsf {Agt},w_A) \wedge \phi (\gamma ',\sigma _{-C},-w_A) \wedge \gamma _{s_0} - r \le -\gamma '_{s_0}\right) \)

  • There is a (k, t, r)-robust equilibrium if, and only if, the following formula is satisfiable: \( \exists \varsigma \in \mathbb {R}^{\textsf {Agt}\times \textsf {Stat}\times \textsf {Act}}.\ \mu (\varsigma ) \wedge \bigwedge _{C \subseteq \textsf {Agt}\mid |C| \le k} \rho (C,\varsigma ) \wedge \bigwedge _{C \subseteq \textsf {Agt}\mid |C| \le t} \iota (C,\varsigma ) \)

Theorem 5

The robustness problem is in EXPSPACE for stationary strategies.

Proof

By Theorem 4, the existence of a robust equilibrium is equivalent to the satisfiability of a formula in the existential theory of the reals. This formula can be of exponential size with respect to k and t, since conjunctions over all coalitions of size at most k and t are considered. The best known upper bound for the existential theory of the reals is PSPACE [11], which gives the EXPSPACE upper bound for our problem.

4 Deviator Game

We now turn to the case of non-randomised strategies. In order to obtain simple algorithms for the robustness problem, we use a correspondence with two-player zero-sum games. Winning strategies have been well studied in computer science, and we can make use of existing algorithms. We present the deviator game, a transformation of a multiplayer game into a turn-based zero-sum game, such that there are strong links between robust equilibria in the former and winning strategies in the latter. This is formalised in Theorem 6. Note that the proofs of this section are independent of the type of objectives we consider, and the results could be extended beyond mean-payoff objectives.

Deviator. The basic notion we use to solve the robustness problem is that of deviators. It identifies the players that cause the current deviation from the expected outcome. A deviator from move \(a_\textsf {Agt}\) to \(a'_\textsf {Agt}\) is a player \(D \in \textsf {Agt}\) such that \(a_D \ne a'_D\). We write this set of deviators: \( \textsf {Dev}(a_\textsf {Agt},a'_\textsf {Agt}) = \{ A\in \textsf {Agt}\mid a_A \ne a'_A \}. \) We extend the definition to histories and strategies by taking the union of the deviator sets; formally, \(\textsf {Dev}(h,\sigma _\textsf {Agt}) = \bigcup _{0 \le i < |h|-1} \textsf {Dev}(\textsf {move}_i(h), \sigma _\textsf {Agt}(h_{\le i}))\). It naturally extends to plays: if \(\rho \) is a play, then \(\textsf {Dev}(\rho ,\sigma _\textsf {Agt}) = \bigcup _{i\in \mathbb {N}} \textsf {Dev}(\textsf {move}_i(\rho ), \sigma _\textsf {Agt}(\rho _{\le i}))\).
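Deviator sets are straightforward to compute; a minimal sketch, assuming moves are represented as dictionaries from agents to actions:

    def deviators(move, suggested):
        # Dev(a, a'): players whose action differs from the suggested move.
        return {A for A in move if move[A] != suggested[A]}

    def deviators_of_play(actual_moves, suggested_moves):
        # Dev(rho, sigma_Agt): union of the stepwise deviator sets, where
        # suggested_moves[i] is the move prescribed by the profile on rho_{<=i}.
        result = set()
        for actual, suggested in zip(actual_moves, suggested_moves):
            result |= deviators(actual, suggested)
        return result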

Intuitively, given a play \(\rho \) and a strategy profile \(\sigma _\textsf {Agt}\), the deviators are the agents that need to change their strategies from \(\sigma _\textsf {Agt}\) in order to obtain the play \(\rho \). This intuition is formalised in the following lemma.

Lemma 2

Let \(\rho \) be a play, \(\sigma _\textsf {Agt}\) a strategy profile and \(C \subseteq \textsf {Agt}\) a coalition. Coalition C contains \(\textsf {Dev}(\rho ,\sigma _\textsf {Agt})\) if, and only if, there exists \(\sigma '_C\) such that \(\rho \in \textsf {Out}_{\mathcal {G}} (\rho _0, \sigma '_C, \sigma _{-C})\).

4.1 Deviator Game

We now use the notion of deviators to draw a link between multiplayer games and a two-player game that we will use to solve the robustness problem. Given a concurrent game structure \({\mathcal {G}} \), we define the deviator game \({\mathcal {D}({\mathcal {G}})} \) between two players called Eve and Adam. Intuitively, Eve needs to play according to an equilibrium, while Adam tries to find a deviation of a coalition which will profit one of its players or harm one of the others. The states are in \(\textsf {Stat}' = \textsf {Stat}\times 2^\textsf {Agt}\); the second component records the deviators of the current history. The game starts in \((s_0,\varnothing )\) and then proceeds as follows: from a state (s, D), Eve chooses an action profile \(a_\textsf {Agt}\) and Adam chooses another one \(a'_\textsf {Agt}\); the next state is then \((\textsf {Tab}(s,a'_\textsf {Agt}),D \cup \textsf {Dev}(a_\textsf {Agt},a'_\textsf {Agt}))\). In other words, Adam chooses the move that will apply, but possibly at the price of adding players to the D component when he does not follow the choice of Eve. The weights of a state (s, D) in this game are the same as those of s in \({\mathcal {G}} \). The construction of the deviator game is illustrated in Fig. 3.

Fig. 3. Part of the deviator game construction for the game of Fig. 2. Labels on the edges correspond to the action of Eve and the action of Adam. Labels inside the states are the state of the original game and the deviator component.
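One step of the construction can be sketched as follows (moves as dictionaries, tab for the transition function of \({\mathcal {G}} \); both representation choices are assumptions of the example):

    def deviator_step(state, eve_move, adam_move, tab):
        # One transition of D(G): Adam's move is the one applied, and every
        # player on which the two moves differ joins the deviator set.
        s, devs = state
        new_devs = devs | {A for A in eve_move if eve_move[A] != adam_move[A]}
        return (tab(s, adam_move), frozenset(new_devs))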

We now define some transformations between the objects of the game \({\mathcal {G}} \) and those of \(\mathcal {D}({\mathcal {G}})\). We define projections \(\pi _\textsf {Stat}\), \(\pi _\textsf {Dev}\) and \(\pi _\textsf {Act}\) from \(\textsf {Stat}'\) to \(\textsf {Stat}\), from \(\textsf {Stat}'\) to \(2^\textsf {Agt}\) and from \(\textsf {Act}^\textsf {Agt}\times \textsf {Act}^\textsf {Agt}\) to \(\textsf {Act}^\textsf {Agt}\) respectively. They are given by \(\pi _\textsf {Stat}(s,D) = s\), \(\pi _\textsf {Dev}(s,D)=D\) and \(\pi _\textsf {Act}(a_\textsf {Agt},a'_\textsf {Agt}) = a'_\textsf {Agt}\). We extend these projections to plays in the natural way, letting \(\pi _\textsf {Out}(\rho ) = \pi _\textsf {Stat}(\rho _0) \cdot \pi _\textsf {Act}(\textsf {move}_0(\rho )) \cdot \pi _\textsf {Stat}(\rho _1) \cdot \pi _\textsf {Act}(\textsf {move}_1(\rho )) \cdots \) and \(\pi _\textsf {Dev}(\rho ) = \pi _\textsf {Dev}(\rho _0) \cdot \pi _\textsf {Dev}(\rho _1) \cdots \). Note that for any play \(\rho \) and any index i, \(\pi _\textsf {Dev}(\rho _i) \subseteq \pi _\textsf {Dev}(\rho _{i+1})\); therefore \(\pi _\textsf {Dev}(\rho )\), seen as a sequence of coalitions, is non-decreasing and bounded by \(\textsf {Agt}\), so its limit \(\delta (\rho )=\bigcup _{i\in \mathbb {N}} \pi _\textsf {Dev}(\rho _i)\) is well defined. Moreover, to a strategy profile \(\sigma _\textsf {Agt}\) in \({\mathcal {G}} \), we naturally associate a strategy \(\kappa (\sigma _\textsf {Agt})\) for Eve in \({\mathcal {D}({\mathcal {G}})} \), defined for all histories h by \(\kappa (\sigma _\textsf {Agt})(h) = \sigma _\textsf {Agt}(\pi _\textsf {Out}(h))\).

The following lemma states the correctness of the construction of the deviator game, in the sense that it records the set of deviators in the strategy profile suggested by Adam with respect to the strategy profile suggested by Eve.

Lemma 3

Let \({\mathcal {G}} \) be a game, \(\sigma _\textsf {Agt}\) a strategy profile, and \(\sigma _\exists = \kappa (\sigma _\textsf {Agt})\) the associated strategy of Eve in the deviator game.

  1. If \(\rho \in \textsf {Out}_{{\mathcal {D}({\mathcal {G}})}} (\sigma _\exists )\), then \(\textsf {Dev}(\pi _\textsf {Out}(\rho ),\sigma _\textsf {Agt}) = \delta (\rho )\).

  2. If \(\rho \in \textsf {Out}_{\mathcal {G}} \) and \(\rho '= ((\rho _i, \textsf {Dev}(\rho _{\le i},\sigma _\textsf {Agt})) \cdot (\sigma _\textsf {Agt}(\rho _{\le i}), \textsf {move}_i(\rho )))_{i\in \mathbb {N}}\), then \(\rho ' \in \textsf {Out}_{\mathcal {D}({\mathcal {G}})} (\sigma _\exists )\).

4.2 Objectives of the Deviator Game

We now show how to translate the equilibria notions into objectives of the deviator game, defined so that winning strategies correspond to equilibria of the original game. First, the following lemma defines an objective \(\varOmega (C,A,G)\) such that a profile which ensures some quantitative goal \(G\subseteq \mathbb {R}\) for player A in \({\mathcal {G}} \) against coalition C corresponds to a winning strategy in the deviator game.

Lemma 4

Let \(C\subseteq \textsf {Agt}\) be a coalition, \(\sigma _\textsf {Agt}\) be a strategy profile, \(G \subseteq \mathbb {R}\) and A a player. We have that for all strategies \(\sigma '_C\) for coalition C, \(\textsf {payoff} _A(\sigma _{-C},\sigma '_C) \in G\) if, and only if, \(\kappa (\sigma _\textsf {Agt})\) is winning in \({\mathcal {D}({\mathcal {G}})} \) for objective \(\varOmega (C,A,G) = \{ \rho \mid \delta (\rho ) \subseteq C \Rightarrow \textsf {payoff} _A(\pi _\textsf {Out}(\rho )) \in G \}\).

This lemma makes it easy to characterise the different kinds of equilibria using objectives in \({\mathcal {D}({\mathcal {G}})} \). For instance, we define a resilience objective as follows: if there are more than k deviators, then Eve has nothing to do; if there are exactly k deviators, then she has to show that none of them gains anything; and if there are fewer than k deviators, then no player at all should gain anything, because a new player joining the coalition would keep its size smaller than or equal to k. Similar characterisations for immune and robust equilibria lead to the following theorem.

Theorem 6

Let \({\mathcal {G}} \) be a concurrent game, \(\sigma _\textsf {Agt}\) a strategy profile in \({\mathcal {G}} \), \(p = \textsf {payoff} (\textsf {Out}(\sigma _\textsf {Agt}))\) the payoff profile of \(\sigma _\textsf {Agt}\), k and t integers, and r a rational.

  • The strategy profile \(\sigma _\textsf {Agt}\) is k-resilient if, and only if, strategy \(\kappa (\sigma _\textsf {Agt})\) is winning in \(\mathcal {D}({\mathcal {G}})\) for the resilience objective \(\mathcal {R}e(k,p)\), defined by: \(\mathcal {R}e(k,p) = \{ \rho \mid ~ |\delta (\rho )| > k \} \cup \{ \rho \mid ~ |\delta (\rho )| = k \wedge \forall A \in \delta (\rho ).\ \textsf {payoff} _{A}(\pi _\textsf {Out}(\rho )) \le p(A)\}\cup \{ \rho \mid ~ |\delta (\rho )| < k \wedge \forall A \in \textsf {Agt}.\ \textsf {payoff} _{A}(\pi _\textsf {Out}(\rho )) \le p(A) \} \)

  • The strategy profile \(\sigma _\textsf {Agt}\) is (t, r)-immune if, and only if, strategy \(\kappa (\sigma _\textsf {Agt})\) is winning for the immunity objective \(\mathcal {I}(t,r,p)\), defined by: \(\mathcal {I}(t,r,p) = \{ \rho \mid |\delta (\rho )| > t \} \cup \{ \rho \mid ~ \forall A \in \textsf {Agt}\setminus \delta (\rho ).\ p(A) - r \le \textsf {payoff} _{A}(\pi _\textsf {Out}(\rho )) \} \)

  • The strategy profile \(\sigma _\textsf {Agt}\) is a (k, t, r)-robust equilibrium in \({\mathcal {G}} \) if, and only if, \(\kappa (\sigma _\textsf {Agt})\) is winning for the robustness objective \(\mathcal {R}(k,t,r,p)= \mathcal {R}e(k,p) \cap \mathcal {I}(t,r,p)\).

5 Reduction to Multidimensional Mean-Payoff Objectives

We first show that the deviator game reduces the robustness problem to the problem of finding winning strategies in multidimensional mean-payoff games. We then solve the latter through queries to the polyhedron value problem of [9].

5.1 Multidimensional Objectives

Multidimensional Mean-Payoff Objective. Let \(\mathcal {G}\) be a two-player game, \(v :\textsf {Stat}\mapsto \mathbb {Z}^{d}\) a multidimensional weight function and \(I,J \subseteq [\![1, d ]\!]\) a partition of \([\![1, d]\!]\) (i.e. \(I \uplus J = [\![1, d ]\!]\)). We say that Eve ensures threshold \(u\in \mathbb {R}^{d}\) if she has a strategy \(\sigma _\exists \) such that all outcomes \(\rho \) of \(\sigma _\exists \) satisfy \({\textsf {MP}}_{v_i}(\rho ) \ge u_i\) for all \(i \in I\) and \(\overline{{\textsf {MP}}}_{v_{j}}(\rho ) \ge u_{j}\) for all \(j \in J\), where \({\overline{\textsf {MP}}}_{v_j}(\rho ) = \limsup _{n \rightarrow \infty } \frac{1}{n} \sum _{0 \le k\le n} v_j(\rho _{k}). \) That is, for all dimensions \(i\in I\), the limit inferior of the average of \(v_i\) is at least \(u_i\), and for all dimensions \(j\in J\), the limit superior of the average of \(v_j\) is at least \(u_j\).

We consider two decision problems on these games: 1. The value problem asks, given a game \(\langle \mathcal {G},v,I,J\rangle \) with multidimensional mean-payoff objectives and \(u\in \mathbb {R}^d\), whether Eve can ensure u. 2. The polyhedron value problem asks, given a game \(\langle \mathcal {G},v,I,J\rangle \) with multidimensional mean-payoff objectives and a tuple \((\lambda _1,\dots ,\lambda _n)\) of linear inequations, whether there exists a threshold u which Eve can ensure and which satisfies the inequation \(\lambda _i\) for all i in \([\![1 ,n ]\!]\). We assume that each linear inequation is given by a tuple \((a_1,\dots ,a_d,b)\in \mathbb {Q}^{d+1}\) and that a point \(u\in \mathbb {R}^d\) satisfies it when \(\sum _{i\in [\![1 , d ]\!]} a_i \cdot u_i \ge b\). The value problem was shown to be coNP-complete [20], while the polyhedron value problem is \(\Sigma _2^P\)-complete [9]. Our goal is now to reduce the robustness problem to a polyhedron value problem for some well chosen weights.

In our case, the number d of dimensions will be \(4 \cdot |\textsf {Agt}|\). We number the players so that \(\textsf {Agt}= \{ A_1, \dots , A_{|\textsf {Agt}|}\}\). Let \(W = \max \{ |w_{A_i}(s)| \mid A_i \in \textsf {Agt}, s \in \textsf {Stat}\}\) be the maximum constant occurring in the weights of the game; notice that for all players \(A_i\) and plays \(\rho \), \( - W - 1 < {\textsf {MP}}_{w_{A_i}}(\rho ) \le W\). We fix the parameters k and t and define our weight function \(v:\textsf {Stat}\times 2^\textsf {Agt}\mapsto \mathbb {Z}^{d}\). Let \(i\in [\![1 ,|\textsf {Agt}| ]\!]\); the weights are given for \((s,D) \in \textsf {Stat}\times 2^\textsf {Agt}\) by:

  1. if \(|D|\le t\) and \(A_i\not \in D\), then \(v_i(s,D) = w_{A_i}(s)\);

  2. if \(|D|> t\) or \(A_i\in D\), then \(v_i(s,D) = W\);

  3. if \(|D|<k\), then for all \(A_i \in \textsf {Agt}\), \(v_{|\textsf {Agt}|+i}(s,D) = - w_{A_i}(s)\);

  4. if \(|D|=k\) and \(A_i\in D\), then \(v_{|\textsf {Agt}|+i}(s,D) = - w_{A_i}(s)\);

  5. if \(|D| > k\), or \(A_i \not \in D\) and \(|D| = k\), then \(v_{|\textsf {Agt}|+i}(s,D) = W\);

  6. if \(D = \varnothing \), then \(v_{2\cdot |\textsf {Agt}|+i}(s,D) = w_{A_i}(s)= -v_{3\cdot |\textsf {Agt}|+i}(s,D)\);

  7. if \(D \ne \varnothing \), then \(v_{2\cdot |\textsf {Agt}|+i}(s,D) = W = v_{3\cdot |\textsf {Agt}|+i}(s,D)\).
We take \(I = [\![1 , |\textsf {Agt}|]\!]\cup [\![2\cdot |\textsf {Agt}|+1, 3 \cdot |\textsf {Agt}| ]\!]\) and \(J = [\![|\textsf {Agt}|+1, 2\cdot |\textsf {Agt}|]\!]\cup [\![3 \cdot |\textsf {Agt}|+1, 4\cdot |\textsf {Agt}|]\!]\). Intuitively, the components \([\![1,|\textsf {Agt}|]\!]\) are used for immunity, the components \([\![|\textsf {Agt}|+1, 2\cdot |\textsf {Agt}|]\!]\) are used for resilience and components \([\![2\cdot |\textsf {Agt}|+1, 4\cdot |\textsf {Agt}|]\!]\) are used to constrain the payoff in case of no deviation.
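The case analysis above translates directly into code. The following sketch computes the four dimensions attached to player \(A_i\) for a state (s, D) of the deviator game; the representation of the weights as nested dictionaries is an assumption of the example:

    def weights(i, s, D, w, agents, k, t, W):
        # Dimensions i, n+i, 2n+i and 3n+i for A_i = agents[i-1], following
        # cases 1-7; w[A][s] stands for w_A(s) and W bounds all |w_A(s)|.
        A = agents[i - 1]
        # dimensions 1..n: immunity
        v_i = w[A][s] if len(D) <= t and A not in D else W
        # dimensions n+1..2n: resilience
        if len(D) < k or (len(D) == k and A in D):
            v_ni = -w[A][s]
        else:
            v_ni = W
        # dimensions 2n+1..4n: payoff in the absence of deviation
        if not D:
            v_2ni, v_3ni = w[A][s], -w[A][s]
        else:
            v_2ni, v_3ni = W, W
        return v_i, v_ni, v_2ni, v_3ni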

5.2 Correctness of the Objectives for Robustness

Let \({\mathcal {G}} \) be a concurrent game. The following lemma links the weights we have chosen with our solution concepts.

Lemma 5

Let \(\rho \) be a play, \(\sigma _\textsf {Agt}\) a strategy profile and \( p =\textsf {payoff} (\sigma _\textsf {Agt})\).

  • \(\rho \) satisfies the objective \(\delta (\rho ) = \varnothing \Rightarrow {\textsf {MP}}_{w_{A_i}}(\rho ) = p(A_i)\) if, and only if, \({\textsf {MP}}_{v_{2\cdot |\textsf {Agt}|+i}}(\rho ) \ge p(A_i)\) and \({\overline{\textsf {MP}}}_{v_{3\cdot |\textsf {Agt}|+i}}(\rho ) \ge -p(A_i)\).

  • If \(\rho \) is an outcome of \(\kappa (\sigma _\textsf {Agt})\) then \(\rho \) satisfies objective \(\mathcal {R}e(k,p)\) if, and only if, for all agents \(A_i\), \({\overline{\textsf {MP}}}_{v_{|\textsf {Agt}|+i}}(\rho ) \ge -p(A_i)\).

  • If \(\rho \) is an outcome of \(\kappa (\sigma _\textsf {Agt})\), then \(\rho \) satisfies objective \(\mathcal {I}(t,r,p)\) if, and only if, for all agents \(A_i\), \({\textsf {MP}}_{v_i}(\rho ) \ge p(A_i)-r\).

  • If \(\rho \) is an outcome of \(\kappa (\sigma _\textsf {Agt})\) with \(\textsf {payoff} (\sigma _\textsf {Agt}) = p\), then play \(\rho \) satisfies objective \(\mathcal {R}(k,t,r,p)\) if, and only if, for all agents \(A_i\), \({\textsf {MP}}_{v_i}(\rho ) \ge p(A_i)-r\) and \({\overline{\textsf {MP}}}_{v_{|\textsf {Agt}| + i}}(\rho ) \ge -p(A_i)\).

Putting together this lemma and the correspondence between the deviator game and robust equilibria of Theorem 6, we obtain the following lemma.

Lemma 6

Let \({\mathcal {G}} \) be a concurrent game with mean-payoff objectives. There is a (ktr)-robust equilibrium in \({\mathcal {G}} \) if, and only if, for the multidimensional mean-payoff objective given by v, \(I = [\![1 , |\textsf {Agt}|]\!]\cup [\![2\cdot |\textsf {Agt}|+1, 3 \cdot |\textsf {Agt}| ]\!]\) and \(J = [\![|\textsf {Agt}|+1, 2\cdot |\textsf {Agt}|]\!]\cup [\![3 \cdot |\textsf {Agt}|+1, 4\cdot |\textsf {Agt}|]\!]\), there is a payoff vector p such that Eve can ensure threshold u in \({\mathcal {D}({\mathcal {G}})} \), where for all \(i \in [\![1,|\textsf {Agt}|]\!]\), \(u_{i} = p(A_i) -r\), \(u_{|\textsf {Agt}|+i} = -p(A_i)\), \(u_{2\cdot |\textsf {Agt}|+i} = p(A_i)\), and \(u_{3\cdot |\textsf {Agt}|+i} = -p(A_i)\).

5.3 Formulation of the Robustness Problem as a Polyhedron Value Problem

From the previous lemma, we deduce an algorithm which works by querying the polyhedron value problem. Given a game \({\mathcal {G}} \) and parameters k, t, r, we ask whether there exists a threshold u that Eve can ensure in the game \({\mathcal {D}({\mathcal {G}})} \) with the multidimensional mean-payoff objective given by v, I, J, and such that for all \(i \in [\![1 , |\textsf {Agt}|]\!]\), \(u_i + r = - u_{|\textsf {Agt}|+i} = u_{2 \cdot |\textsf {Agt}| + i} = - u_{3\cdot |\textsf {Agt}|+i}\). As we will show in Theorem 7 thanks to Lemma 6, the answer to this question is yes if, and only if, there is a (k, t, r)-robust equilibrium. From the point of view of complexity, however, the deviator game on which we perform the query can be of exponential size compared to the original game. To describe the complexity of the problem more precisely, we remark, by applying the bound of [8, Theorem 22], that given a query we can restrict the search to solutions with a small representation.
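In the inequation format of Sect. 5.1, each of these equations becomes two inequations \((a_1,\dots ,a_d,b)\). A sketch of the generation, using 0-based dimension indices:

    def robustness_constraints(n, r):
        # Inequations (a_1, ..., a_d, b) with d = 4n expressing, for every i,
        # u_{2n+i} = u_i + r, u_{2n+i} = -u_{n+i} and u_{2n+i} = -u_{3n+i}.
        d = 4 * n
        constraints = []
        def equation(coeffs, b):
            # an equation is kept as the pair of inequations ">= b" and "<= b"
            row = [0] * d
            for j, a in coeffs:
                row[j] = a
            constraints.append(tuple(row) + (b,))
            constraints.append(tuple(-a for a in row) + (-b,))
        for i in range(n):
            equation([(2 * n + i, 1), (i, -1)], r)         # u_{2n+i} - u_i = r
            equation([(2 * n + i, 1), (n + i, 1)], 0)      # u_{2n+i} + u_{n+i} = 0
            equation([(2 * n + i, 1), (3 * n + i, 1)], 0)  # u_{2n+i} + u_{3n+i} = 0
        return constraints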

Lemma 7

If there is a solution to the polyhedron value problem in \({\mathcal {D}({\mathcal {G}})} \) then there is one whose encoding is of polynomial size with respect to \({\mathcal {G}} \) and the polyhedron given as input.

We can therefore enumerate all possible solutions in polynomial space. To obtain a polynomial space algorithm, we must be able to check one solution in polynomial space as well. This amounts to showing that queries for the value problem in the deviator game can be answered in space polynomial with respect to the original game. This is the goal of the next section, and it is done by considering small parts of the deviator game called fixed coalition games.

6 Fixed Coalition Game

Although the deviator game may be of exponential size, it presents a particular structure. As the set of deviators only increases during any run, the game can be seen as the product of the original game with a directed acyclic graph (DAG). The nodes of this DAG correspond to the possible sets of deviators; the DAG is of exponential size but has polynomial degree and depth. We exploit this structure to obtain a polynomial space algorithm for the value problem, and thus also for the polyhedron value problem and the robustness problem. The idea is to compute the winning states of one component at a time, and to recursively call the procedure for the states that are successors of the current component. We therefore consider one game for each component.

We now present the details of the procedure. For a fixed set of deviators D, the possible successors of states of the component \(\textsf {Stat}\times \{D\}\) are the states in \(\textsf {Succ}(D) = \{ \textsf {Tab}_\mathcal {D}((s,D), (m_\textsf {Agt},m'_\textsf {Agt})) \mid s \in \textsf {Stat},\ m_\textsf {Agt},m'_\textsf {Agt}\in \textsf {Act}^\textsf {Agt} \} \setminus (\textsf {Stat}\times \{D\})\), where \(\textsf {Tab}_\mathcal {D}\) is the transition function of \({\mathcal {D}({\mathcal {G}})} \). Note that the size of \(\textsf {Succ}(D)\) is bounded by \(|\textsf {Stat}| \times |\textsf {Tab}|\), hence it is polynomial. Let u be a payoff threshold; we want to know whether Eve can ensure u in \({\mathcal {D}({\mathcal {G}})} \) for the multidimensional objective defined in Sect. 5.1. A winning play \(\rho \) from a state in \(\textsf {Stat}\times \{D\}\) either (1) satisfies \(\delta (\rho )= D\), or (2) reaches a state in \(\textsf {Succ}(D)\) and follows a winning play from there. Assume we have computed all the states in \(\textsf {Succ}(D)\) that are winning. We can stop the game as soon as \(\textsf {Succ}(D)\) is reached, and declare Eve the winner if the state that is reached is a winning state of \({\mathcal {D}({\mathcal {G}})} \). This process can be seen as a game \(\mathcal {F}(D,u)\), called the fixed coalition game.
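The successor component can be computed by enumerating pairs of moves, which stays polynomial in the size of the transition table \(\textsf {Tab}\) for each fixed D; a sketch, under the same dictionary-based representation as before:

    from itertools import product

    def successors(D, states, agents, actions, tab):
        # Succ(D): states of D(G) reachable in one step from Stat x {D}
        # whose deviator set strictly contains D.
        succ = set()
        all_moves = [dict(zip(agents, combo))
                     for combo in product(actions, repeat=len(agents))]
        for s in states:
            for eve in all_moves:
                for adam in all_moves:
                    new_D = D | {A for A in agents if eve[A] != adam[A]}
                    if new_D != D:
                        succ.add((tab(s, adam), frozenset(new_D)))
        return succ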

In this game the states are those of \((\textsf {Stat}\times \{D\}) \cup \textsf {Succ}(D)\); the transitions are the same as in \({\mathcal {D}({\mathcal {G}})} \) on the states of \(\textsf {Stat}\times \{D\}\), and the states of \(\textsf {Succ}(D)\) have only self loops. The winning condition is identical to \(\mathcal {R}(k,t,r,p)\) for the plays that never leave \(\textsf {Stat}\times \{D\}\); a play that reaches some \((s',D') \in \textsf {Succ}(D)\) is considered winning if Eve has a winning strategy from \((s',D')\) in \({\mathcal {D}({\mathcal {G}})} \), and losing otherwise.

In the fixed coalition game, we keep the weights previously defined for the states of \(\textsf {Stat}\times \{D\}\), and fix them for the states outside this component by giving a high payoff to the winning states and a low one to the losing ones. Formally, we define a multidimensional weight function \(v^f\) on \(\mathcal {F}(D,u)\) by: 1. for all \(s \in \textsf {Stat}\) and all \(i \in [\![1, 4\cdot |\textsf {Agt}|]\!]\), \(v^f_i(s,D) = v_i(s,D)\); 2. if \((s,D') \in \textsf {Succ}(D)\) and Eve can ensure u from \((s,D')\), then for all \(i \in [\![1, 4 \cdot |\textsf {Agt}|]\!]\), \(v^f_i(s,D') = W\); 3. if \((s,D') \in \textsf {Succ}(D)\) and Eve cannot ensure u from \((s,D')\), then for all \(i \in [\![1, 4 \cdot |\textsf {Agt}|]\!]\), \(v^f_i(s,D') = -W-1\).

Lemma 8

Eve can ensure payoff \(u \in [\![-W, W]\!]^d\) in \({\mathcal {D}({\mathcal {G}})} \) from (s, D) if, and only if, she can ensure u in the fixed coalition game \(\mathcal {F}(D,u)\) from (s, D).

Using this correspondence, we deduce a polynomial space algorithm to check that Eve can ensure a given value in the deviator game and thus a polynomial space algorithm for the robustness problem.

Theorem 7

There is a polynomial space algorithm that, given a concurrent game \({\mathcal {G}} \) and parameters k, t, r, tells whether there is a (k, t, r)-robust equilibrium.

Proof

We first show that there is a polynomial space algorithm to solve the value problem in \({\mathcal {D}({\mathcal {G}})} \). We consider a threshold u and a state (s, D). In the fixed coalition game \(\mathcal {F}(D,u)\), for each \((s',D') \in \textsf {Succ}(D)\), we can compute whether \((s',D')\) is winning by a recursive call. Once the weights of all \((s',D') \in \textsf {Succ}(D)\) have been computed, we can solve the value problem in \(\mathcal {F}(D,u)\). Thanks to Lemma 8, the answer to the value problem in this game is yes exactly when Eve can ensure u from (s, D) in \({\mathcal {D}({\mathcal {G}})} \). There is a coNP algorithm [20] to check the value problem in a given game, and therefore also an algorithm which uses polynomial space. The size of the stack of recursive calls is bounded by \(|\textsf {Agt}|\), so the global algorithm uses polynomial space.

We now use this to show that there is a polynomial space algorithm for the polyhedron value problem in \({\mathcal {D}({\mathcal {G}})} \). We showed in Lemma 7 that if the polyhedron value problem has a solution, then there is a threshold u of polynomial size witnessing this. We can enumerate all the thresholds that satisfy the size bound in polynomial space. We can then test, in polynomial space thanks to the previous algorithm, that such a threshold satisfies the given linear inequations and that the algorithm for the value problem answers yes on this input. If this is the case for one of the thresholds, then we answer yes for the polyhedron value problem. The correctness of this procedure holds thanks to Lemma 7.

We finally use this to show that there is a polynomial space algorithm for the robustness problem. Given a game \({\mathcal {G}} \) and parameters (k, t, r), we define a tuple of linear equations: for all \(i \in [\![1 , |\textsf {Agt}| ]\!]\), \(x_{2\cdot |\textsf {Agt}|+i} = x_i + r \wedge x_{2\cdot |\textsf {Agt}|+i} = - x_{|\textsf {Agt}|+i} \wedge x_{2\cdot |\textsf {Agt}|+i} = - x_{3 \cdot |\textsf {Agt}|+i} \) (each equation can be expressed by two inequations). Thanks to Lemma 6, there is a payoff which satisfies these constraints and which Eve can ensure in \({\mathcal {D}({\mathcal {G}})} \) if, and only if, there is a (k, t, r)-robust equilibrium. Then, querying the algorithm we described for the polyhedron value problem in \({\mathcal {D}({\mathcal {G}})} \) with this system of inequations answers the robustness problem.
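The overall recursion can be summarised by the following skeleton. It assumes a solver value_problem for the one-component multidimensional mean-payoff value problem (e.g. the coNP procedure of [20] run in polynomial space); everything outside that call is the bookkeeping described above:

    def can_ensure(s, D, u, states, succ, value_problem):
        # Decide whether Eve can ensure threshold u from (s, D) in D(G), by
        # solving the fixed coalition game F(D, u) whose boundary states are
        # resolved by recursive calls.  The deviator set only grows, so the
        # recursion depth is at most |Agt|; there is no memoisation, so time
        # may be exponential but the call stack stays polynomial.
        boundary = {
            (s2, D2): can_ensure(s2, D2, u, states, succ, value_problem)
            for (s2, D2) in succ(D)
        }
        component = [(s2, D) for s2 in states]
        return value_problem(component, boundary, u)[(s, D)]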

7 Hardness

In this section, we show the matching lower bound for the robustness problem; it holds already for resilience alone. The lower bound holds for weights that are 0 in every state except some terminal states, where they can be 1. This corresponds to simple reachability objectives.

Theorem 8

The robustness problem is PSPACE-complete.

Note that we already proved PSPACE-membership in Theorem 7. We give the construction and the intuition of the reduction showing hardness, and leave the proof of correctness to the appendix. We encode QSAT formulas with n variables into games with \(2\cdot n+2\) players, such that the formula is valid if, and only if, there is a \((n+1)\)-resilient equilibrium. We assume that we are given a formula of the form \(\phi = \forall x_1.\ \exists x_2.\ \forall x_3 \cdots \exists x_n.\ C_1 \wedge \cdots \wedge C_p\), where each \(C_j\) is of the form \(\ell _{1,j} \vee \ell _{2,j} \vee \ell _{3,j}\) and each \(\ell _{i,j}\) is a literal (i.e. \(x_m\) or \(\lnot x_m\) for some m). We define the game \({\mathcal {G}} _\phi \) as illustrated by an example in Fig. 4. It has a player \(A_m\) for each positive literal \(x_m\), and a player \(B_m\) for each negative literal \(\lnot x_m\). We add two extra players, Eve and Adam. Eve makes the choices for the existential quantifications and Adam for the universal ones. When one of them chooses a literal, the corresponding player can either go to a sink state \(\bot \) or continue the game to the next quantification. Once a literal has been chosen for each variable, Eve needs to choose a literal for each clause. The objective for Eve and the literal players is to reach \(\bot \). The objective for Adam is to reach \(\top \). We ask whether there is a \((n+1)\)-resilient equilibrium.

To a history \(h=\texttt {Adam} _1 \cdot X_1\cdot \texttt {Eve} _2 \cdot X_2 \cdot \texttt {Adam} _3 \cdots \texttt {Eve} _m \cdot X_m\) with \(X_i \in \{ A_i,B_i\}\), we associate the valuation \(v_h\) such that \(v_h(x_i) = \textsf {true}\) if \(X_i = B_i\) and \(v_h(x_i) = \textsf {false}\) if \(X_i = A_i\). Intuitively, Eve has to find a valuation that makes the formula hold, while Adam tries to falsify it.

Fig. 4. Encoding of a formula \(\phi = \forall x_1.\ \exists x_2.\ \forall x_3.\ \exists x_4.\ (x_1 \vee x_2 \vee \lnot x_3) \wedge (\lnot x_2 \vee x_3 \vee x_4)\). The dashed edges represent the strategies in the equilibrium of the players other than Eve.