Approximation Schemes for Stochastic Mean Payoff Games with Perfect Information and Few Random Positions
Abstract
We consider two-player zero-sum stochastic mean payoff games with perfect information. We show that any such game with a constant number of random positions and polynomially bounded positive transition probabilities admits a polynomial-time approximation scheme, both in the relative and in the absolute sense.
Keywords
Stochastic mean payoff games · Approximation schemes · Approximation algorithms · Nash equilibrium

1 Introduction
The rise of the Internet has led to an explosion in research in game theory, the mathematical modeling of competing agents in strategic situations. The central concept in such models is that of a Nash equilibrium, which defines a state where no agent gains an advantage by changing to another strategy. Nash equilibria serve as predictions for the outcome of strategic situations in which selfish agents compete.
A fundamental result in game theory states that if the agents can choose mixed strategies (i.e., probability distributions over deterministic strategies), a Nash equilibrium is guaranteed to exist in finite games [24, 25]. Often, however, already pure (i.e., deterministic) strategies lead to a Nash equilibrium. Still, the existence of Nash equilibria might be irrelevant in practice if their computation takes too long (finding mixed Nash equilibria in two-player games is PPAD-complete in general [11]). Thus, algorithmic aspects of game theory have gained a lot of interest. Following the dogma that only polynomial-time algorithms are feasible, it is desirable to show polynomial-time complexity for the computation of Nash equilibria.
We consider two-player zero-sum stochastic mean payoff games with perfect information. In this case, the concept of Nash equilibria coincides with that of saddle points, i.e., mini-max/maxi-min strategies. The decision problem associated with computing such strategies and the values of these games lies in the intersection of NP and coNP, but it is unknown whether it can be solved in polynomial time. In cases where efficient algorithms are not known, an approximate notion of a saddle point has been suggested: in an approximate saddle point, no agent can gain a substantial advantage by switching to another strategy. In this paper, we design approximation schemes for saddle points of such games when the number of random positions is fixed (see Sect. 1.2 for a definition).
In the remainder of this section, we introduce the concepts used in this paper. Our results are summarized in Sect. 1.4. After that, we present our approximation schemes (Sect. 2). We conclude with a list of open problems (Sect. 3), where we address in particular the question of polynomial smoothed complexity of mean payoff games. In the conference version of this paper [2], we wrongly claimed that stochastic mean payoff games can be solved in smoothed polynomial time.
1.1 Stochastic Mean Payoff Games
1.1.1 Definition and Notation

\(G=(V,E)\) is a directed graph that may have loops and multiple edges, but no terminal positions, i.e., no positions of out-degree 0. The vertex set V of G is partitioned into three disjoint subsets \(V = V_B \cup V_W \cup V_R\) that correspond to black, white, and random positions, respectively. The edges stand for moves. The black and white positions are owned by two players: Black (the minimizer) owns the black positions in \(V_B\), and White (the maximizer) owns the white positions in \(V_W\). The positions in \(V_R\) are owned by nature.

P is the vector of probability distributions for all positions \(v \in V_R\) owned by nature. We assume that \(\sum _{u: (v,u) \in E} p_{vu} = 1\) for all \(v \in V_R\) and \(p_{vu} > 0\) for all \(v \in V_R\) and \((v,u) \in E\).

r is the vector of rewards; each edge e has a local reward \(r_e\).
Starting from a given initial position \(v_0 \in V\), the game yields an infinite walk \((v_0, v_1, v_2, \ldots )\), called a play. Let \(b_i\) denote the reward \(r_{(v_{i-1},v_{i})}\) received by White in step i. The undiscounted limit average effective payoff is defined as the Cesàro average \(c=\liminf _{n\rightarrow \infty }\frac{\sum _{i=1}^n{{\mathrm{\mathbb {E}}}}[b_i]}{n}\). White's objective is to maximize c, while the objective of Black is to minimize it.
In this paper, we will restrict our attention to the sets of pure (that is, non-randomized) and stationary (that is, history-independent) strategies of players White and Black, denoted by \(S_W\) and \(S_B\), respectively; such strategies are called positional strategies. Formally, a positional strategy \(s_W \in S_W\) for White is a mapping that assigns a move \((v,u) \in E\) to each position in \(V_W\). We sometimes abbreviate \(s_W(v)=(v,u)\) by \(s_W(v)=u\). Strategies \(s_B \in S_B\) for Black are analogously defined. A pair of strategies \(s = (s_W, s_B)\) is called a situation. By abusing notation, let \(s(v) = u\) if \(v \in V_W\) and \(s_W(v) = u\) or \(v \in V_B\) and \(s_B(v) = u\).
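In the special case without random positions (a BW-game, in the terminology of Sect. 1.3), fixing a situation makes the play deterministic: the walk from \(v_0\) eventually enters a cycle, and the effective payoff is the average reward along that cycle. The following minimal sketch (not part of the original development; the dictionary encoding of moves and rewards is ours) evaluates \(c_{v_0}(s)\) for such a fixed situation:

```python
def mean_payoff(succ, reward, v0):
    """Effective payoff c_{v0}(s) of a fixed situation s in a BW-game.

    succ[v]        -- the unique successor s(v) chosen by the situation
    reward[(u, v)] -- local reward r_e of the edge e = (u, v)

    With both strategies fixed and no random positions, the play from v0
    is deterministic: it enters a cycle, and the Cesaro average equals
    the mean reward of that cycle.
    """
    seen = {}      # position -> index of its first visit on the walk
    walk = [v0]
    v = v0
    while v not in seen:
        seen[v] = len(walk) - 1
        v = succ[v]
        walk.append(v)
    start = seen[v]  # the walk repeats from this index onward
    cycle_edges = [(walk[i], walk[i + 1]) for i in range(start, len(walk) - 1)]
    return sum(reward[e] for e in cycle_edges) / len(cycle_edges)
```

For instance, with moves \(0\rightarrow 1\rightarrow 2\rightarrow 1\) and rewards 0, 1, 3 on these edges, every starting position yields the cycle \(1\rightarrow 2\rightarrow 1\) of mean payoff 2.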
1.1.2 Strategies and Saddle Points
If we consider \(c_{v_0}(s)\) for all possible situations s, we obtain a matrix game \(C_{v_0} : S_W \times S_B \rightarrow \mathbb {R}\), with entries \(C_{v_0}(s_W,s_B) = c_{v_0}(s_W,s_B)\). It is known that every such game has a saddle point in pure strategies [19, 29]. Such a saddle point defines an equilibrium state in which no player has an incentive to switch to another strategy. The value at that state coincides with the limiting payoff in the corresponding BWR-game [19, 29].
We call a pair of strategies optimal if they correspond to a saddle point. It is well-known that there exist optimal strategies \((s^*_W,s^*_B)\) that do not depend on the starting position \(v_0\). Such strategies are called uniformly optimal. Of course there might be several optimal strategies, but they all lead to the same value. We define this to be the value of the game and write \(\mu _{v_0}(\mathcal {G}) = C_{v_0}(s^*_W,s_B^*)\), where \((s^*_W,s^*_B)\) is any pair of optimal strategies. Note that \(\mu _{v_0}(\mathcal {G})\) may depend on the starting node \(v_0\). Note also that for an arbitrary situation s, \(\mu _{v_0}(\mathcal {G}(s))\) denotes the effective payoff \(c_{v_0}(s)\) in the Markov chain \(\mathcal {G}(s)\).
An algorithm is said to solve the game if it computes an optimal pair of strategies.
1.2 Approximation and Approximate Equilibria
A situation \((s_W^*,s_B^*)\) is called relatively \(\varepsilon \)-optimal if it satisfies (1), and it is called absolutely \(\varepsilon \)-optimal if it satisfies (3). In the following, we will drop the specification of absolute and relative when it is clear from the context. If the pair \((s_W^*,s_B^*)\) is (absolutely or relatively) \(\varepsilon \)-optimal for all starting positions, it is called uniformly (absolutely or relatively) \(\varepsilon \)-optimal (also called subgame perfect).
An algorithm for approximating (absolutely or relatively) the values of the game is said to be a fully polynomial-time (absolute or relative) approximation scheme (FPTAS) if the running time depends polynomially on the input size and \(1/\varepsilon \). In what follows, we assume without loss of generality that \(1/\varepsilon \) is an integer.
1.3 Previous Results
BWR-games are an equivalent formulation [21] of the stochastic games with perfect information and mean payoff that were introduced in 1957 by Gillette [19]. As was noticed already in [21], the BWR model generalizes a variety of games and problems: BWR-games without random positions (\(V_R = \emptyset \)) are called cyclic or mean payoff games [16, 17, 21, 33, 34]; we call these BW-games. If one of the sets \(V_B\) or \(V_W\) is empty, we obtain a Markov decision process, for which polynomial-time algorithms are known [32]. If both are empty (\(V_B = V_W = \emptyset \)), we get a weighted Markov chain. If \(V=V_W\) or \(V=V_B\), we obtain the minimum mean-weight cycle problem, which can be solved in polynomial time [27].
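The one-player case \(V=V_B\) just mentioned, the minimum mean-weight cycle problem, can be solved by Karp's classical dynamic program. The following sketch (our own illustration under an edge-list encoding, not taken from the paper) shows the idea:

```python
def min_mean_cycle(n, edges):
    """Karp's algorithm: minimum mean weight of a cycle in a digraph.

    n     -- number of vertices 0..n-1
    edges -- list of arcs (u, v, w) with weight w
    Returns the minimum cycle mean, or None if the graph is acyclic.

    d[k][v] is the minimum weight of an edge-walk with exactly k edges
    ending at v (starting anywhere); the minimum cycle mean equals
    min over v of max over k < n of (d[n][v] - d[k][v]) / (n - k).
    """
    INF = float("inf")
    d = [[0.0] * n] + [[INF] * n for _ in range(n)]
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = None
    for v in range(n):
        if d[n][v] == INF:
            continue
        worst = max((d[n][v] - d[k][v]) / (n - k)
                    for k in range(n) if d[k][v] < INF)
        best = worst if best is None else min(best, worst)
    return best
```

On a graph with one cycle of mean 1 and another of mean 2, the routine returns 1, as the minimizer would steer the play into the cheaper cycle.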
If all rewards are 0 except for those on m terminal loops, we obtain the so-called Backgammon-like or stochastic terminal payoff games [7]. The special case \(m=1\), in which every random node has only two outgoing arcs with probability 1/2 each, defines the so-called simple stochastic games (SSGs), introduced by Condon [13, 14]. In these games, the objective of White is to maximize the probability of reaching the terminal, while Black wants to minimize this probability. Recently, it has been shown that Gillette games (and hence BWR-games [3]) are equivalent to SSGs under polynomial-time reductions [1]. Thus, by recent results of Halman [22], all these games can be solved in randomized strongly subexponential time \(2^{O(\sqrt{n_d\log n_d})}{{\mathrm{{\text {poly}}}}}(|V|)\), where \(n_d=|V_B|+|V_W|\) is the number of deterministic positions.
Besides their many applications [26, 30], all these games are of interest to complexity theory: the decision problem of whether the value of a BW-game is positive is in the intersection of NP and coNP [28, 40]; yet, no polynomial algorithm is known even in this special case. We refer to Vorobyov [39] for a survey. A similar complexity claim holds for SSGs and BWR-games [1, 3]. On the other hand, there exist algorithms that solve BW-games very fast in practice [21]. The situation for these games is thus comparable to that of linear programming before the discovery of the ellipsoid method: linear programming was known to lie in the intersection of NP and coNP, and the simplex method proved to be fast in practice. In fact, a polynomial algorithm for linear programming in the unit cost model would already imply a polynomial algorithm for BW-games [37]; see also [4] for an extension to BWR-games.
While numerous pseudopolynomial algorithms are known for BW-games [21, 35, 40], pseudopolynomiality for BWR-games (with no restriction on the number of random positions) is in fact equivalent to polynomiality [1]. Gimbert and Horn [20] have shown that a generalization of simple stochastic games with k random positions having arbitrary transition probabilities [not necessarily (1/2, 1/2)] can be solved in time \(O(k!(|V||E|+L))\), where L is the maximum bit length of a transition probability. There are various improvements with smaller dependence on k [9, 15, 20, 23] (note that even though BWR-games are polynomially reducible to simple stochastic games, under this reduction the number of random positions does not stay constant, but is only polynomially bounded in n, even if the original BWR-game had a constant number of random positions). Recently, a pseudopolynomial algorithm was given for BWR-games with a constant number of random positions and a polynomial common denominator of the transition probabilities, but under the assumption that the game is ergodic (that is, the value does not depend on the initial position) [5]. This result was then extended to the non-ergodic case [6]; see also [4].
As for approximation schemes, the only result we are aware of [36] is the observation that the values of BW-games can be approximated within an absolute error of \(\varepsilon \) in polynomial time if all rewards are in the range \([-1,1]\). This follows immediately from truncating the rewards and using any of the known pseudopolynomial algorithms [21, 35, 40].
On the negative side, it was observed recently [18] that obtaining an \(\varepsilon \)-absolute FPTAS without the assumption that all rewards are in \([-1,1]\), or an \(\varepsilon \)-relative FPTAS without the assumption that all rewards are nonnegative, for BW-games, would imply their polynomial-time solvability. In that sense, our results below are the best possible unless there is a polynomial algorithm for solving BW-games.
1.4 Our Results
In this paper, we extend the absolute FPTAS for BW-games [36] in two directions. First, we allow a constant number of random positions, and, second, we derive an FPTAS with a relative approximation error. Throughout the paper, we assume the availability of a pseudopolynomial algorithm \(\mathbb {A}\) that solves any BWR-game \(\mathcal {G}\) with integral rewards and rational transition probabilities in time polynomial in n, D, and R, where \(n=n(\mathcal {G})\) is the total number of positions, \(R=R(\mathcal {G}):=r^+(\mathcal {G})-r^-(\mathcal {G})\) is the size of the range of the rewards, with \(r^+(\mathcal {G})=\max _e r_e\) and \(r^-(\mathcal {G})=\min _e r_e\), and \(D=D(\mathcal {G})\) is the common denominator of the transition probabilities. Note that the dependence on D is inherent in all known pseudopolynomial algorithms for BWR-games. Note also that an affine scaling of the rewards does not change the game.
Let \(p_{\min }=p_{\min }(\mathcal {G})\) be the minimum positive transition probability in the game \(\mathcal {G}\). Throughout this paper, we will assume that the number k of random positions is bounded by a constant.
The following theorem says that a pseudopolynomial algorithm can be turned into an absolute approximation scheme.
Theorem 1
Given a pseudopolynomial algorithm for solving any BWR-game with \(k=O(1)\) (in uniformly optimal strategies), there is an algorithm that returns, for any given BWR-game with rewards in \([-1,1]\), \(k=O(1)\), and for any \(\varepsilon > 0\), a pair of strategies that (uniformly) approximates the value within an absolute error of \(\varepsilon \). The running time of the algorithm is bounded by \({{\mathrm{{\text {poly}}}}}(n, 1/p_{\min }, 1/\varepsilon )\) [assuming \(k=O(1)\)].
We also obtain an approximation scheme with a relative error.
Theorem 2
Given a pseudopolynomial algorithm for solving any BWR-game with \(k=O(1)\), there is an algorithm that returns, for any given BWR-game with nonnegative integral rewards, \(k=O(1)\), and for any \(\varepsilon > 0\), a pair of strategies that approximates the value within a relative error of \(\varepsilon \). The running time of the algorithm is bounded by \({{\mathrm{{\text {poly}}}}}(n,1/p_{\min },\log R, 1/\varepsilon )\) [assuming \(k=O(1)\)].
We remark that Theorem 1 (apart from the dependence of the running time on \(\log R\)) can be obtained from Theorem 2 (see Sect. 2). However, our reduction in Theorem 1, unlike that of Theorem 2, has the property that if the pseudopolynomial algorithm returns uniformly optimal strategies, then the approximation scheme also returns uniformly \(\varepsilon \)-optimal strategies. For BW-games, i.e., the special case without random positions, we can also strengthen the result of Theorem 2 to return a pair of strategies that is uniformly \(\varepsilon \)-optimal.
Theorem 3
Assume that there is a pseudopolynomial algorithm for solving any BW-game in uniformly optimal strategies. Then for any \(\varepsilon > 0\), there is an algorithm that returns, for any given BW-game with nonnegative integral rewards, a pair of uniformly relatively \(\varepsilon \)-optimal strategies. The running time of the algorithm is bounded by \({{\mathrm{{\text {poly}}}}}(n,\log R, 1/\varepsilon )\).
In deriving these approximation schemes from a pseudopolynomial algorithm, we face two main technical challenges that distinguish the computation of \(\varepsilon \)-equilibria of BWR-games from similar standard techniques used in combinatorial optimization. First, the running time of the pseudopolynomial algorithm depends polynomially both on the maximum reward and on the common denominator D of the transition probabilities. Thus, in order to obtain a fully polynomial-time approximation scheme (FPTAS) with an absolute guarantee whose running time is independent of D, we have to truncate the probabilities and bound the resulting change in the game value, which is a nonlinear function of D. Second, in order to obtain an FPTAS with a relative guarantee, one needs (as often in optimization) a (trivial) lower/upper bound on the optimum value. In the case of BWR-games, it is not clear what bound we can use, since the game value can be arbitrarily small. The situation becomes even more complicated if we look for uniformly \(\varepsilon \)-optimal strategies, because we then have to output a single pair of strategies that guarantees \(\varepsilon \)-optimality from every starting position.
In order to resolve the first issue, we analyze the change in the game values and optimal strategies when the rewards or transition probabilities are changed. Roughly speaking, we use results from Markov chain perturbation theory to show that if the probabilities are perturbed by a small error \(\delta \), then the change in the game value is \(O(\delta n^2/p_{\min }^{2k})\) (see Sect. 2.1). It is worth mentioning that a somewhat related result was obtained recently for the class of so-called almost-sure ergodic games (not necessarily with perfect information) [10]. More precisely, it was shown that for this class of games there is an \(\varepsilon \)-optimal strategy with a rational representation with denominator \(D=O(\frac{n^3}{\varepsilon p_{\min }^{k}})\) [10]. The second issue is resolved through repeated applications of the pseudopolynomial algorithm on a truncated game. After each such application, we have one of the following situations: either the value of the game has already been approximated within the required accuracy, or it is guaranteed that the range of the rewards can be shrunk by a constant factor without changing the value of the game (see Sects. 2.3, 2.4).
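The flavor of such perturbation bounds can be seen already on a two-state weighted Markov chain (a toy numeric illustration of ours, far simpler than the games treated here): perturbing a transition probability by \(\delta \) moves the mean payoff by at most a constant multiple of \(\delta \).

```python
def mean_payoff(p):
    """Mean payoff of a 2-state weighted Markov chain.

    From state 0: move to state 1 with probability p (reward 1),
    stay at 0 with probability 1 - p (reward 0); state 1 returns to
    state 0 deterministically (reward 0).  Stationary distribution:
    pi0 = 1/(1+p), pi1 = p/(1+p); expected reward per step = pi0 * p.
    """
    return p / (1.0 + p)

# Perturbing p by delta changes the value by
# delta / ((1+p)(1+p+delta)) <= delta.
p, delta = 0.5, 1e-6
change = abs(mean_payoff(p + delta) - mean_payoff(p))
```

Here the change is roughly \(\delta /2.25\), comfortably within the linear-in-\(\delta \) bound; in the games of this paper the constant deteriorates to \(O(n^2/p_{\min }^{2k})\).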
Since BWR-games with a constant number of random positions admit a pseudopolynomial algorithm, as was recently shown [5, 6], we obtain the following results.
Corollary 1
 (i)
There is an FPTAS that solves, within an absolute error guarantee, in uniformly \(\varepsilon \)-optimal strategies, any BWR-game with a constant number of random positions, \(1/p_{\min }={{\mathrm{{\text {poly}}}}}(n)\), and rewards in \([-1,1]\).
 (ii)
There is an FPTAS that solves, within a relative error guarantee, in \(\varepsilon \)-optimal strategies, any BWR-game with a constant number of random positions, \(1/p_{\min }={{\mathrm{{\text {poly}}}}}(n)\), and nonnegative rational rewards.
 (iii)
There is an FPTAS that solves, within a relative error guarantee, in uniformly \(\varepsilon \)-optimal strategies, any BW-game with nonnegative (rational) rewards.
The proofs of Theorems 1, 2, and 3 will be given in Sects. 2.2, 2.3, and 2.4, respectively.
2 Approximation Schemes
2.1 The Effect of Perturbation
Our approximation schemes are based on the following three lemmas. The first one (which is known) says that a linear change in the rewards corresponds to a linear change in the game value. In our approximation schemes, we truncate and scale the rewards to be able to run the pseudopolynomial algorithm in polynomial time. We need the lemma to bound the error in the game value resulting from the truncation.
Lemma 1
Proof
Lemma 2
Let \(\mathcal {G}=(G=(V,E),P,r)\) be a BWR-game with \(r\in [-1,1]^{E}\), and let \(\varepsilon \le p_{\min }/2 =p_{\min }(\mathcal {G})/2\) be a positive constant. Let \(\hat{\mathcal {G}}\) be a game \((G=(V,E),\hat{P},r)\) with \(\Vert P-\hat{P} \Vert _{\infty }\le \varepsilon \) (and \(\hat{p}_{uv}=0\) if \(p_{uv}=0\) for all arcs (u, v)). Then we have \(|\mu _v(\mathcal {G})-\mu _v(\hat{\mathcal {G}})|\le \delta (\mathcal {G},\varepsilon )\) for any \(v\in V\). Moreover, if the pair \((\tilde{s}_W,\tilde{s}_B)\) is absolutely \(\varepsilon ^{\prime }\)-optimal in \((\hat{\mathcal {G}},v)\), then it is absolutely \((\varepsilon ^{\prime }+2\delta (\mathcal {G},\varepsilon ))\)-optimal in \((\mathcal {G},v)\).
Proof
Since we assume that the running time of the pseudopolynomial algorithm for the original game \(\mathcal {G}\) depends on the common denominator D of the transition probabilities, we have to truncate the probabilities to remove this dependence on D. By Lemma 2, the value of the game does not change too much under such a truncation.
The third result that we need concerns relative approximation. The main idea is to use the pseudopolynomial algorithm to test whether the value of the game is larger than a certain threshold. If it is, we already have a good relative approximation. Otherwise, the next lemma says that we can reduce all large rewards without changing the value of the game.
Lemma 3
Let \(\mathcal {G}=(G=(V,E),P,r)\) be a BWR-game with \(r\ge 0\), and let v be any vertex with \(\mu _v(\mathcal {G})< t\). Suppose that \(r_e\ge t^{\prime }=ntp_{\min }^{-(2k+1)}\) for some \(e\in E\). Let \(\hat{\mathcal {G}}=(G=(V,E),P,\hat{r})\), where \(\hat{r}_e=\min \{r_e,t^{\prime \prime }\}\), \(t^{\prime \prime }\ge (1+\varepsilon )t^{\prime }\) for some \(\varepsilon \ge 0\), and \(\hat{r}_{e^{\prime }}=r_{e^{\prime }}\) for all \(e^{\prime }\ne e\). Then \(\mu _v(\hat{\mathcal {G}})=\mu _v(\mathcal {G})\), and any relatively \(\varepsilon \)-optimal situation in \((\hat{\mathcal {G}},v)\) is also relatively \(\varepsilon \)-optimal in \((\mathcal {G},v)\).
Proof
We assume that \(\hat{r}_e=t^{\prime \prime }\ge (1+\varepsilon )t^{\prime }\), since otherwise there is nothing to prove. Let \(s^*=(s^*_W,s^*_B)\) be an optimal situation for \((\mathcal {G},v)\). This means that \(\mu _v(\mathcal {G})=\mu _v(\mathcal {G}(s^*))=\rho (s^*)^Tr< t\). Lemma 8 says that \(\rho _e(s^*)>0\) implies \(\rho _{e}(s^*)\ge p_{\min }^{2k+1}/n\). Hence, if \(\rho _e(s^*)>0\), then \(r_{e}\rho _e(s^*)\le \rho (s^*)^Tr=\mu _v(\mathcal {G})<t\) implies \(r_{e}<t^{\prime }\), contradicting the assumption \(r_e\ge t^{\prime }\). We conclude that \(\rho _e(s^*)=0\), and hence \(\mu _v(\hat{\mathcal {G}}(s^*))=\mu _v(\mathcal {G})\).
2.2 Absolute Approximation
In this section, we assume that \(r^-=-1\) and \(r^+=1\), i.e., all rewards are from the interval \([-1,1]\). We may also assume that \(\varepsilon \in (0,1)\) and \(\frac{1}{\varepsilon }\in \mathbb {Z}_+\). We apply the pseudopolynomial algorithm \(\mathbb {A}\) on a truncated game \(\tilde{\mathcal {G}}=(G=(V,E),\tilde{P},\tilde{r})\) defined by rounding the rewards to the nearest integer multiple of \(\varepsilon /4\) (denoted \(\tilde{r}:=\lfloor r\rceil _{\frac{\varepsilon }{4}}\)) and truncating the vector of probabilities \((p_{(v,u)})_{u \in V}\) for each random node \(v\in V_R\), as described in the following lemma.
Lemma 4
 (i)
\(\Vert \alpha ^{\prime }\Vert _{1}=1\);
 (ii)
for all \(i=1,\ldots ,n\), \(\alpha ^{\prime }_i = c_i/2^{B}\) for some integer \(c_i \in \mathbb {N}\);
 (iii)
for all \(i=1,\ldots ,n\), \(\alpha ^{\prime }_i>0\) if and only if \(\alpha _i>0\); and
 (iv)
\(\Vert \alpha -\alpha ^{\prime }\Vert _{\infty } \le 2^{-B}\).
Proof
This is straightforward, and we include the proof only for completeness. Without loss of generality, we assume \(\alpha _i>0\) for all i (set \(\alpha _i^{\prime }=0\) for all i such that \(\alpha _i=0\)). Initialize \(\varepsilon _0=0\) and iterate for \(i=1,\ldots ,n\): set \(\alpha _i^{\prime } =\lfloor \alpha _i+\varepsilon _{i-1}\rceil _{2^{-B}}\) and \(\varepsilon _{i} =\alpha _i+\varepsilon _{i-1}-\alpha _i^{\prime }\). The construction implies (ii). Note that \(|\varepsilon _i|\le 2^{-(B+1)}\) for all i, and \(\varepsilon _n=\sum _i \alpha _i-\sum _i\alpha _i^{\prime }\), which implies (i). Furthermore, \(|\alpha _i-\alpha _i^{\prime }|=|\varepsilon _i-\varepsilon _{i-1}|\le 2^{-B}\), which implies (iv). Note finally that (iii) follows from (iv), since \(\min _{i:\alpha _i>0}\{\alpha _i\}>2^{-B}\). \(\square \)
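The rounding procedure in this proof can be sketched directly in exact arithmetic (the function name and `Fraction` encoding are ours): each entry, plus the error carried over from earlier entries, is rounded to the nearest multiple of \(2^{-B}\).

```python
from fractions import Fraction

def round_distribution(alpha, B):
    """Round a probability vector to multiples of 2^-B, carrying the error.

    Mirrors the proof of Lemma 4: alpha'_i is alpha_i plus the error
    carried so far, rounded to the nearest multiple of 2^-B.  Assumes
    every positive alpha_i exceeds 2^-B, so the support is preserved.
    """
    step = Fraction(1, 2 ** B)
    out = [Fraction(0)] * len(alpha)
    err = Fraction(0)
    for i, a in enumerate(alpha):
        if a == 0:
            continue  # zero entries stay zero, preserving the support
        out[i] = round((a + err) / step) * step  # nearest multiple of 2^-B
        err = a + err - out[i]
    return out
```

For example, rounding \((1/3, 1/3, 1/3)\) with \(B = 4\) yields \((5/16, 6/16, 5/16)\): the entries still sum to 1 and each moves by at most \(2^{-4}\), as the lemma promises.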
Lemma 5
Let \(\mathbb {A}\) be a pseudopolynomial algorithm that solves, in (uniformly) optimal strategies, any BWR-game \(\mathcal {G}=(G,P,r)\) in time \(\tau (n,D,R)\). Then for any \(\varepsilon >0\), there is an algorithm that solves, in (uniformly) absolutely \(\varepsilon \)-optimal strategies, any given BWR-game \(\mathcal {G}=(G,P,r)\) in time bounded by \(\tau \bigl (n,\frac{2^{k+4}n^2(3k+1)}{\varepsilon p_{\min }^{k}},\frac{8}{\varepsilon }\bigr )\).
Proof
We apply \(\mathbb {A}\) to the game \(\tilde{\mathcal {G}}=(G,\tilde{P},\tilde{r})\), where \(\tilde{r}:=\frac{4}{\varepsilon }\lfloor r\rceil _{\frac{\varepsilon }{4}}\). The probabilities \(\tilde{P}\) are obtained from P by applying Lemma 4 with \(B=\lceil \log _2(1/\varepsilon ^{\prime })\rceil \), where we select \(\varepsilon ^{\prime }\) such that \(\delta (\mathcal {G},\varepsilon ^{\prime })\le \frac{\varepsilon }{4}\) [as defined by (6)]. It is easy to check that \(\delta (\mathcal {G},\varepsilon ^{\prime })\le \varepsilon /4\) for \(\varepsilon ^{\prime }=\frac{\varepsilon p_{\min }^{k}}{2^{k+3}n^2(3k+1)}\), as \(r^*=1\). Note that all rewards in \(\tilde{\mathcal {G}}\) are integers in the range \([-\frac{4}{\varepsilon },\frac{4}{\varepsilon }]\). Since \(D(\tilde{\mathcal {G}})=2^B\) and \(R(\tilde{\mathcal {G}})= 8/\varepsilon \), the statement about the running time follows.
Let \(\tilde{s}\) be the pair of (uniformly) optimal strategies returned by \(\mathbb {A}\) on input \(\tilde{\mathcal {G}}\). Let \(\hat{\mathcal {G}}\) be the game \((G,\tilde{P},r)\). Since \(\Vert \tilde{r}-\frac{4}{\varepsilon }r\Vert _{\infty }\le 1\), we can apply Lemma 1 (with \(\hat{r}=\tilde{r}\), \(\theta _1=\theta _2=\frac{4}{\varepsilon }\) and \(\gamma _1=\gamma _2=1\)) to conclude that \(\tilde{s}\) is a (uniformly) absolutely \(\frac{\varepsilon }{2}\)-optimal pair for \(\hat{\mathcal {G}}\). Now we apply Lemma 2 and conclude that \(\tilde{s}\) is (uniformly) \((\frac{\varepsilon }{2}+2\delta (\mathcal {G},\varepsilon ^{\prime }))\)-optimal for \(\mathcal {G}\). \(\square \)
Note that the above technique yields an approximation algorithm with polynomial running time only for \(k=O(1)\), even if the pseudopolynomial algorithm \(\mathbb {A}\) works for arbitrary k.
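The reward-truncation step of this proof is elementary: \(\tilde{r}:=\frac{4}{\varepsilon }\lfloor r\rceil _{\frac{\varepsilon }{4}}\) is just the nearest integer to \(4r_e/\varepsilon \). A minimal sketch (our own helper, assuming a dictionary of edge rewards):

```python
def truncate_rewards(r, eps):
    """Reward truncation from the proof of Lemma 5.

    Maps each reward in [-1, 1] to the nearest integer multiple of eps/4
    and rescales by 4/eps, i.e. r_e -> round(4 * r_e / eps).  The result
    is an integer in [-4/eps, 4/eps], so the reward range R of the
    truncated game is at most 8/eps, independent of the original data.
    """
    return {e: round(4.0 * x / eps) for e, x in r.items()}
```

Scaling back by \(\varepsilon /4\) recovers each reward up to an absolute error of \(\varepsilon /8\), which is what allows Lemma 1 to bound the loss in the game value.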
2.3 Relative Approximation
Lemma 6
Proof
The algorithm \({{\mathrm{{\text {FPTAS-BWR}}}}}(\mathcal {G},w,\varepsilon )\) is given as Algorithm 1. The bound on the running time follows since, by step 9, each recursive call is on a game \(\tilde{\mathcal {G}}\) with \(r^+(\tilde{\mathcal {G}})\) reduced by a factor of at least two. Moreover, the rewards in the truncated game \(\hat{\mathcal {G}}\) are nonnegative integers with a maximum value of \(r^+(\hat{\mathcal {G}})\le \theta \), and the smallest common denominator of the transition probabilities is at most \(\tilde{D}:=\frac{2}{\varepsilon ^{\prime }}\). Thus the time taken by algorithm \(\mathbb {A}\) for each recursive call is at most \(\tau \bigl (n,\tilde{D},\theta \bigr )\).
What remains is to argue by induction (on \(r^+(\mathcal {G})\)) that the algorithm returns a pair \(\tilde{s}=(\tilde{s}_W,\tilde{s}_B)\) of \(\varepsilon \)-optimal strategies. For the base case, we have either \(r^+(\mathcal {G})\le 2\) or the value returned by the pseudopolynomial algorithm \(\mathbb {A}\) satisfies \(\mu _w(\hat{\mathcal {G}})\ge 3/\varepsilon \). In the former case, note that since \(\Vert P-\tilde{P}\Vert _{\infty }\le \varepsilon ^{\prime }\) and \(r^+(\mathcal {G})\le 2\), Lemma 2 implies that the pair \(\tilde{s}=(\tilde{s}_W,\tilde{s}_B)\) returned in step 2 is absolutely \(\varepsilon ^{\prime \prime }\)-optimal, where \(\varepsilon ^{\prime \prime }=2\delta (\mathcal {G},\varepsilon ^{\prime })<\frac{\varepsilon p_{\min }^{2k+1}}{n}\). Lemma 8 and the integrality of the nonnegative rewards imply that, for any situation s, \(\mu _w(\mathcal {G}(s))\ge p_{\min }^{2k+1}/n\) if \(\mu _w(\mathcal {G}(s))>0\). Thus, if \(\mu _w(\mathcal {G})>0\), then \(\varepsilon ^{\prime \prime }\le \varepsilon \mu _w(\mathcal {G})\), and it follows that \((\tilde{s}_W,\tilde{s}_B)\) is relatively \(\varepsilon \)-optimal. On the other hand, if \(\mu _w(\mathcal {G})=0\), then \(\mu _w(\mathcal {G}(\tilde{s}))\le \mu _w(\mathcal {G})+\varepsilon ^{\prime \prime }< p_{\min }^{2k+1}/n\), implying that \(\mu _w(\mathcal {G}(\tilde{s}))=0\). Thus, we get a relative \(\varepsilon \)-approximation in both cases.
On the other hand, if \(\mu _w(\hat{\mathcal {G}})< 3/\varepsilon \), then, by (7), \(\mu _w(\mathcal {G})<\frac{K(3+2\varepsilon )}{\varepsilon }=\frac{p_{\min }^{2k+1}r^+}{2(1+\varepsilon )n}\). By Lemma 3, applied with \(t= K(3+2\varepsilon )/\varepsilon \), the game \(\tilde{\mathcal {G}}\) defined in step 11 satisfies \(\mu _w(\mathcal {G})=\mu _w(\tilde{\mathcal {G}})\), and any relatively \(\varepsilon \)-optimal strategy in \((\tilde{\mathcal {G}},w)\) (in particular, the one returned by induction in step 11) is also \(\varepsilon \)-optimal for \((\mathcal {G},w)\). \(\square \)
Note that the running time in the above lemma simplifies to \({{\mathrm{{\text {poly}}}}}(n, 1/\varepsilon , 1/p_{\min }) \cdot \log R\) for \(k = O(1)\).
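The \(\log R\) factor comes purely from the recursion structure: each call either terminates or recurses on a game whose maximum reward is at most half as large, and the recursion stops once \(r^+\le 2\), so at most \(\lceil \log _2 R\rceil +1\) calls are made. A sketch of just this counting argument (the solver itself is abstracted away; the function is ours):

```python
def recursion_calls(r_plus):
    """Worst-case number of recursive calls of the scheme in Lemma 6.

    Each call either terminates (value large enough, or r+ <= 2) or
    recurses on a game whose maximum reward is at most half as large,
    so the number of calls is at most ceil(log2(R)) + 1.
    """
    calls = 1
    while r_plus > 2:
        r_plus = r_plus // 2  # r+ is "reduced by a factor of at least two"
        calls += 1
    return calls
```

Multiplying this call count by the per-call cost \(\tau (n,\tilde{D},\theta )\) gives the stated \({{\mathrm{{\text {poly}}}}}(n, 1/\varepsilon , 1/p_{\min }) \cdot \log R\) bound.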
2.4 Uniformly Relative Approximation for BWGames
Lemma 7
Let \(\mathbb {A}\) be a pseudopolynomial algorithm that solves, in uniformly optimal strategies, any BW-game \(\mathcal {G}\) in time \(\tau (n,R)\). Then for any \(\varepsilon >0\), there is an algorithm that solves, in uniformly relatively \(\varepsilon \)-optimal strategies, any BW-game \(\mathcal {G}\), in time \(O\bigl (\bigl (\tau \bigl (n,\frac{2(1+\varepsilon ^{\prime })^2n}{\varepsilon ^{\prime }}\bigr )+{{\mathrm{{\text {poly}}}}}(n)\bigr ) \cdot h\bigr )\), where \(h=\lceil \log R\rceil +1\) and \(\varepsilon ^{\prime }=\frac{\ln (1+\varepsilon )}{3h}\approx \frac{\varepsilon }{3h}\).
Proof
The algorithm \({{\mathrm{{\text {FPTAS-BW}}}}}(\mathcal {G},\varepsilon )\) is given as Algorithm 2. The bound on the running time is obvious: by step 9, each recursive call is on a game \(\tilde{\mathcal {G}}\) with \(r^+(\tilde{\mathcal {G}})\) reduced by a factor of at least two. Moreover, the rewards in the truncated game \(\hat{\mathcal {G}}\) are integral with a maximum value of \(r^+(\hat{\mathcal {G}})\le \frac{r^+(\mathcal {G})}{K}\le \frac{2(1+\varepsilon ^{\prime })^2n}{\varepsilon ^{\prime }}\). Thus, the time that algorithm \(\mathbb {A}\) needs in each recursive call is bounded from above by \(\tau \bigl (n,\frac{2(1+\varepsilon ^{\prime })^2n}{\varepsilon ^{\prime }}\bigr )\).
So it remains to argue (by induction) that the algorithm returns a pair \((\tilde{s}_W,\tilde{s}_B)\) of relatively uniformly \(\varepsilon \)-optimal strategies. Let us index the different recursive calls of the algorithm by \(i=1,2,\ldots ,h^{\prime }\le h\) and denote by \(\mathcal {G}^{(i)}=(G^{(i)}=(V^{(i)},E^{(i)}),r^{(i)})\) the game input to the ith recursive call of the algorithm (so \(\mathcal {G}^{(1)}=\mathcal {G}\)) and by \(\hat{s}^{(i)}=(\hat{s}^{(i)}_W,\hat{s}^{(i)}_B)\), \(\tilde{s}^{(i)}=(\tilde{s}^{(i)}_W,\tilde{s}^{(i)}_B)\) the pairs of strategies returned in steps 2, 4, 5, or 11. Similarly, we denote by \(V^{(i)}=V_W^{(i)}\cup V_B^{(i)}\), \(U^{(i)}\), \(r^{(i)}\), \(K^{(i)}\), \(\hat{r}^{(i)}\), \(\hat{\mathcal {G}}^{(i)}\), \(\tilde{\mathcal {G}}^{(i)}\) the instantiations of V, \(V_W\), \(V_B\), U, r, K, \(\hat{r}\), \(\hat{\mathcal {G}}\), \(\tilde{\mathcal {G}}\), respectively, in the ith call of the algorithm. We denote by \(S_W^{(i)}\) and \(S_B^{(i)}\) the sets of strategies in \(\mathcal {G}^{(i)}\) for White and Black, respectively. For a set U of positions, a game \(\mathcal {G}\), and a situation s, we denote by \(\mathcal {G}[U]=(G[U],r)\) and s[U], respectively, the game and the situation induced on U. \(\square \)
Claim 1
 (i)
There does not exist an edge \((v,u)\in E\) such that \(v\in V_B^{(i)}\cap U^{(i)}\) and \(u\in V^{(i)}{\setminus } U^{(i)}\).
 (ii)
For all \(v \in V_W^{(i)}\cap U^{(i)}\), there exists a \(u\in U^{(i)}\) with \((v,u)\in E\).
 (i’)
There does not exist an edge \((v,u)\in E\) such that \(v\in V_W^{(i)}{\setminus } U^{(i)}\) and \(u\in U^{(i)}\).
 (ii’)
For all black positions \(v \in V_B^{(i)}{\setminus } U^{(i)}\), there exists a \(u\in V^{(i)}{\setminus } U^{(i)}\) such that \((v,u)\in E\).
 (iii)
Let \(\hat{s}^{(i)}=(\hat{s}_W^{(i)},\hat{s}_B^{(i)})\) be the situation returned in step 4. Then, for all \(v\in U^{(i)}\), we have \(\hat{s}^{(i)}(v)\in U^{(i)}\), and, for all \(v\in V^{(i)}{\setminus } U^{(i)}\), we have \(\hat{s}^{(i)}(v)\in V^{(i)}{\setminus } U^{(i)}\).
Proof
 (I)
\(\mu _v(\hat{\mathcal {G}}^{(i)})=\min \{\mu _u(\hat{\mathcal {G}}^{(i)}) \mid u\in V^{(i)}\) such that \((v,u)\in E\}\), for \(v\in V_B^{(i)}\), and
 (II)
\(\mu _v(\hat{\mathcal {G}}^{(i)})=\max \{\mu _u(\hat{\mathcal {G}}^{(i)}) \mid u\in V^{(i)}\) such that \((v,u)\in E\}\), for any \(v\in V_W^{(i)}\).
Note that Claim 1 implies that the game \(\mathcal {G}^{(i)}[V^{(i)}{\setminus } U^{(i)}]\) is well-defined since the graph \(G[V^{(i)}{\setminus } U^{(i)}]\) has no sinks. For a strategy \(s_W\) (and similarly for a strategy \(s_B\)) and a subset \(V^{\prime }\subseteq V\), we write \(s_W(V^{\prime })=\{s_W(u) \mid u\in V^{\prime }\}\). The following two claims state, respectively, that the values of the positions in \(U^{(i)}\) are well-approximated by the pseudo-polynomial algorithm and that these values are sufficiently larger than those in the residual set \(V^{(i)}{\setminus } U^{(i)}\).
Claim 2
Proof
This follows from Lemma 1 by the uniform optimality of \(\hat{s}^{(i)}\) in \(\hat{\mathcal {G}}^{(i)}\) and the fact that \(\mu _w(\hat{\mathcal {G}}^{(i)})\ge 1/\varepsilon ^{\prime }\) for every \(w\in U^{(i)}.\) \(\square \)
Claim 3
For all \(u \in U^{(i)}\) and \(v\in V^{(i)}{\setminus } U^{(i)}\), we have \((1+\varepsilon ^{\prime })\mu _u(\mathcal {G}^{(i)})>\mu _v(\mathcal {G}^{(i)})\).
Proof
We observe that the strategy \(\tilde{s}^{(i)}\), returned by the ith call to the algorithm, is determined as follows (cf. step 11): for \(w\in U^{(i)}\), \(\tilde{s}^{(i)}(w)=\hat{s}^{(i)}(w)\) is chosen by the solution of the game \(\hat{\mathcal {G}}^{(i)}\), and for \(w\in V^{(i)}{\setminus } U^{(i)}\), \(\tilde{s}^{(i)}(w)\) is determined by the (recursive) solution of the residual game \(\tilde{\mathcal {G}}^{(i)}=\mathcal {G}^{(i+1)}\). The following claim states that the value of any vertex \(u\in V^{(i)}{\setminus } U^{(i)}\) in the residual game is a good (relative) approximation of its value in the original game \(\mathcal {G}^{(i)}\).
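In symbols, the combined strategy pieces together the pseudo-polynomial solution on \(U^{(i)}\) with the recursive solution on the residual game (this display merely restates the construction above):

```latex
\tilde{s}^{(i)}(w) \;=\;
\begin{cases}
  \hat{s}^{(i)}(w),       & w \in U^{(i)},\\[2pt]
  \tilde{s}^{(i+1)}(w),   & w \in V^{(i)} \setminus U^{(i)}.
\end{cases}
```

By Claim 1(iii), the first branch never leads out of \(U^{(i)}\), so the two branches do not interfere.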
Claim 4
Proof
Let us fix \(\varepsilon _{h^{\prime }}=\varepsilon ^{\prime }\), and for \(i=h^{\prime }-1,h^{\prime }-2,\ldots ,1\), let us choose \(\varepsilon _i\) such that \(1+\varepsilon _i\ge (1+\varepsilon ^{\prime })(1+2\varepsilon ^{\prime })(1+\varepsilon _{i+1})\). Next, we claim that the strategies \((\tilde{s}_W^{(i)},\tilde{s}_B^{(i)})\) returned by the ith call of \({{\mathrm{{\text {FPTAS-BW}}}}}(\mathcal {G},\varepsilon )\) are relatively \(\varepsilon _i\)-optimal in \(\mathcal {G}^{(i)}\).
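For concreteness, taking equality in this recurrence and unrolling it from \(i=h^{\prime }\) down to \(i\) gives a closed form (a sketch; the final bound uses \((1+\varepsilon ^{\prime })(1+2\varepsilon ^{\prime })\le (1+2\varepsilon ^{\prime })^{2}\)):

```latex
1+\varepsilon_i
  \;=\; \bigl[(1+\varepsilon')(1+2\varepsilon')\bigr]^{\,h'-i}\,(1+\varepsilon')
  \;\le\; (1+2\varepsilon')^{\,2(h'-i)+1}.
```

In particular \(1+\varepsilon _1\le (1+2\varepsilon ^{\prime })^{2h^{\prime }-1}\), so choosing \(\varepsilon ^{\prime }\) of order \(\varepsilon /h\) would keep the accumulated relative error below \(1+\varepsilon \); the precise choice is the one made by the algorithm and is not reproduced here.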
Claim 5
Proof
The proof is by induction on \(i=h^{\prime },h^{\prime }-1,\ldots ,1\). For \(i=h^{\prime }\), the statement follows directly from Claim 1 since \(U^{(h^{\prime })}=V^{(h^{\prime })}\). So suppose that \(i<h^{\prime }\).
Proof of (11): Consider an arbitrary strategy \(s_W\in S_W^{(i)}\) for White. Suppose first that \(w\in U^{(i)}.\) Note that, by Claim 1(iii), \(\tilde{s}_B^{(i)}(u)\in U^{(i)}\) for all \(u\in V_B\cap U^{(i)}.\) If also \(s_W(u)\in U^{(i)}\) for all \(u\in V_W\cap U^{(i)}\), such that u is reachable from w in the graph \(G(s_W,\tilde{s}_B^{(i)})\), then Claim 2 implies \(\mu _w(\mathcal {G}^{(i)}(s_W,\tilde{s}_B^{(i)}))\le (1+\varepsilon ^{\prime })\mu _w(\mathcal {G}^{(i)})\le (1+\varepsilon _i)\mu _w(\mathcal {G}^{(i)})\).
Suppose therefore that \(v=s_W(u)\not \in U^{(i)}\) for some \(u\in V_W\cap U^{(i)}\) such that u is reachable from w in the graph \(G(s_W,\tilde{s}_B^{(i)})\).
If \(w\in V^{(i)}{\setminus } U^{(i)},\) then an argument similar to that in (15) and (16) shows that \(\mu _w(\mathcal {G}^{(i)}(s_W,\tilde{s}_B^{(i)}))\le (1+\varepsilon _{i+1})(1+2\varepsilon ^{\prime })\mu _w(\mathcal {G}^{(i)})\le (1+\varepsilon _i)\mu _w(\mathcal {G}^{(i)})\). Thus, (11) follows.
Proof of (12): Consider an arbitrary strategy \(s_B\in S_B^{(i)}\) for Black. If \(w\in U^{(i)},\) then we have \(\mu _w(\mathcal {G}^{(i)}(\tilde{s}_W^{(i)},s_B))\ge (1-\varepsilon ^{\prime })\mu _w(\mathcal {G}^{(i)})\ge (1-\varepsilon _i)\mu _w(\mathcal {G}^{(i)})\) by Claim 1(i)–(iii) and since \(\varepsilon _i\ge \varepsilon ^{\prime }\).
3 Concluding Remarks
In this paper, we have shown that, for classes of stochastic mean payoff games with perfect information and a constant number of random positions, computing the game values admits approximation schemes, provided that the class of games at hand can be solved in pseudo-polynomial time.
 1.
First, in the conference version of this paper [2], we claimed that, up to some technical requirements, a pseudo-polynomial algorithm for a class of stochastic mean payoff games implies that this class has polynomial smoothed complexity. (Smoothed analysis is a paradigm for analyzing algorithms with poor worst-case but good practical performance. Since its invention, it has been applied to a variety of algorithms and problems to explain their performance or complexity, respectively [31, 38].)
However, the proof of this result is flawed. In particular, the proof of a lemma that is not contained in the proceedings version, but only in the accompanying technical report (Oberwolfach Preprints, OWP 2010-22, Lemma 4.3), is flawed. The reason is relatively simple: if we are just looking for an optimal solution, then we can show that the second-best solution is significantly worse than the best solution. For two-player games, where one player maximizes and the other player minimizes, we have an optimization problem for either player, given an optimal strategy of the other player. However, the optimal strategy of the other player depends on the random rewards of the edges. Thus, the two strategies are dependent. As a consequence, we cannot use the full randomness of the rewards in an isolation lemma to compare the best and second-best response to the optimal strategy of the other player.
Therefore, the question of whether stochastic mean payoff games have polynomial smoothed complexity remains open.
 2.
In Sect. 2.3 we gave an approximation scheme that relatively approximates the value of a BWR-game from any starting position. If we apply this algorithm from different positions, we are likely to get two different relatively \(\varepsilon \)-optimal strategies. In Sect. 2.4 we showed that a modification of the algorithm of Sect. 2.3 yields a uniformly relatively \(\varepsilon \)-optimal strategy when there are no random positions. It remains an interesting question whether this can be extended to BWR-games with a constant number of random positions.
 3.
Is it true that pseudo-polynomial solvability of a class of stochastic mean payoff games implies polynomial smoothed complexity? In particular, do mean payoff games have polynomial smoothed complexity?
 4.
Related to Question 3: is it possible to prove an isolation lemma for (classes of) stochastic mean payoff games? We believe that this is not possible and that different techniques are required to prove polynomial smoothed complexity of these games.
 5.
While stochastic mean payoff games include parity games as a special case, the probabilistic model that we used here does not make sense for parity games. However, parity games can be solved in quasi-polynomial time [8]. One wonders whether they also have polynomial smoothed complexity under a reasonable probabilistic model.
 6.
Finally, let us remark that removing the assumption that k is constant in the above results remains a challenging open problem that seems to require totally new ideas. Another interesting question is whether stochastic mean payoff games with perfect information can be solved in parameterized pseudo-polynomial time with the number k of random positions as the parameter.
References
 1. Andersson, D., Miltersen, P.B.: The complexity of solving stochastic games on graphs. In: 20th International Symposium on Algorithms and Computation (ISAAC), Lecture Notes in Computer Science, vol. 5878, pp. 112–121. Springer (2009)
 2. Boros, E., Elbassioni, K., Fouz, M., Gurvich, V., Makino, K., Manthey, B.: Stochastic mean payoff games: smoothed analysis and approximation schemes. In: Proceedings of the 38th International Colloquium on Automata, Languages and Programming (ICALP), Part I, Lecture Notes in Computer Science, vol. 6755, pp. 147–158. Springer (2011)
 3. Boros, E., Elbassioni, K., Gurvich, V., Makino, K.: Every stochastic game with perfect information admits a canonical form. RRR-09-2009, RUTCOR, Rutgers University, New Brunswick (2009)
 4. Boros, E., Elbassioni, K., Gurvich, V., Makino, K.: A convex programming-based algorithm for mean payoff stochastic games with perfect information. Optim. Lett. (2017)
 5. Boros, E., Elbassioni, K.M., Gurvich, V., Makino, K.: A pumping algorithm for ergodic stochastic mean payoff games with perfect information. In: Proceedings of the 14th International Conference on Integer Programming and Combinatorial Optimization (IPCO), Lecture Notes in Computer Science, vol. 6080, pp. 341–354. Springer (2010)
 6. Boros, E., Elbassioni, K.M., Gurvich, V., Makino, K.: A pseudo-polynomial algorithm for mean payoff stochastic games with perfect information and a few random positions. In: Fomin, F.V., Freivalds, R., Kwiatkowska, M.Z., Peleg, D. (eds.) Proceedings of the 40th International Colloquium on Automata, Languages and Programming, Part I, Lecture Notes in Computer Science, vol. 7965, pp. 220–231. Springer (2013)
 7. Boros, E., Gurvich, V.: Why chess and backgammon can be solved in pure positional uniformly optimal strategies. RRR-21-2009, RUTCOR, Rutgers University, New Brunswick (2009)
 8. Calude, C.S., Jain, S., Khoussainov, B., Li, W., Stephan, F.: Deciding parity games in quasi-polynomial time. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2017), Montreal, QC, Canada, June 19–23, 2017, pp. 252–263 (2017)
 9. Chatterjee, K., de Alfaro, L., Henzinger, T.A.: Termination criteria for solving concurrent safety and reachability games. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 197–206 (2009)
 10. Chatterjee, K., Ibsen-Jensen, R.: The complexity of ergodic mean-payoff games. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) Proceedings of the 41st International Colloquium on Automata, Languages and Programming, Part II, Lecture Notes in Computer Science, vol. 8572, pp. 122–133. Springer (2014)
 11. Chen, X., Deng, X., Teng, S.-H.: Settling the complexity of computing two-player Nash equilibria. J. ACM 56(3), 14 (2009)
 12. Cho, G.E., Meyer, C.D.: Markov chain sensitivity measured by mean first passage times. Linear Algebra Appl. 316(1–3), 21–28 (2000)
 13. Condon, A.: The complexity of stochastic games. Inf. Comput. 96(2), 203–224 (1992)
 14. Condon, A.: On algorithms for simple stochastic games. In: Cai, J.-Y. (ed.) Advances in Computational Complexity Theory, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 13, pp. 51–73. AMS, Providence, RI (1993)
 15. Dai, D., Ge, R.: Another sub-exponential algorithm for the simple stochastic game. Algorithmica 61, 1092–1104 (2011)
 16. Ehrenfeucht, A., Mycielski, J.: Positional games over a graph. Not. Am. Math. Soc. 20, A-334 (1973)
 17. Ehrenfeucht, A., Mycielski, J.: Positional strategies for mean payoff games. Int. J. Game Theory 8, 109–113 (1979)
 18. Gentilini, R.: A note on the approximation of mean-payoff games. Inf. Process. Lett. 114(7), 382–386 (2014)
 19. Gillette, D.: Stochastic games with zero stop probabilities. In: Dresher, M., Tucker, A.W., Wolfe, P. (eds.) Contributions to the Theory of Games, Vol. 3, Annals of Mathematics Studies, vol. 39, pp. 179–187. Princeton University Press, Princeton (1957)
 20. Gimbert, H., Horn, F.: Simple stochastic games with few random vertices are easy to solve. In: Proceedings of the 11th International Conference on Foundations of Software Science and Computational Structures (FoSSaCS), Lecture Notes in Computer Science, vol. 4962, pp. 5–19. Springer (2008)
 21. Gurvich, V., Karzanov, A.V., Khachiyan, L.: Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput. Math. Math. Phys. 28, 85–91 (1988)
 22. Halman, N.: Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica 49(1), 37–50 (2007)
 23. Ibsen-Jensen, R., Miltersen, P.B.: Solving simple stochastic games with few coin toss positions. In: Epstein, L., Ferragina, P. (eds.) Proceedings of the 20th Annual European Symposium on Algorithms (ESA), Lecture Notes in Computer Science, vol. 7501, pp. 636–647. Springer (2012)
 24. Nash Jr., J.F.: Equilibrium points in \(n\)-person games. Proc. Natl. Acad. Sci. 36, 48–49 (1950)
 25. Nash Jr., J.F.: Non-cooperative games. Ann. Math. 54(1), 286–295 (1951)
 26. Jurdziński, M.: Games for verification: algorithmic issues. Ph.D. thesis, University of Aarhus, BRICS (2000)
 27. Karp, R.M.: A characterization of the minimum cycle mean in a digraph. Discrete Math. 23, 309–311 (1978)
 28. Karzanov, A.V., Lebedev, V.N.: Cyclical games with prohibition. Math. Program. 60, 277–293 (1993)
 29. Liggett, T.M., Lippman, S.A.: Stochastic games with perfect information and time-average payoff. SIAM Rev. 4, 604–607 (1969)
 30. Littman, M.L.: Algorithms for sequential decision making. Ph.D. thesis, Department of Computer Science, Brown University (1996)
 31. Manthey, B., Röglin, H.: Smoothed analysis: analysis of algorithms beyond worst case. IT Inf. Technol. 53(6), 280–286 (2011)
 32. Mine, H., Osaki, S.: Markovian Decision Processes. Elsevier, Amsterdam (1970)
 33. Moulin, H.: Extension of two person zero sum games. J. Math. Anal. Appl. 5(2), 490–507 (1976)
 34. Moulin, H.: Prolongement des jeux à deux joueurs de somme nulle. Bull. Soc. Math. Fr. Mém. 45, 5–111 (1976)
 35. Pisaruk, N.N.: Mean cost cyclical games. Math. Oper. Res. 24(4), 817–828 (1999)
 36. Roth, A., Balcan, M.F., Kalai, A., Mansour, Y.: On the equilibria of alternating move games. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 805–816. SIAM (2010)
 37. Schewe, S.: From parity and payoff games to linear programming. In: Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science (MFCS), Lecture Notes in Computer Science, vol. 5734, pp. 675–686. Springer (2009)
 38. Spielman, D.A., Teng, S.-H.: Smoothed analysis: an attempt to explain the behavior of algorithms in practice. Commun. ACM 52(10), 76–84 (2009)
 39. Vorobyov, S.: Cyclic games and linear programming. Discrete Appl. Math. 156(11), 2195–2231 (2008)
 40. Zwick, U., Paterson, M.: The complexity of mean payoff games on graphs. Theoret. Comput. Sci. 158(1–2), 343–359 (1996)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.