Keywords

1 Introduction

We consider two-player turn-based stochastic games played on graphs. Games are a central model in computer science, in particular for the verification and synthesis of reactive systems [11, 17, 18]. A stochastic game is played by two players on a graph with stochastic transitions, where the players represent the system and the adversarial environment, while the objective represents the functional requirement that the synthesized system aims to satisfy with a probability p as large as possible. The vertices of the graph are partitioned into system, environment, and probabilistic vertices. A stochastic game is played in infinitely many rounds, starting by initially placing a token on some vertex. In every round, if the token is on a system or an environment vertex, then the owner of the vertex chooses a successor vertex; if the token is on a probabilistic vertex, then the successor vertex is chosen according to a given probability distribution. The token moves to the successor vertex, from where the next round starts. The outcome is an infinite sequence of vertices, which is winning for the system if it satisfies the given objective. The associated quantitative satisfaction problem is to decide, given a threshold p, whether the system can win with probability at least p. The almost-sure problem is the special case where \(p=1\), and the positive problem is to decide whether the system can win with positive probability. The almost-sure and the positive problems are referred to as the qualitative satisfaction problems. When the answer to these decision problems is \(\textsf{Yes}\), it is useful to construct a winning strategy for the system that can be used as a model for an implementation that ensures the objective be satisfied with the given probability.

Traditional objectives in stochastic games are \(\omega \)-regular such as reachability, safety, and parity objectives [11], or quantitative such as mean-payoff objectives [16, 27]. For example, a parity objective may specify that every request of the environment is eventually granted by the system, and a mean-payoff objective may specify the long-run average power consumption of the system. A well-known drawback of parity and mean-payoff objectives is that only the long-run behaviour of the system may be specified [1, 9, 21], allowing weird transient behaviour: for example, a system may grant all its requests but with an unbounded response time; or a system with long-run average power consumption below some threshold may exhibit arbitrarily long (but finite) sequences with average power consumption above the threshold. This limitation has led to considering finitary versions of those objectives [8, 9, 23], where the sequences of undesired transient behaviours must be of fixed or bounded length. Window mean-payoff objectives [8] are quantitative finitary objectives that strengthen the traditional mean-payoff objective: the satisfaction of a window mean-payoff objective implies the satisfaction of the standard mean-payoff objective. Given a length \(\ell \ge 1\), the fixed window mean-payoff objective (\(\textsf{FWMP}(\ell )\)) is satisfied if except for a finite prefix, from every point in the play, there exists a window of length at most \(\ell \) starting from that point such that the mean payoff of the window is at least a given threshold. In the bounded window mean-payoff objective (\(\textsf{BWMP}\)), it is sufficient that there exists some length \(\ell \) for which the \(\textsf{FWMP}(\ell )\) objective is satisfied.

Contributions. We present algorithmic solutions for stochastic games with window mean-payoff objectives, and show that the positive and almost-sure problems are solvable in polynomial time for \(\textsf{FWMP}(\ell )\) (Theorem 5), and are in \(\textsf{NP}\cap \textsf{coNP}\) for \(\textsf{BWMP}\) (Theorem 6). The complexity result for the almost-sure problem entails that the quantitative satisfaction problem is in \(\textsf{NP}\cap \textsf{coNP}\) (for both the fixed and bounded version), using standard techniques for solving stochastic games with prefix-independent objectives [13]. Note that the \(\textsf{NP}\cap \textsf{coNP}\) bound for the quantitative satisfaction problem matches the special case of reachability objectives in simple stochastic games [14], and thus would require a major breakthrough to be improved.

As a consequence, using the \(\textsf{FWMP}(\ell )\) objective instead of the standard mean-payoff objective provides a stronger guarantee on the system, and even with a polynomial complexity for the positive and the almost-sure problems (which is not known for mean-payoff objectives), and at no additional computational cost for the quantitative satisfaction problem. The solution relies on a reduction to non-stochastic two-player games (stochastic games without probabilistic vertices). The key result is to show that in order to win positively from some vertex of the game graph, it is necessary to win from some vertex of the non-stochastic game obtained by transforming all probabilistic vertices into adversarial vertices. While this condition, that we call the sure-almost-sure (SAS) property (Definition 1), was used to solve finitary Streett objectives [13], we follow a similar approach and generalize it to arbitrary prefix-independent objectives (Theorem 4). The bounds on the memory requirement of optimal strategies of \(\text {Player}~{1}\) can also be derived from the key result, and are the same as optimal bounds for non-stochastic games. For the \(\textsf{FWMP}(\ell )\) and \(\textsf{BWMP}\) objectives in particular, we show that the memory requirement of \(\text {Player}~{2}\) is also no more than the optimal memory required by winning strategies in non-stochastic games.

As solving a stochastic game with a prefix-independent objective \(\varphi _{}\) reduces to solving non-stochastic games with objective \(\varphi _{}\) and showing that \(\varphi _{}\) satisfies the \(\textsf{SAS}\) property, we show that the \(\textsf{FWMP}(\ell )\) and \(\textsf{BWMP}\) objectives satisfy the \(\textsf{SAS}\) property (Lemma 4, Lemma 5) and rely on the solution of non-stochastic games with these objectives [8] to complete the reduction.

We improve the memory bounds for optimal strategies of both players in non-stochastic games. It is stated in [8] that \(|V| \cdot \ell \) memory suffices for both players, where \(|V|\) and \(\ell \) are the number of vertices and the window length respectively. In [6, Theorem 2] and [19, Theorem 6.4], the bound is loosened to \(\mathcal {O}(w_{\max } \cdot \ell ^2)\) and \(\mathcal {O}(w_{\max } \cdot \ell ^2 \cdot |V|)\) for \(\text {Player}~{1}\) and \(\text {Player}~{2}\) respectively, where \(w_{\max }\) is the maximum absolute payoff in the graph, as the original tighter bounds [8] were stated without proof. Since the payoffs are given in binary, this is exponential in the size of the input. In contrast, we tighten the bounds stated in [8]. We show that for \(\text {Player}~{1}\), memory \(\ell \) suffices (Theorem 1), and improve the bound on memory of \(\text {Player}~{2}\) strategies as follows. We compute the set \(W\) of vertices from which \(\text {Player}~{2}\) can ensure that the mean payoff remains negative for \(\ell \) steps, as well as the vertices from which \(\text {Player}~{2}\) can ensure that the game reaches \(W\). These vertices are identified recursively on successive subgames of the original input game. If k is the number of recursive calls, then we show that \(k \cdot \ell \) memory suffices for \(\text {Player}~{2}\) to play optimally (Theorem 2). Note that \(k \le |V|\). We also provide a lower bound on the memory size for \(\text {Player}~{2}\). Given \(\ell \ge 2\), for every \(k \ge 1\), we construct a graph with a set \(V\) of vertices such that \(\text {Player}~{2}\) requires at least \(k+1=\frac{1}{2}(|V|-\ell +3)\) memory to play optimally (Theorem 3). This is an improvement over the result in [8] which showed that memoryless strategies do not suffice for \(\text {Player}~{2}\). Our result is quite counterintuitive since given an open window (a window in which every prefix has a total payoff less than \(0\)) that needs to be kept open for another \(j \le \ell \) steps from a vertex \(v\), one would conjecture that it is sufficient for a \(\text {Player}~{2}\) winning strategy to choose an edge from \(v\) that leads to the minimum payoff over paths of length \(j\). Thus for every \(j\), \(\text {Player}~{2}\) should choose a fixed edge and hence memory of size \(\ell \) should suffice. However, we show that this is not the case.

To the best of our knowledge, this work leads to the first study of stochastic games with finitary quantitative objectives.

Related work. Window mean-payoff objectives were first introduced in [8] for non-stochastic games, where solving \(\textsf{FWMP}(\ell )\) was shown to be in \(\textsf{PTIME}\) and \(\textsf{BWMP}\) in \(\textsf{NP}\cap \textsf{coNP}\). These have also been studied for Markov decision processes (MDPs) in [3, 4]. In [4], a threshold probability problem has been studied, while in [3], the authors studied the problem of maximising the expected value of the window mean-payoff. Positive, almost-sure, and quantitative satisfaction of \(\textsf{BWMP}\) in MDPs are in \(\textsf{NP}\cap \textsf{coNP}\) [4], the same as in non-stochastic games.

Parity objectives can be viewed as a special case of mean-payoff objectives [22]. A bounded window parity objective has been studied in [9, 12, 20] where a play satisfies the objective if from some point on, there exists a bound \(\ell \) such that from every state with an odd priority, a smaller even priority occurs within at most \(\ell \) steps. Non-stochastic games with bounded window parity objectives can be solved in \(\textsf{PTIME}\) [12, 20]. Stochastic games with bounded window parity objectives have been studied in [13] where the positive and almost-sure problems are in \(\textsf{PTIME}\) while the quantitative satisfaction problem is in \(\textsf{NP}\cap \textsf{coNP}\). A fixed version of the window parity objective has been studied for two-player games and shown to be \(\textsf{PSPACE}\)-complete [26]. Another window parity objective has been studied in [5] for which both the fixed and the bounded variants have been shown to be in \(\textsf{PTIME}\) for non-stochastic games. The threshold problem for this objective has also been studied in the context of MDPs, and both fixed and bounded variants are in \(\textsf{PTIME}\) [4]. Finally, synthesis for bounded eventuality properties in LTL is 2-\(\textsf{EXPTIME}\)-complete [23].

Due to lack of space, some of the proofs have been omitted. A full version of the paper can be found in [15].

2 Preliminaries

Stochastic games. We consider two-player turn-based zero-sum stochastic games (or simply, stochastic games in the sequel). The two players are referred to as \(\text {Player}~{1}\) and \(\text {Player}~{2}\). A stochastic game is a weighted directed graph \(\mathcal {G}= ((V, E), (V_{1}, V_{2}, V_{\Diamond }), \mathbb {P}, w)\), where:

  • \((V, E)\) is a directed graph with a finite set \(V\) of vertices and a set \(E\subseteq V\times V\) of directed edges such that for all vertices \(v_{}\in V\), the set \(E(v_{}) = \{ v_{}' \in V\mid (v_{}, v_{}') \in E\}\) of out-neighbours of \(v_{}\) is nonempty, i.e., \(E(v_{}) \ne \varnothing \) (no deadlocks);

  • \((V_{1}, V_{2}, V_{\Diamond })\) is a partition of \(V\). The vertices in \(V_{1}\) belong to \(\text {Player}~{1}\), the vertices in \(V_{2}\) belong to \(\text {Player}~{2}\), and the vertices in \(V_{\Diamond }\) are called probabilistic vertices (in figures, \(\text {Player}~{1}\) vertices are shown as circles, \(\text {Player}~{2}\) vertices as boxes, and probabilistic vertices as diamonds, and we use pronouns “she/her” for \(\text {Player}~{1}\) and “he/him” for \(\text {Player}~{2}\));

  • \(\mathbb {P}:V_{\Diamond }\rightarrow \mathcal {D}(V)\), where \(\mathcal {D}(V)\) is the set of probability distributions over \(V\), is a transition function that maps probabilistic vertices \(v_{}\in V_{\Diamond }\) to a probability distribution \(\mathbb {P}(v_{})\) over the set \(E(v_{})\) of out-neighbours of \(v_{}\) such that \(\mathbb {P}(v_{})(v_{}') > 0\) for all \(v_{}' \in E(v_{})\) (i.e., all out-neighbours have nonzero probability); for the algorithmic and complexity results, we assume that probabilities are given as rational numbers.

  • \(w:E\rightarrow \mathbb {Q}\) is a payoff function assigning a rational payoff to every edge in the game.

Stochastic games are played in rounds. The game starts by initially placing a token on some vertex. At the beginning of a round, if the token is on a vertex \(v_{}\), and \(v_{}\in V_{i}\) for \(i \in \{1, 2\}\), then \(\text {Player}~{i}\) chooses an out-neighbour \(v_{}' \in E(v_{})\); otherwise \(v_{}\in V_{\Diamond }\), and an out-neighbour \(v' \in E(v_{})\) is chosen with probability \(\mathbb {P}(v_{})(v')\). \(\text {Player}~{1}\) receives from \(\text {Player}~{2}\) the amount \(w(v_{}, v_{}')\) given by the payoff function, and the token moves to \(v'\) for the next round. This continues ad infinitum, resulting in an infinite sequence \(\pi = v_{0} v_{1} v_{2} \cdots \in V^\omega \) such that \((v_{i}, v_{i+1}) \in E\) for all \(i \ge 0\), called a play. For \(i < j\), we denote by \(\pi (i, j)\) the infix \(v_{i} v_{i+1} \cdots v_{j}\) of \(\pi \). Its length is \( |\pi (i, j)| = j - i\), the number of edges. We denote by \(\pi (0, j)\) the finite prefix \(v_{0} v_{1} \cdots v_{j}\) of \(\pi \), and by \(\pi (i, \infty )\) the infinite suffix \(v_{i} v_{i+1} \ldots \) of \(\pi \). We denote by \(\textsf{Plays}_{\mathcal {G}}\) and \(\textsf{Prefs}^{}_{\mathcal {G}}\) the set of all plays and the set of all prefixes in \(\mathcal {G}\) respectively; the symbol \(\mathcal {G}\) is omitted when it can easily be derived from the context. We denote by \(\textsf{First}(\rho )\) and \(\textsf{Last}(\rho )\) the first vertex and the last vertex of a prefix \(\rho \in \textsf{Prefs}^{}_{\mathcal {G}}\) respectively. We denote by \(\textsf{Prefs}^{i}_{\mathcal {G}}\) (\(i \in \{1, 2\}\)) the set of all prefixes \(\rho \) such that \(\textsf{Last}(\rho ) \in V_{i}\).

Objectives. An objective \(\varphi _{}\) is a Borel-measurable subset of \(\textsf{Plays}_{\mathcal {G}}\) [2]. A play \(\pi \in \textsf{Plays}_{\mathcal {G}}\) satisfies an objective \(\varphi _{}\) if \(\pi \in \varphi _{}\). In a (zero-sum) stochastic game \(\mathcal {G}\) with objective \(\varphi _{}\), the objective of \(\text {Player}~{1}\) is \(\varphi _{}\), and the objective of \(\text {Player}~{2}\) is the complement set \(\overline{\varphi _{}} = \textsf{Plays}_{\mathcal {G}} \setminus \varphi _{}\). Common examples of objectives are qualitative objectives such as reachability, safety, Büchi, and coBüchi.

An objective \(\varphi _{}\) is closed under suffixes if for all plays \(\pi \) satisfying \(\varphi _{}\), all suffixes of \(\pi \) also satisfy \(\varphi _{}\), that is, \(\pi (j, \infty ) \in \varphi _{}\) for all \(j \ge 0\). An objective \(\varphi _{}\) is closed under prefixes if for all plays \(\pi \) satisfying \(\varphi _{}\), for all prefixes \(\rho \) such that the concatenation \(\rho \cdot \pi \) is a play in \(\mathcal {G}\), i.e., \(\rho \cdot \pi \in \textsf{Plays}_{\mathcal {G}}\), we have that \(\rho \cdot \pi \in \varphi _{}\). An objective \(\varphi _{}\) is prefix-independent if it is closed under both prefixes and suffixes. An objective \(\varphi _{}\) is closed under suffixes if and only if the complement objective \(\overline{\varphi _{}}\) is closed under prefixes. Thus, an objective \(\varphi _{}\) is prefix-independent if and only if its complement \(\overline{\varphi _{}}\) is prefix-independent.

Strategies. A (deterministic) strategy for Player \(i\in \{1, 2\}\) in a game \(\mathcal {G}\) is a function \(\sigma _{i}: \textsf{Prefs}^{i}_{\mathcal {G}} \rightarrow V\) that maps prefixes ending in a vertex \(v_{}\in V_{i}\) to a successor of \(v_{}\). The set of all strategies of Player \(i \in \{1, 2\}\) in the game \(\mathcal {G}\) is denoted by \(\varLambda _{i}\). Strategies can be realised as the output of a (possibly infinite-state) Mealy machine. A Mealy machine is a deterministic transition system with transitions labelled by an input/output pair. Formally, a Mealy machine \(M\) is a tuple \((Q, q_0, \varSigma _i, \varSigma _o, \varDelta , \delta )\) where \(Q\) is the set of states of \(M\) (the memory of the induced strategy), \(q_0 \in Q\) is the initial state, \(\varSigma _i\) is the input alphabet, \(\varSigma _o\) is the output alphabet, \(\varDelta :Q \times \varSigma _i \rightarrow Q\) is a transition function that reads the current state of \(M\) and an input letter and returns the next state of \(M\), and \(\delta :Q \times \varSigma _i \rightarrow \varSigma _o \) is an output function that reads the current state of \(M\) and an input letter and returns an output letter. We point the reader to [15] for a description of how a strategy is defined by a Mealy machine.

The memory size of a strategy \(\sigma _{i}\) is the smallest number of states a Mealy machine defining \(\sigma _{i}\) can have. A strategy \(\sigma _{i}\) is memoryless if \(\sigma _{i}(\rho )\) only depends on the last element of the prefix \(\rho \), that is for all prefixes \(\rho , \rho ' \in \textsf{Prefs}^{i}_{\mathcal {G}}\) if \(\textsf{Last}(\rho ) = \textsf{Last}(\rho ')\), then \(\sigma _{i}(\rho ) = \sigma _{i}(\rho ')\). Memoryless strategies can be defined by Mealy machines with only one state.

A play \(\pi = v_{0} v_{1} \cdots \) is consistent with a strategy \(\sigma _{i} \in \varLambda _{i}\) (\(i \in \{1, 2\}\)) if \(v_{j+1} = \sigma _{i}(\pi (0, j))\) for all \(j \ge 0\) such that \(v_{j} \in V_{i}\). A play \(\pi \) is an outcome of \(\sigma _{1}\) and \(\sigma _{2}\) if it is consistent with both \(\sigma _{1}\) and \(\sigma _{2}\). We denote by \(\textsf{Pr}^{\sigma _{1}, \sigma _{2}}_{\mathcal {G}, v_{}}(\varphi _{})\) the probability that an outcome of \(\sigma _{1}\) and \(\sigma _{2}\) in \(\mathcal {G}\) with initial vertex \(v_{}\) satisfies \(\varphi _{}\).

Non-stochastic two-player games. A stochastic game without probabilistic vertices (that is, with \(V_{\Diamond }= \varnothing \)) is called a non-stochastic two-player game (or simply, non-stochastic game in the sequel). In a non-stochastic game \(\mathcal {G}\) with objective \(\varphi _{}\), a strategy \(\sigma _{i}\) is winning from a vertex \(v_{}\in V\) for \(\text {Player~}i\) (\(i\in \{1, 2\}\)) if every play in \(\mathcal {G}\) with initial vertex \(v_{}\) that is consistent with \(\sigma _{i}\) satisfies the objective \(\varphi _{}\). A vertex \(v_{}\in V\) is winning for \(\text {Player}~{i}\) in \(\mathcal {G}\) if \(\text {Player}~{i}\) has a winning strategy in \(\mathcal {G}\) from \(v_{}\). The set of vertices in \(V\) that are winning for \(\text {Player}~{i}\) in \(\mathcal {G}\) is the winning region of \(\text {Player}~{i}\) in \(\mathcal {G}\), denoted \(\langle \! \langle i \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}}(\varphi _{})\). If a vertex v belongs to the winning region of Player i (\(i \in \{1,2\}\)), then Player i is said to play optimally from v if she follows a winning strategy.

Subgames. Given a stochastic game \(\mathcal {G}= ((V, E), (V_{1}, V_{2}, V_{\Diamond }), \mathbb {P}, w)\), a subset \(V' \subseteq V\) of vertices induces a subgame if (i) every vertex \(v_{}' \in V'\) has an out-neighbour in \(V'\), that is \(E(v_{}') \cap V' \ne \varnothing \), and (ii) every probabilistic vertex \(v_{}' \in V_{\Diamond }\cap V'\) has all out-neighbours in \(V'\), that is \(E(v_{}') \subseteq V'\). The induced subgame is \(((V', E'), (V_{1}\cap V', V_{2}\cap V', V_{\Diamond }\cap V'), \mathbb {P}', w')\), where \(E' = E\cap (V' \times V')\), and \(\mathbb {P}'\) and \(w'\) are restrictions of \(\mathbb {P}\) and \(w\) respectively to \((V', E')\). We denote this subgame by \(\mathcal {G}\upharpoonright V'\). Let \(\varphi _{}\) be an objective in the stochastic game \(\mathcal {G}\). We define the restriction of \(\varphi _{}\) to a subgame \(\mathcal {G}'\) of \(\mathcal {G}\) to be the set of all plays in \(\mathcal {G}'\) satisfying \(\varphi _{}\), that is, the set \(\textsf{Plays}_{\mathcal {G}'} \cap \varphi _{}\).

Satisfaction probability. A strategy \(\sigma _{1}\) of \(\text {Player}~{1}\) is winning with probability \(p\) from an initial vertex \(v_{}\) in \(\mathcal {G}\) for objective \(\varphi _{}\) if \(\textsf{Pr}^{\sigma _{1}, \sigma _{2}}_{\mathcal {G}, v_{}}(\varphi _{}) \ge p\) for all strategies \(\sigma _{2}\) of \(\text {Player}~{2}\). A strategy \(\sigma _{1}\) of \(\text {Player}~{1}\) is positive winning (resp., almost-sure winning) from \(v\) for \(\text {Player}~{1}\) in \(\mathcal {G}\) with objective \(\varphi _{}\) if \(\textsf{Pr}^{\sigma _{1}, \sigma _{2}}_{\mathcal {G}, v_{}}(\varphi _{}) > 0\) (resp., \(\textsf{Pr}^{\sigma _{1}, \sigma _{2}}_{\mathcal {G}, v_{}}(\varphi _{}) = 1\)) for all strategies \(\sigma _{2}\) of \(\text {Player}~{2}\). We refer to positive and almost-sure winning as qualitative satisfaction of \(\varphi _{}\), while for arbitrary \(p \in [0,1]\), we call it quantitative satisfaction. We denote by \(\langle \! \langle 1 \rangle \! \rangle ^{\textsf{Pos}}_{\mathcal {G}}(\varphi _{})\) (resp., by \(\langle \! \langle 1 \rangle \! \rangle ^{\textsf{AS}}_{\mathcal {G}}(\varphi _{})\)) the positive (resp., almost-sure) winning region of \(\text {Player}~{1}\), i.e., the set of all vertices in \(\mathcal {G}\) from which \(\text {Player}~{1}\) has a positive (resp., almost-sure) winning strategy for \(\mathcal {G}\) with objective \(\varphi _{}\). If a vertex v belongs to the positive (resp., almost-sure) winning region of \(\text {Player}~{1}\), then \(\text {Player}~{1}\) is said to play optimally from v if she follows a positive (resp., almost-sure) winning strategy from v. We omit analogous definitions for \(\text {Player}~{2}\).

Positive attractors and traps. The \(\text {Player~}i\) positive attractor (\(i \in \{1, 2\}\)) to \(T \subseteq V\), denoted \(\textsf{PosAttr}_{i}(T)\), is the set of vertices in \(V\) from which \(\text {Player~}i\) can ensure that the token reaches a vertex in \(T\) with positive probability. It is possible to compute the positive attractor in \(\mathcal {O}(|E|)\) time [10]. In non-stochastic games, a positive attractor to a set \(T\) is the same as an attractor to the set \(T\), which we denote by \(\textsf{Attr}_{i}(T)\). Computation of \(\textsf{PosAttr}_{i}(T)\) gives a memoryless strategy for \(\text {Player}~{i}\) that ensures that the token reaches \(T\) with positive probability. We call such a strategy a positive-attractor strategy of \(\text {Player}~{i}\).

A trap for \(\text {Player}~{1}\) is a set \(T \subseteq V\) such that for every vertex \(v_{}\in T\), if \(v_{}\in V_{1}\cup V_{\Diamond }\), then \(E(v_{}) \subseteq T\), and if \(v_{}\in V_{2}\), then \(E(v_{}) \cap T \ne \varnothing \). In other words, from every vertex \(v_{}\in T\), \(\text {Player}~{2}\) can ensure (with probability 1) that the token never leaves \(T\), moreover using a memoryless strategy. A trap for \(\text {Player}~{2}\) can be defined analogously.

Remark 1

Let \(\mathcal {G}\) be a non-stochastic game with objective \(\varphi _{}\) for \(\text {Player}~{1}\). If \(\varphi _{}\) is closed under suffixes, then the winning region of \(\text {Player}~{1}\) is a trap for \(\text {Player}~{2}\). As a corollary, if \(\varphi _{}\) is prefix-independent, then the winning region of \(\text {Player}~{1}\) is a trap for \(\text {Player}~{2}\) and the winning region of \(\text {Player}~{2}\) is a trap for \(\text {Player}~{1}\).

3 Window mean payoff

We consider two types of window mean-payoff objectives, introduced in [8]: \((i)\) fixed window mean-payoff objective (\(\textsf{FWMP}(\ell )\)) in which a window length \(\ell \ge 1\) is given, and \((ii)\) bounded window mean-payoff objective (\(\textsf{BWMP}\)) in which for every play, we need a bound on window lengths. We define these objectives below.

For a play \(\pi \) in a stochastic game \(\mathcal {G}\), the total payoff of an infix \(\pi (i, i+n) = v_{i} v_{i+1} \cdots v_{i+n}\) is defined as \(\textsf{TP}(\pi (i, i+n)) = \sum _{k=i}^{i+n-1} w(v_{k}, v_{k+1})\). The mean payoff of an infix \(\pi (i, i+n) \) is defined as \(\textsf{MP}(\pi (i, i+n)) = \frac{1}{n} \textsf{TP}(\pi (i, i+n))\). Observe that the mean payoff of an infix is nonnegative if and only if the total payoff of the infix is nonnegative. The mean payoff of a play \(\pi \) is defined as \(\textsf{MP}(\pi ) = \displaystyle {\liminf _{n \rightarrow \infty }}\;\textsf{MP}(\pi (0, n))\). Given a window length \(\ell \ge 1\), a play \(\pi = v_{0} v_{1} \cdots \) in \(\mathcal {G}\) satisfies the fixed window mean-payoff objective \(\textsf{FWMP}{_{\mathcal {G}}}(\ell )\) if from every position after some point, it is possible to start an infix of length at most \(\ell \) with a nonnegative mean payoff. Formally,

$$\begin{aligned} \textsf{FWMP}{_{\mathcal {G}}}(\ell ) = \{ \pi \in \textsf{Plays}_{\mathcal {G}} \mid \exists k \ge 0 \cdot \forall i \ge k \cdot \exists j \in \{1, \ldots , \ell \}: \textsf{MP}(\pi (i, i + j)) \ge 0\}. \end{aligned}$$

We omit the subscript \(\mathcal {G}\) when it is clear from the context. Note that when \(\ell = 1\), the \(\textsf{FWMP}(1)\) and \(\overline{\textsf{FWMP}(1)}\) (i.e., the complement of \(\textsf{FWMP}(1)\)) objectives reduce to coBüchi and Büchi objectives respectively. The following properties of \(\textsf{FWMP}(\ell )\) have been observed in [8]. For all window lengths \(\ell \ge 1\), if a play \(\pi \) satisfies \(\textsf{FWMP}(\ell )\), then \(\textsf{MP}(\pi ) \ge 0\). In all plays satisfying \(\textsf{FWMP}(\ell )\), there exists a suffix that can be decomposed into infixes of length at most \(\ell \), each with a nonnegative mean payoff. Such a desirable robust property is not guaranteed by the classical mean-payoff objective, where infixes of unbounded lengths may have negative mean payoff.

As defined in [8], given a play \(\pi = v_{0} v_{1} \cdots \) and \(0 \le i < j\), we say that the window \(\pi (i, j)\) is open if the total-payoff of \(\pi (i, k)\) is negative for all \(i < k \le j\). Otherwise, the window is closed. Given \(j > 0\), we say a window is open at \(j\) if there exists an open window \(\pi (i, j)\) for some \(i < j\). The window starting at position \(i\) closes at position \(j\) if \(j\) is the first position after \(i\) such that the total-payoff of \(\pi (i, j)\) is nonnegative. If the window starting at \(i\) closes at \(j\), then for all \(i \le k < j\), the windows \(\pi (k, j)\) are closed. This property is called the inductive property of windows.

We also have the bounded window mean-payoff objective \(\textsf{BWMP}\). A play \(\pi \) satisfies the \(\textsf{BWMP}\) objective if there exists a window length \(\ell \ge 1\) for which \(\pi \) satisfies \(\textsf{FWMP}(\ell )\), i.e.,

$$ \textsf{BWMP}{_{\mathcal {G}}} = \{ \pi \in \textsf{Plays}_{\mathcal {G}} \mid \exists \ell \ge 1 : \pi \in \textsf{FWMP}(\ell )\} $$

Equivalently, a play \(\pi \) does not satisfy \(\textsf{BWMP}\) if for every suffix of \(\pi \), for all \(\ell \ge 1\), the suffix contains an open window of length \(\ell \). Note that both \(\textsf{FWMP}(\ell )\) for all \(\ell \ge 1\) and \(\textsf{BWMP}\) are prefix-independent objectives.

Decision problems. Given a game \(\mathcal {G}\), an initial vertex \(v_{}\in V\), a rational threshold \(p \in [0, 1]\), and an objective \(\varphi _{}\) (that is either \(\textsf{FWMP}(\ell )\) for a given window length \(\ell \ge 1\), or \(\textsf{BWMP}\)), consider the problem of deciding:

  • Positive satisfaction of \(\varphi _{}\): whether \(\text {Player}~{1}\) positively wins \(\varphi _{}\) from \(v_{}\), i.e., whether \(v_{}\in \langle \! \langle 1 \rangle \! \rangle ^{\textsf{Pos}}_{\mathcal {G}}(\varphi _{})\).

  • Almost-sure satisfaction of \(\varphi _{}\): whether \(\text {Player}~{1}\) almost-surely wins \(\varphi _{}\) from \(v_{}\), i.e., whether \(v_{}\in \langle \! \langle 1 \rangle \! \rangle ^{\textsf{AS}}_{\mathcal {G}}(\varphi _{})\).

  • Quantitative satisfaction of \(\varphi _{}\) (also known as quantitative value problem [13]): whether \(\text {Player}~{1}\) wins \(\varphi _{}\) from \(v_{}\) with probability at least \(p\), i.e., whether \(\sup _{\sigma _{1} \in \varLambda _{1}} \inf _{\sigma _{2} \in \varLambda _{2}} \) \(\textsf{Pr}^{\sigma _{1}, \sigma _{2}}_{\mathcal {G}, v_{}}(\varphi _{}) \ge p\).

Note that these three problems coincide for non-stochastic games. As considered in previous works [3, 4, 8], the window length \(\ell \) is usually small (typically \(\ell \le |V|\)), and therefore we assume that \(\ell \) is given in unary (while the payoff on the edges is given in binary). From determinacy of Blackwell games [24], stochastic games with window mean-payoff objectives as defined above are determined, i.e., the largest probability with which \(\text {Player}~{1}\) is winning and the largest probability with which \(\text {Player}~{2}\) is winning add up to \(1\).

Algorithms for non-stochastic window mean-payoff games. To compute the positive and almost-sure winning regions for \(\text {Player}~{1}\) for \(\textsf{FWMP}(\ell )\), we recall intermediate objectives defined in [8]. The good window objective \(\textsf{GW}_{\mathcal {G}}(\ell )\) consists of all plays \(\pi \) in \(\mathcal {G}\) such that the window opened at the first position in the play closes in at most \(\ell \) steps:

$$ \textsf{GW}_{\mathcal {G}}(\ell ) = \{\pi \in \textsf{Plays}_{\mathcal {G}} \mid \exists j \in \{1, \ldots , \ell \} : \textsf{MP}(\pi (0,j)) \ge 0 \} $$

The direct fixed window mean-payoff objective \(\textsf{DirFWMP}_{\mathcal {G}}(\ell )\) consists of all plays \(\pi \) in \(\mathcal {G}\) such that from every position in \(\pi \), the window closes in at most \(\ell \) steps:

$$ \textsf{DirFWMP}_{\mathcal {G}}(\ell ) = \{\pi \in \textsf{Plays}_{\mathcal {G}} \mid \forall i \ge 0 : \pi (i, \infty ) \in \textsf{GW}_{\mathcal {G}}(\ell ) \} $$

The \(\textsf{FWMP}{_{\mathcal {G}}}(\ell )\) objective can be expressed in terms of \(\textsf{DirFWMP}_{\mathcal {G}}(\ell )\):

$$ \textsf{FWMP}{_{\mathcal {G}}}(\ell ) = \{\pi \in \textsf{Plays}_{\mathcal {G}} \mid \exists k \ge 0 : \pi (k, \infty ) \in \textsf{DirFWMP}_{\mathcal {G}}(\ell ) \} $$

We refer to Algorithms 1, 2, and 3 from [8] shown here with the same numbering. They compute the winning regions for \(\text {Player}~{1}\) for the \(\textsf{FWMP}(\ell )\), \(\textsf{DirFWMP}(\ell )\), and \(\textsf{GW}(\ell )\) objectives in non-stochastic games respectively. The original algorithms in [8] contain subtle errors for which the fixes are known [6, 19]. For completeness, we refer the reader to [15] for counterexamples for the algorithms in [8] along with brief explanations of correctness for the modified versions.

Algorithm 3 uses dynamic programming to compute, for all \(v \in V\) and all lengths \(i \in \{ 1, \ldots , \ell \}\), the largest payoff \(C_i(v)\) that \(\text {Player}~{1}\) can ensure from \(v\) within at most \(i\) steps. The winning region for \(\textsf{GW}(\ell )\) for \(\text {Player}~{1}\) consists of all vertices \(v\) such that \(C_{\ell }(v) \ge 0\).

figure a
figure b

4 Memory requirement for non-stochastic window mean-payoff games

The memory requirement for winning strategies of \(\text {Player}~{1}\) in non-stochastic games with objective \(\textsf{FWMP}(\ell )\) is claimed to be \(\mathcal {O}(|V| \cdot \ell )\) without proof [8, Lemma 7], and further “correctly stated” as \(\mathcal {O}(w_{\max } \cdot \ell ^2)\), where \(w_{\max }\) is the maximum absolute payoff in the graph [6, Theorem 2]. We improve upon these bounds and show that memory of size \(\ell \) suffices for a winning strategy of \(\text {Player}~{1}\). We also present a family of games with arbitrarily many vertices where \(\text {Player}~{2}\) is winning and all his winning strategies require at least \(\frac{1}{2} (|V| - \ell )+3\) memory, while it was only known that memoryless strategies are not sufficient for \(\text {Player}~{2}\) [8].

4.1 Memory requirement for Player 1 for FWMP objective

Upper bound on memory requirement for Player 1. We show that memory of size \(\ell \) suffices for winning strategies of \(\text {Player}~{1}\) for the \(\textsf{DirFWMP}(\ell )\) objective (Lemma 1), which in turn shows that the same memory also works for the \(\textsf{FWMP}(\ell )\) objective (Theorem 1).

Lemma 1

If \(\text {Player}~{1}\) wins in a non-stochastic game with objective \(\textsf{DirFWMP}(\ell )\), then \(\text {Player}~{1}\) has a winning strategy with memory of size \(\ell \).

Proof (Sketch)

[Sketch] Given a non-stochastic game \(\mathcal {G}\), let \(W_d\) be the winning region of \(\text {Player}~{1}\) in \(\mathcal {G}\) for objective \(\textsf{DirFWMP}(\ell )\). By definition, every vertex in \(W_d\) is also winning for \(\text {Player}~{1}\) for the \(\textsf{GW}(\ell )\) objective.

A winning strategy \(\sigma _{\textsf{d}}\) of \(\text {Player}~{1}\) in \(W_d\) satisfies the objective \(\textsf{GW}(\ell )\) by closing a window within at most \(\ell \) steps and then restarts with the same strategy, playing for \(\textsf{GW}(\ell )\) and so on. Using memory space \(Q = \{1, \ldots , \ell \}\), we may store the number of steps remaining before the window must close. However, the window may close any time within \(\ell \) steps, and the difficulty lies in detecting this independently of the history. For memory state \(q=i\) and the next visited vertex being v, intuitively, the memory should be updated to \(q = i-1\) if the window did not close yet upon reaching v, and to \(q = \ell \) if it did, but that depends on which path was followed to reach v (not just on v), which is not stored in the memory space.

The crux is to show that it is not always necessary for \(\text {Player}~{1}\) to be able to infer when the window closes. Given the current memory state \(q = i\), and the next visited vertex v, the memory update is as follows: if \(C_i(v) \ge 0\) (that is, \(\text {Player}~{1}\) can ensure the window from v will close within i steps), then we update to \(q = i-1\) (decrement) although the window may or may not have closed upon reaching v; otherwise \(C_i(v) < 0\) and we update to \(q = \ell -1\) (reset to \(\ell \) and decrement) and we show that in this case the window did close. Intuitively, updating to \(q = i-1\) is safe even if the window did close, because the strategy of \(\text {Player}~{1}\) will anyway ensure the (upcoming) window is closed within \(i-1 < \ell \) steps. A formal description of a Mealy machine with \(\ell \) states defining a winning strategy of \(\text {Player}~{1}\) for the \(\textsf{DirFWMP}(\ell )\) objective is given in [15].    \(\square \)

Theorem 1

If \(\text {Player}~{1}\) wins in a non-stochastic game \(\mathcal {G}\) with objective \(\textsf{FWMP}(\ell )\), then \(\text {Player}~{1}\) has a winning strategy with memory of size \(\ell \).

Proof (Sketch)

[Sketch] Since \(\textsf{FWMP}(\ell )\) is a prefix-independent objective, we have that the winning region \(\langle \! \langle 1 \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}}(\textsf{FWMP}(\ell ))\) of \(\text {Player}~{1}\) is a trap for \(\text {Player}~{2}\) (Remark 1), and induces a subgame, say \(\mathcal {G}_0\). Let there be \(k+1\) calls to the subroutine \(\textsf {NonStocDirFWMP} \) from Algorithm 1 where \(k < |V|\). We denote by \((W_i)_{i \in \{1, \dots , k\}}\) the nonempty \(W_d\) returned by the \(i^{\text {th}}\) call to the subroutine, and let \(A_i = \textsf{Attr}_{1}(W_i)\). The \(A_i\)’s are pairwise disjoint, and their union is \(\bigcup _{i=1}^{k} A_i = \langle \! \langle 1 \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}}(\textsf{FWMP}(\ell ))\). For \(i \in \{1, \ldots , k\}\), inductively define \(\mathcal {G}_i\) to be the subgame induced by the complement of \(A_i\) in \(\mathcal {G}_{i-1}\). Since \(\textsf{DirFWMP}(\ell )\) is closed under suffixes, for all \(i \in \{1, \ldots , k\}\), we have that \(W_i\) is a trap for \(\text {Player}~{2}\) in \(\mathcal {G}_i\) (Remark 1).

We construct a strategy \(\sigma _{1}^{\textsf{NS}}\) that follows the (memoryless) attractor strategy in \(\bigcup _i (A_i \setminus W_i)\), and follows the winning strategy \(\sigma _{\textsf{d}}\) for \(\textsf{DirFWMP}(\ell )\) objective (defined in the proof of Lemma 1) in \(\bigcup _i W_i\). The reader is pointed to [15] for a formal description of a Mealy machine defining the strategy \(\sigma _{1}^{\textsf{NS}}\). For the correctness of the construction, the crux is to show that one of the sets \(W_i\) (\(i \in \{1, \ldots , k\}\)) is never left from some point on. Intuitively, given the token is in \(A_i\) for some \(i \in \{1, \ldots , k\}\) (thus in \(\mathcal {G}_i\)), following \(\sigma _{1}^{\textsf{NS}}\), the token will either remain in \(A_i\), or leave the subgame \(\mathcal {G}_i\) and enter \(A_j\) for a smaller index \(j < i\). The result follows since this can be done at most \(k\) times.    \(\square \)

Lower bound on memory requirement for Player 1. In [8], the authors show a game with \(\ell = 4\) where \(\text {Player}~{1}\) requires memory at least 3. This can be generalized to arbitrary \(\ell \) to show that memory of size \(\ell - 1\) may be necessary (See [15] for details).

4.2 Memory requirement for Player 2 for FWMP objective

Upper bound on memory requirement for Player 2. Now we show that for the \(\overline{\textsf{FWMP}(\ell )}\) objective, \(\text {Player}~{2}\) has a winning strategy that uses memory of size at most \(|V| \cdot \ell \). This has been loosely stated in [8] without a formal proof.

Theorem 2

Let \(\mathcal {G}\) be a non-stochastic game with objective \(\overline{\textsf{FWMP}(\ell )}\) for \(\text {Player}~{2}\). Then, \(\text {Player}~{2}\) has a winning strategy with memory size at most \(|V| \cdot \ell \).

Proof (Sketch)

[Sketch] Since \(\textsf{FWMP}(\ell )\) is a prefix-independent objective, so is \(\overline{\textsf{FWMP}(\ell )}\). We have that \(\langle \! \langle 2 \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}}(\overline{\textsf{FWMP}(\ell )})\) is a trap for \(\text {Player}~{1}\) (Remark 1) and induces a subgame, say \(\mathcal {H}_0\), of \(\mathcal {G}\). Let there be \(k+1\) calls to the subroutine \(\textsf {GoodWin} \) from Algorithm 2 (where \(k < |V|\)), and let \(\mathcal {H}_i\) be the subgame corresponding to the \(i^{\text {th}}\) call of the subroutine. We denote by \((W_i)_{i\in \{1, \ldots , k\}}\) the complement of \(W_{gw}\) in \(\mathcal {H}_i\), where \(W_{gw}\) is returned by the \(i^{\text {th}}\) call to the subroutine, and let \(A_i = \textsf{Attr}_{2}(W_i)\). The \(A_i\)’s are pairwise disjoint, and their union is \(\bigcup _{i=1}^{k} A_i = \langle \! \langle 2 \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}}(\overline{\textsf{FWMP}(\ell )})\).

We describe a winning strategy for the \(\overline{\textsf{FWMP}(\ell )}\) objective with memory \(k \cdot \ell \), which is at most \(|V| \cdot \ell \). The strategy is always in either attractor mode or window-open mode. When the game begins, it is in attractor mode. If the strategy is in attractor mode and the token is on a vertex \(v \in A_i \setminus W_i\) for some \(i \in \{1, \ldots , k\}\), then the attractor strategy is to eventually reach \(W_i\). If the token reaches \(W_i\), then the strategy switches to window-open mode. Since all vertices in \(W_i\) are winning for \(\text {Player}~{2}\) for the \(\overline{\textsf{GW}(\ell )}\) objective, he can keep the window open for \(\ell \) more steps, provided that \(\text {Player}~{1}\) does not move the token out of the subgame \(\mathcal {H}_{i}\). If, at some point, \(\text {Player}~{1}\) moves the token out of the subgame \(\mathcal {H}_{i}\) to \(A_j\) for a smaller index \(j < i\), then the strategy switches back to attractor mode, this time trying to reach \(W_j\) in the bigger subgame \(\mathcal {H}_j\). Otherwise, if \(\text {Player}~{2}\) keeps the window open for \(\ell \) steps, then the strategy switches back to attractor mode until the token reaches a vertex in \(\bigcup _{i=1}^{k} W_i\). This strategy can be defined by a Mealy machine \(M_{2}^{\textsf{NS}}\) with states \(\{1, \ldots , k\} \times \{1, \ldots , \ell \}\), where the first component tracks the smallest subgame \(\mathcal {H}_i\) in which the window started to remain open, and the second component indicates how many more steps the window needs to be kept open for. A formal description of \(M_{2}^{\textsf{NS}}\) can be found in [15].    \(\square \)

Lower bound on memory requirement for Player 2. In [8], it was shown that memoryless strategies do not suffice for \(\text {Player}~{2}\). We improve upon this lower bound. Given a window length \(\ell \ge 2\), for every \(k \ge 1\), we construct a game \(\mathcal {G}_{k,\ell }\) with \(2k+\ell -1\) vertices such that every winning strategy of \(\text {Player}~{2}\) in \(\mathcal {G}_{k,\ell }\) requires at least \(k+1\) memory.

Theorem 3

There exists a family of non-stochastic games \(\{\mathcal {G}_{k,\ell }\}_{k \ge 1, \ell \ge 2}\) with objective \(\textsf{FWMP}(\ell )\) for \(\text {Player}~{1}\) and edge weights in \(\{-1,0,+1\}\) such that every winning strategy of \(\text {Player}~{2}\) requires at least \(\frac{1}{2}(|V| - \ell + 1) + 1\) memory, where \(|V| = 2k+\ell -1\).

Proof (Sketch)

[Sketch] Let \(A = \{a_1, \ldots , a_k\}\), \(B = \{b_1, \ldots , b_k\}\), and \(C = \{c_1, \ldots , c_{\ell - 1}\}\) be pairwise disjoint sets. The vertices of \(\mathcal {G}_{k,\ell }\) are \(A \cup B \cup C\) with \(V_{1}= A \cup C\) and \(V_{2}= B\). Figure 1 shows the game \(\mathcal {G}_{4, 3}\). A more formal description of \(\mathcal {G}_{k,\ell }\) can be found in [15].

Fig. 1.
figure 1

Game \(\mathcal {G}_{4,3}\) with parameter \(k = 4\) and window length \(\ell = 3\). Red edges (from \(a_p\) to \(b_r\) for \(p \le r\)) have payoff \(-1\), black edges (from \(b_r\) to \(c_2\)) have payoff \(0\), and blue edges (the remaining edges) have payoff \(+1\).

Table 1. Good choices \(\chi (u, b_r)\) for all \(u \in A \cup \{c_1\}\) and \(b_r \in B\) in the game \(\mathcal {G}_{4, 3}\)

Observe that the only open windows of length \(\ell \) in the game \(\mathcal {G}_{k,\ell }\) are sequences of the form \(a_p b_{r} c_{\ell -1} \cdots c_1\) for all \(p \le r\). Also note that \(\text {Player}~{2}\) has a winning strategy that wins starting from every vertex in the game, as \(\text {Player}~{2}\) can force the token to eventually take a red edge followed by two black edges.

When the token reaches a vertex \(b_r \in B\), \(\text {Player}~{2}\) can either move the token to \(a_r \in A\) or to \(c_{\ell - 1}\in C\). Depending on which vertex the token was on before reaching \(b_r\), one of the two choices is good for \(\text {Player}~{2}\). If the token reaches \(b_r\) from \(a_p\) for \(p \le r\), then it is good for \(\text {Player}~{2}\) to move the token to \(c_{\ell - 1} \in C\) so that the window starting at \(a_p\) remains open for \(\ell \) steps. Otherwise, if the token reaches \(b_r\) from \(a_{r+1}\), then it is good for \(\text {Player}~{2}\) to move the token to \(a_r\) so that an edge with negative payoff may eventually be taken. For all \(u \in A \cup \{c_1\}\), for all \(b_r \in B\) such that \((u, b_r)\) is an edge in \(\mathcal {G}_{k,\ell }\), we denote by \(\chi (u, b_r)\) the vertex \(a_r\) or \(c_{\ell -1}\) that is good for \(\text {Player}~{2}\). We list the good choices in the game \(\mathcal {G}_{4,3}\) in Table 1. The columns are indexed by \(u \in A \cup \{c_1\}\) and the rows are indexed by \(b_r \in B\).

We show that for each column in the table, there exists a distinct memory state in every Mealy machine defining a winning strategy of \(\text {Player}~{2}\). This gives a lower bound of \(k+1\) on the number of states of such a Mealy machine. Since \(\mathcal {G}_{k, \ell }\) has \(2k + \ell - 1\) vertices, the memory requirement of a winning strategy of \(\text {Player}~{2}\) is at least \(\frac{1}{2}(|V| - \ell +1)+1\).    \(\square \)

Given a winning strategy \(\sigma _{2}^{\textsf{NS}}\) of \(\text {Player}~{2}\) for the \(\overline{\textsf{FWMP}(\ell )}\) objective, the following lemma gives an upper bound on the number of steps between consecutive open windows of length \(\ell \) in any play consistent with \(\sigma _{2}^{\textsf{NS}}\). This lemma is used in Section 6, where we construct an almost-sure winning strategy of \(\text {Player}~{2}\) for the \(\overline{\textsf{FWMP}(\ell )}\) objective.

Lemma 2

Let \(\mathcal {G}\) be a non-stochastic game such that \(\langle \! \langle 2 \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}}(\overline{\textsf{FWMP}(\ell )}) = V\). Let \(\sigma _{2}^{\textsf{NS}}\) be a finite-memory strategy of \(\text {Player}~{2}\) of memory size \(\textsf{M}\) that is winning for \(\overline{\textsf{FWMP}(\ell )}\) from all vertices in \(\mathcal {G}\). Then, for every play \(\pi \) of \(\mathcal {G}\) consistent with \(\sigma _{2}^{\textsf{NS}}\), every infix of \(\pi \) of length \(\textsf{M}\cdot |V| \cdot \ell \) contains an open window of length \(\ell \).

The proof is based on the pigeonhole principle and appears in [15].

5 Reducing stochastic games to non-stochastic games

For a stochastic game \(\mathcal {G}\), let \(\mathcal {G}_{\textsf{NS}}= ((V, E), (V_{1}, V_{2}\cup V_{\Diamond }, \varnothing ), w)\) be the (adversarial) non-stochastic game corresponding to \(\mathcal {G}\), obtained by changing all probabilistic vertices in \(\mathcal {G}\) to \(\text {Player~}2\) vertices. In [13], a property of finitary Streett objective was used to solve stochastic games by reducing them to non-stochastic games with the same objective. In this section, we generalize this property for arbitrary prefix-independent objectives.

Definition 1

(Sure-almost-sure (SAS) property). A prefix-independent objective \(\varphi _{}\) in a game \(\mathcal {G}\) satisfies the \(\textsf{SAS}\) property if \(\langle \! \langle 2 \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}_{\textsf{NS}}}(\overline{\varphi _{}}) = V\) implies \(\langle \! \langle 2 \rangle \! \rangle ^{\textsf{AS}}_{\mathcal {G}}(\overline{\varphi _{}}) = V\), that is, if \(\text {Player}~{2}\) wins the objective \(\overline{\varphi _{}}\) from every vertex in \(\mathcal {G}_{\textsf{NS}}\), then \(\text {Player}~{2}\) almost-surely wins the same objective \(\overline{\varphi _{}}\) from every vertex in \(\mathcal {G}\).

Every prefix-independent objective satisfies the converse of the \(\textsf{SAS}\) property since if \(\text {Player}~{2}\) even wins positively from all vertices in \(\mathcal {G}\), then since he controls all probabilistic vertices in \(\mathcal {G}_{\textsf{NS}}\), he wins from all vertices in \(\mathcal {G}_{\textsf{NS}}\) by choosing optimal successors of probabilistic vertices. We show in Section 6 that for all stochastic games \(\mathcal {G}\), the objectives \(\textsf{FWMP}(\ell )\) and \(\textsf{BWMP}\) satisfy the \(\textsf{SAS}\) property, while in Example 1, we show that there exists a stochastic game in which Büchi objective does not satisfy the \(\textsf{SAS}\) property.

Example 1

Consider the game \(\mathcal {G}\) in Figure 2. The objective \(\varphi \) in this game is a Büchi objective: a play \(\pi \) satisfies the Büchi objective if \(\pi \) visits vertex \({v_{}}_1\) infinitely often. Although from every vertex, with positive probability (in fact, with probability \(1\)), a play visits \({v_{}}_1\) infinitely often, from none of the vertices, \(\text {Player}~{1}\) can ensure the Büchi objective in the non-stochastic game \(\mathcal {G}_{\textsf{NS}}\).

Fig. 2.
figure 2

Büchi objective does not satisfy the \(\textsf{SAS}\) property in this game.

Theorem 4 gives complexity bounds for solving stochastic games with objectives satisfying the \(\textsf{SAS}\) property in terms of the complexity of solving non-stochastic games with the same objective.

Theorem 4

Given \(\mathcal {G}\) and \(\varphi _{}\), suppose in every subgame \(\mathcal {G}'\) of \(\mathcal {G}\), the objective \(\varphi _{}\) restricted to \(\mathcal {G}'\) satisfies the \(\textsf{SAS}\) property. Let \(\textsf {NonStocWin} _{\varphi _{}}(\mathcal {G}_{\textsf{NS}})\) be an algorithm computing \(\langle \! \langle 1 \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}_{\textsf{NS}}}(\varphi _{})\) in \(\mathcal {G}_{\textsf{NS}}\) in time \(\mathbb {C}\). Then, the positive and almost-sure satisfaction of \(\varphi _{}\) can be decided in time \(\mathcal {O}(|V| \cdot (\mathbb {C} + |E|))\) and \(\mathcal {O}(|V|^2 \cdot (\mathbb {C} + |E|))\) respectively.

Moreover, for positive and almost-sure satisfaction of \(\varphi _{}\), the memory requirement for \(\text {Player}~{1}\) to play optimally in stochastic games is no more than that for non-stochastic games.

Theorem 4 does not give bounds on the memory requirement for winning strategies of \(\text {Player}~{2}\) for objective \(\varphi _{}\) in the stochastic game, but we provide such bounds specifically for \(\textsf{FWMP}(\ell )\) and \(\textsf{BWMP}\) in Section 6. We give a sketch of the proof of Theorem 4 below. The complete proof appears in [15].

The algorithms to compute the positive and almost-sure winning regions in \(\mathcal {G}\), and their proofs of correctness are the same as in the case of finitary Streett objectives described in [13]. The \(\textsf {PosWin} _{\varphi _{}}\) algorithm (Algorithm 4) uses \(\textsf {NonStocWin} _{\varphi _{}}\) as a subroutine to compute \(\langle \! \langle 1 \rangle \! \rangle ^{\textsf{Pos}}_{\mathcal {G}}(\varphi _{})\). The fact that \(\varphi _{}\) satisfies the \(\textsf{SAS}\) property is used to show the correctness of this algorithm. The depth of recursive calls of this algorithm is bounded above by \(|V|\), which gives the complexity bound. The \(\textsf {ASWin} _{\varphi _{}}\) algorithm (Algorithm 5) in turn uses \(\textsf {PosWin} _{\varphi _{}}\) as a subroutine to compute the \(\langle \! \langle 1 \rangle \! \rangle ^{\textsf{AS}}_{\mathcal {G}}(\varphi _{})\). The depth of recursive calls of this algorithm is also bounded above by \(|V|\), which gives the complexity bound. The following lemma, which is a special case of Theorem 1 in [7], is used to show the correctness of this algorithm.

figure c

Lemma 3

[7, Theorem 1] For a stochastic game \(\mathcal {G}\) with prefix-independent objective \(\varphi _{}\), if \(\langle \! \langle 2 \rangle \! \rangle ^{\textsf{Pos}}_{\mathcal {G}}(\varphi _{}) = V\), then \(\langle \! \langle 2 \rangle \! \rangle ^{\textsf{AS}}_{\mathcal {G}}(\varphi _{}) = V\).

For both positive and almost-sure winning, \(\text {Player}~{1}\) does not require any additional memory in the stochastic game compared to the non-stochastic game. We describe a strategy \(\sigma _{1}^{\textsf{Pos}}\) of \(\text {Player}~{1}\) that is positive winning from all vertices in \(\langle \! \langle 1 \rangle \! \rangle ^{\textsf{Pos}}_{\mathcal {G}}(\varphi _{})\). In each recursive call to \(\textsf {PosWin} _{\varphi _{}}\) algorithm, from every vertex in \(W^{}_{1}\), the strategy \(\sigma _{1}^{\textsf{Pos}}\) mimics a winning strategy of \(\text {Player}~{1}\) in \(\mathcal {G}_{\textsf{NS}}\), while for vertices in \(A^{}_{1} \setminus W^{}_{1}\), it follows a memoryless attractor strategy to reach \(W^{}_{1}\). The same strategy is almost-sure winning for \(\text {Player}~{1}\) from all vertices in \(\langle \! \langle 1 \rangle \! \rangle ^{\textsf{AS}}_{\mathcal {G}}(\varphi _{})\).

Finally, we look at the quantitative decision problem. The quantitative satisfaction for \(\varphi _{}\) can be decided in \(\textsf{NP}^{B} \) ([13, Theorem 6]), where \(B\) is an oracle deciding positive and almost-sure satisfaction problems for \(\varphi _{}\). It is not difficult to see that the quantitative satisfaction for \(\varphi _{}\) can be decided in \(\textsf{NP}^{B} \cap \textsf{coNP}^{B}\). Moreover, from the proof of [13, Theorem 6], it follows that the memory requirement of winning strategies for both players for the quantitative decision problem is no greater than that for the qualitative decision problem.

Corollary 1

Given \(\mathcal {G}\) and \(\varphi _{}\) as described in Theorem 4, let \(B\) be an oracle deciding the qualitative satisfaction of \(\varphi _{}\). Then, the quantitative satisfaction of \(\varphi _{}\) is in \(\textsf{NP}^{B} \cap \textsf{coNP}^{B}\). Moreover, the memory requirement of optimal strategies for both players is no greater than that for the positive and almost-sure satisfaction of \(\varphi _{}\).

6 Reducing stochastic window mean-payoff games: A special case

In this section, we show that for all stochastic games \(\mathcal {G}\) and for all \(\ell \ge 1\), the objectives \(\textsf{FWMP}{_{\mathcal {G}}}(\ell )\) and \(\textsf{BWMP}{_{\mathcal {G}}}\), which are prefix-independent, satisfy the \(\textsf{SAS}\) property of Definition 1. Thus, by Theorem 4, we obtain bounds on the complexity and memory requirements of \(\text {Player}~{1}\) for the positive and almost-sure satisfaction of these objectives. We also show that for both these objectives, the memory requirements of \(\text {Player}~{2}\) to play optimally for positive and almost-sure winning in stochastic games is no more than that of the non-stochastic games. The algorithms to compute the positive and almost-sure winning regions of \(\text {Player}~{1}\) for both \(\textsf{FWMP}(\ell )\) and \(\textsf{BWMP}\) objectives are obtained by instantiating \(\varphi _{}\) equal to \(\textsf{FWMP}(\ell )\) and \(\textsf{BWMP}\) in Algorithms 4 and 5. Thus, we obtain the algorithms \(\textsf {PosWin} _{\textsf{FWMP}(\ell )}\), \(\textsf {ASWin} _{\textsf{FWMP}(\ell )}\), \(\textsf {PosWin} _{\textsf{BWMP}}\), and \(\textsf {ASWin} _{\textsf{BWMP}}\).

6.1 Fixed window mean-payoff objective

We first discuss the \(\textsf{SAS}\) property for the \(\textsf{FWMP}(\ell )\) objective.

Lemma 4

In stochastic games, for all \(\ell \ge 1\), the \(\textsf{FWMP}(\ell )\) objective satisfies the \(\textsf{SAS}\) property.

Proof (Sketch)

[Sketch] We show that for all stochastic games \(\mathcal {G}\), if \(\langle \! \langle 2 \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}_{\textsf{NS}}}(\overline{\textsf{FWMP}(\ell )}) = V\), then \(\langle \! \langle 2 \rangle \! \rangle ^{\textsf{AS}}_{\mathcal {G}}(\overline{\textsf{FWMP}(\ell )}) = V\). If \(\langle \! \langle 2 \rangle \! \rangle ^{\mathsf {}}_{\mathcal {G}_{\textsf{NS}}}(\overline{\textsf{FWMP}(\ell )}) = V\), then from Theorem 2, there exists a finite-memory strategy \(\sigma _{2}^{\textsf{NS}}\) (say, with memory \(\textsf{M}\)) of \(\text {Player}~{2}\) that is winning for objective \(\overline{\textsf{FWMP}(\ell )}\) from every vertex in \(\mathcal {G}_{\textsf{NS}}\). Given such a strategy, we construct below a strategy \(\sigma _{2}^{\textsf{AS}}\) of \(\text {Player}~{2}\) in the stochastic game \(\mathcal {G}\) that is almost-sure winning for \(\overline{\textsf{FWMP}(\ell )}\) from every vertex in \(\mathcal {G}\).

In \(\mathcal {G}_{\textsf{NS}}\), \(\text {Player}~{2}\) controls vertices in \(V_{2}\cup V_{\Diamond }\), while in \(\mathcal {G}\), \(\text {Player}~{2}\) only controls vertices in \(V_{2}\) and the probability function \(\mathbb {P}\) determines the successors of vertices in \(V_{\Diamond }\). While the strategy \(\sigma _{2}^{\textsf{NS}}\) is winning for \(\overline{\textsf{FWMP}(\ell )}\) from all vertices in \(\mathcal {G}_{\textsf{NS}}\), it may not be almost-sure winning for \(\overline{\textsf{FWMP}(\ell )}\) in \(\mathcal {G}\). This is because each time the token is on a probabilistic vertex, a deviation occurs with positive probability, i.e., the successor chosen by the distribution is not consistent with \(\sigma _{2}^{\textsf{NS}}\), resulting in a potentially worse outcome for \(\text {Player}~{2}\). For example, in Figure 3, we see a stochastic game \(\mathcal {G}\) and a Mealy machine \(M_{2}^{\textsf{NS}}\) defining a strategy \(\sigma _{2}^{\textsf{NS}}\) that is winning for \(\text {Player}~{2}\) from all vertices in the non-stochastic game \(\mathcal {G}_{\textsf{NS}}\). In all outcomes in \(\mathcal {G}_{\textsf{NS}}\) that are consistent with \(\sigma _{2}^{\textsf{NS}}\), the token never moves from \(v_6\) to \(v_7\). However, in \(\mathcal {G}\), a deviation may lead the token to move along \((v_6, v_7)\). This results in a losing outcome for \(\text {Player}~{2}\) as the token gets trapped in \(v_8\), and subsequently no window remains open for \(\ell \) steps. Such harmful deviations can be detected, and starting with the strategy \(\sigma _{2}^{\textsf{NS}}\), we construct a strategy \(\sigma _{2}^{\textsf{AS}}\) that mimics \(\sigma _{2}^{\textsf{NS}}\) as long as harmful deviations do not occur, and resets otherwise, i.e., the strategy forgets the prefix of the play before the deviation. For instance, when the token moves from \(v_6\) to \(v_7\) in \(\mathcal {G}\), the strategy resets and the play continues as if the game began from \(v_7\). We call \(\sigma _{2}^{\textsf{AS}}\) a reset strategy. Figure 3 shows a part of a Mealy machine \(M_{2}^{\textsf{AS}}\) defining a reset strategy for the game \(\mathcal {G}\). The figure contains all the reset transitions out of \(q_4\), but the reset transitions out of \(q_1\), \(q_2\), and \(q_3\) have been omitted for space. More details on how to obtain a Mealy machine that defines \(\sigma _{2}^{\textsf{AS}}\) from a Mealy machine that defines \(\sigma _{2}^{\textsf{NS}}\) without adding any new states can be found in [15].

Fig. 3. (top) Stochastic game \(\mathcal {G}\) with objective \(\overline{\textsf{FWMP}(3)}\) for \(\text {Player}~{2}\). All unlabelled edges have payoff \(0\). (middle) Mealy machine \(M_{2}^{\textsf{NS}}\) defining a strategy \(\sigma _{2}^{\textsf{NS}}\) that is winning from all vertices in \(\mathcal {G}_{\textsf{NS}}\) for \(\overline{\textsf{FWMP}(3)}\). (bottom) Part of the Mealy machine \(M_{2}^{\textsf{AS}}\) defining a reset strategy that is almost-sure winning from all vertices in \(\mathcal {G}\).

Now, we argue that the reset strategy is almost-sure winning for \(\text {Player}~{2}\) from all vertices in \(\mathcal {G}\). If a play in \(\mathcal {G}\) continues for \(\textsf{M}\cdot |V| \cdot \ell \) steps without deviating, then by Lemma 2, it contains an open window of length \(\ell \). From any point in the play, the probability that \(\sigma _{2}^{\textsf{AS}}\) successfully copies \(\sigma _{2}^{\textsf{NS}}\) for \(i\) steps (that is, no deviation occurs) is at least \(p^{i}\), where \(p\) is the minimum probability over all the edges in \(\mathcal {G}\). It follows that from every point in the play, the probability that an open window of length \(\ell \) occurs in the next \(\textsf{M}\cdot |V| \cdot \ell \) steps is at least \(p^{\textsf{M}\cdot |V| \cdot \ell }\). Therefore, from every position in the play, the probability that an open window of length \(\ell \) eventually occurs is at least \(\sum _{i \ge 0} (1 - p^{\textsf{M}\cdot |V| \cdot \ell })^i \cdot p^{\textsf{M}\cdot |V| \cdot \ell } = 1\). Thus, with probability 1, infinitely many open windows of length \(\ell \) occur in the outcome, and the outcome satisfies \(\overline{\textsf{FWMP}(\ell )}\). Hence, all vertices in \(\mathcal {G}\) are almost-sure winning for \(\text {Player}~{2}\) for \(\overline{\textsf{FWMP}(\ell )}\), that is, the \(\textsf{FWMP}(\ell )\) objective satisfies the \(\textsf{SAS}\) property in all stochastic games.    \(\square \)
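For completeness, the final equality in the proof sketch is the standard geometric series: writing \(q = p^{\textsf{M}\cdot |V| \cdot \ell } \in (0,1]\),

\[ \sum _{i \ge 0} (1-q)^i \cdot q \;=\; q \cdot \frac{1}{1-(1-q)} \;=\; \frac{q}{q} \;=\; 1. \]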

We now construct a strategy \(\sigma _{2}^{\textsf{Pos}}\) of \(\text {Player}~{2}\) that is positive winning from all vertices in \(\langle \! \langle 2 \rangle \! \rangle ^{\textsf{Pos}}_{\mathcal {G}}(\overline{\textsf{FWMP}(\ell )})\). Let \(W^{i}_{2}\) and \(A^{i}_{2}\) denote the sets \(W_{2}\) and \(A_{2}\) computed in the \(i^{\text {th}}\) recursive call of the \(\textsf {ASWin} _{\textsf{FWMP}(\ell )}\) algorithm, respectively. If the token is in \(\bigcup _i W^{i}_{2}\), then \(\sigma _{2}^{\textsf{Pos}}\) mimics \(\sigma _{2}^{\textsf{AS}}\); if the token is in \(\bigcup _i (A^{i}_{2} \setminus W^{i}_{2})\), then \(\sigma _{2}^{\textsf{Pos}}\) follows a memoryless positive-attractor strategy to \(W^{i}_{2}\). Then, \(\sigma _{2}^{\textsf{Pos}}\) is a positive winning strategy for \(\text {Player}~{2}\) from all vertices in \(\langle \! \langle 2 \rangle \! \rangle ^{\textsf{Pos}}_{\mathcal {G}}(\overline{\textsf{FWMP}(\ell )})\). Using Theorem 4, Corollary 1, and Lemma 4, we have the following.
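Schematically, the case split defining \(\sigma _{2}^{\textsf{Pos}}\) can be written as follows; the containers `W2`, `A2`, the table `attractor_move`, and the callable `sigma_as` are hypothetical names standing for the sets \(W^{i}_{2}\), \(A^{i}_{2}\), the memoryless positive-attractor strategies, and the reset strategy of Lemma 4, respectively.

```python
def sigma_pos(vertex, memory, W2, A2, attractor_move, sigma_as):
    """Case split defining sigma_2^Pos at a Player-2 vertex (sketch).

    W2[i], A2[i]: the sets W_2 and A_2 from the i-th recursive call of ASWin;
    attractor_move[i]: memoryless positive-attractor strategy towards W2[i];
    sigma_as: the almost-sure winning reset strategy of Lemma 4.
    """
    for i in range(len(W2)):
        if vertex in W2[i]:
            return sigma_as(vertex, memory)      # inside W2[i]: mimic sigma_2^AS
        if vertex in A2[i]:
            return attractor_move[i][vertex]     # in A2[i] \ W2[i]: attract to W2[i]
    raise ValueError("vertex is not positive winning for Player 2")
```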

Theorem 5

Given a stochastic game \(\mathcal {G}\), a window length \(\ell \ge 1\), and a threshold \(p \in [0,1]\), for \(\textsf{FWMP}{_{\mathcal {G}}}(\ell )\), the positive and almost-sure satisfaction problems for \(\text {Player}~{1}\) are in \(\textsf{PTIME}\), and the quantitative satisfaction problem is in \(\textsf{NP}\cap \textsf{coNP}\). Moreover, for optimal strategies, memory of size \(\ell \) suffices for \(\text {Player}~{1}\) and memory of size \(|V| \cdot \ell \) suffices for \(\text {Player}~{2}\).

6.2 Bounded window mean-payoff objective

We show that the \(\textsf{BWMP}\) objective satisfies the \(\textsf{SAS}\) property in all stochastic games \(\mathcal {G}\).

Lemma 5

In stochastic games, the \(\textsf{BWMP}\) objective satisfies the \(\textsf{SAS}\) property.

Proof (Sketch)

We show that for all stochastic games \(\mathcal {G}\), if \(\langle \! \langle 2 \rangle \! \rangle _{\mathcal {G}_{\textsf{NS}}}(\overline{\textsf{BWMP}}) = V\), then \(\langle \! \langle 2 \rangle \! \rangle ^{\textsf{AS}}_{\mathcal {G}}(\overline{\textsf{BWMP}}) = V\). Since every play that satisfies \(\overline{\textsf{BWMP}}\) also satisfies \(\overline{\textsf{FWMP}(\ell )}\) for all \(\ell \ge 1\), if \(\langle \! \langle 2 \rangle \! \rangle _{\mathcal {G}_{\textsf{NS}}}(\overline{\textsf{BWMP}}) = V\), then \(\langle \! \langle 2 \rangle \! \rangle _{\mathcal {G}_{\textsf{NS}}}(\overline{\textsf{FWMP}(\ell )}) = V\). It follows that for each \(\ell \ge 1\), \(\text {Player}~{2}\) has a finite-memory strategy (say, with memory \(\textsf{M}_{\ell }\)) that is winning for the \(\overline{\textsf{FWMP}(\ell )}\) objective from all vertices in \(\mathcal {G}_{\textsf{NS}}\). From each such strategy, we construct, as described in the proof of Lemma 4, a reset strategy \(\sigma _{2}^{\ell }\) of memory size at most \(\textsf{M}_{\ell }\) that is almost-sure winning for the \(\overline{\textsf{FWMP}(\ell )}\) objective from all vertices. We use these strategies to construct an infinite-memory strategy \(\sigma _{2}^{\textsf{AS}}\) of \(\text {Player}~{2}\) that is almost-sure winning for \(\overline{\textsf{BWMP}}\) from all vertices in the stochastic game \(\mathcal {G}\).

Let \(p\) be the minimum probability over all edges in the game, and for all \(\ell \ge 1\), let \(q(\ell )\) denote \(p^{\textsf{M}_{\ell } \cdot |V| \cdot \ell }\). We partition a play of the game into phases \(1, 2, \ldots \) such that for all \(\ell \ge 1\), the length of phase \(\ell \) is \(\textsf{M}_{\ell } \cdot |V| \cdot \ell \cdot \left\lceil 1/q(\ell )\right\rceil \). We define the strategy \(\sigma _{2}^{\textsf{AS}}\) as follows: during phase \(\ell \), \(\sigma _{2}^{\textsf{AS}}\) plays according to \(\sigma _{2}^{\ell }\), the reset strategy that is almost-sure winning for \(\overline{\textsf{FWMP}(\ell )}\) in \(\mathcal {G}\).
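A minimal sketch of this phase bookkeeping, under the assumption that \(p\), \(|V|\), and the memory bounds \(\textsf{M}_{\ell }\) are available (all function and parameter names are illustrative):

```python
import math

def phase_length(ell, p, M, n_vertices):
    """Length of phase ell: M_ell * |V| * ell * ceil(1/q(ell)),
    where q(ell) = p ** (M_ell * |V| * ell) and M maps ell to M_ell."""
    block = M(ell) * n_vertices * ell        # one block, as in the proof of Lemma 4
    q_ell = p ** block                       # lower bound on an open window per block
    return block * math.ceil(1 / q_ell)

def phase_of_step(step, p, M, n_vertices):
    """Return the phase containing a given 0-indexed step of the play;
    the strategy sigma_2^AS plays sigma_2^ell throughout phase ell."""
    ell, start = 1, 0
    while True:
        end = start + phase_length(ell, p, M, n_vertices)
        if step < end:
            return ell
        start, ell = end, ell + 1
```

Since \(q(\ell )\) decreases exponentially in \(\ell \), the phase lengths grow without bound, which is consistent with \(\sigma _{2}^{\textsf{AS}}\) requiring infinite memory.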

We show that \(\sigma _{2}^{\textsf{AS}}\) is almost-sure winning for \(\text {Player}~{2}\) for \(\overline{\textsf{BWMP}}\) in \(\mathcal {G}\). Let \(E_\ell \) denote the event that phase \(\ell \) contains an open window of length \(\ell \). Given a play \(\pi \), if \(E_\ell \) occurs in \(\pi \) for infinitely many \(\ell \ge 1\), then every suffix of \(\pi \) contains, for each \(\ell \ge 1\), an open window of length \(\ell \), and hence \(\pi \) satisfies \(\overline{\textsf{BWMP}}\). We bound from below the probability that \(E_\ell \) occurs in the outcome. For all \(\ell \ge 1\), we can divide phase \(\ell \) into \(\left\lceil 1/q(\ell ) \right\rceil \) blocks of length \(\textsf{M}_{\ell } \cdot |V| \cdot \ell \) each; if at least one of these blocks contains an open window of length \(\ell \), then the event \(E_\ell \) occurs. It follows from the proof of Lemma 4 that if \(\text {Player}~{2}\) follows \(\sigma _{2}^{\ell }\), then the probability that an open window of length \(\ell \) occurs in the next \(\textsf{M}_{\ell } \cdot |V| \cdot \ell \) steps is at least \(q(\ell )\). Hence, the probability that no block in the phase contains an open window of length \(\ell \) is at most \((1-q(\ell ))^{\left\lceil 1/q(\ell ) \right\rceil }\), and the probability that \(E_\ell \) occurs in phase \(\ell \) is at least \( 1 - (1-q(\ell ))^{\left\lceil 1/q(\ell )\right\rceil } > 1-\frac{1}{e} \approx 0.63 > 0\). Since this bound holds regardless of the history before the phase, it follows that with probability 1, the event \(E_{\ell }\) occurs in \(\pi \) for infinitely many values of \(\ell \ge 1\).    \(\square \)
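The constant \(1 - \frac{1}{e}\) follows from the standard bound \(1 - x \le e^{-x}\):

\[ (1-q(\ell ))^{\left\lceil 1/q(\ell )\right\rceil } \;\le \; (1-q(\ell ))^{1/q(\ell )} \;\le \; \left( e^{-q(\ell )}\right) ^{1/q(\ell )} \;=\; \frac{1}{e}, \]

where the first inequality holds because \(0 \le 1-q(\ell ) < 1\) and \(\left\lceil 1/q(\ell )\right\rceil \ge 1/q(\ell )\); the bound in the proof is strict since \(1 - x < e^{-x}\) for \(x > 0\).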

Note that solving a non-stochastic game with the \(\textsf{BWMP}\) objective is in \(\textsf{NP}\cap \textsf{coNP}\) [8]. Thus, by Corollary 1, quantitative satisfaction for \(\textsf{BWMP}\) is in \(\textsf{NP}^{\textsf{NP}\cap \textsf{coNP}} \cap \textsf{coNP}^{\textsf{NP}\cap \textsf{coNP}}\), which coincides with \(\textsf{NP}\cap \textsf{coNP}\) [25].

Moreover, from [8], \(\text {Player}~{1}\) has a memoryless optimal strategy, while \(\text {Player}~{2}\) needs infinite memory to play optimally in non-stochastic games with the \(\textsf{BWMP}\) objective. By the proof of Lemma 5, the strategy \(\sigma _{2}^{\textsf{AS}}\) is almost-sure winning for \(\overline{\textsf{BWMP}}\) from all vertices in \(\langle \! \langle 2 \rangle \! \rangle ^{\textsf{AS}}_{\mathcal {G}}(\overline{\textsf{BWMP}})\). A positive winning strategy \(\sigma _{2}^{\textsf{Pos}}\) for \(\text {Player}~{2}\) from all vertices in \(\langle \! \langle 2 \rangle \! \rangle ^{\textsf{Pos}}_{\mathcal {G}}(\overline{\textsf{BWMP}})\) can be constructed analogously to the positive winning strategy for \(\overline{\textsf{FWMP}(\ell )}\) in Section 6.1. We summarize these results in the following theorem:

Theorem 6

Given a stochastic game \(\mathcal {G}\) and a threshold \(p \in [0,1]\), for \(\textsf{BWMP}{_{\mathcal {G}}}\), the positive, almost-sure, and quantitative satisfaction problems for \(\text {Player}~{1}\) are in \(\textsf{NP}\cap \textsf{coNP}\). Moreover, a memoryless strategy suffices for \(\text {Player}~{1}\), while \(\text {Player}~{2}\) requires an infinite-memory strategy to play optimally.