The Impatient May Use Limited Optimism to Minimize Regret
Abstract
Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may realize that, with hindsight, she could have increased her reward by playing differently: this difference in outcomes constitutes her regret value. The agent may thus elect to follow a regret-minimal strategy. In this paper, it is shown that (1) there always exist regret-minimal strategies that are admissible—a strategy being inadmissible if there is another strategy that always performs better; (2) computing the minimum possible regret or checking that a strategy is regret-minimal can be done in \(\mathsf{coNP}^{\mathsf{NP}}\), disregarding the computational cost of numerical analysis (otherwise, this bound becomes \(\mathsf{PSPACE}\)).
Keywords
Admissibility · Discounted-sum games · Regret minimization

1 Introduction
 1.
A valuation function: given an infinite play, what is Eve’s reward?
 2.
Assumptions about the environment: is Adam trying to help or hinder Eve?
The valuation function can be Boolean, in which case one says that Eve wins or loses (one very classical example has Eve winning if the maximum value appearing infinitely often along the edges is even). In this setting, it is often assumed that Adam is adversarial, and the question then becomes: Can Eve always win? (The names of the players stem from this view: is there a strategy of \(\exists \)ve that always beats \(\forall \)dam?) The literature on that subject spans more than 35 years, with newly found applications to this day (see [4] for comprehensive lecture notes, and [7] for an example of recent use in the analysis of attacks in cryptocurrencies).

The adversarial-environment hypothesis translates to Adam trying to minimize Eve’s reward, and the question becomes: Can Eve always achieve a reward of \(x\)? This problem is in \(\mathsf{NP} \cap \mathsf{coNP}\) [20], and showing a \(\mathsf{P}\) upper bound would constitute a major breakthrough (namely, it would imply the same for so-called parity games [15]). A strategy of Eve that maximizes her reward against an adversarial environment is called worst-case optimal. Conversely, a strategy that maximizes her reward assuming a collaborative environment is called best-case optimal.

Assuming that the environment is adversarial is drastic, if not pessimistic. Eve could rather be interested in settling for a strategy \(\sigma \) which is not consistently bad: if another strategy \(\sigma '\) gives a better reward in one environment, there should be another environment for which \(\sigma \) is better than \(\sigma '\). Such strategies, called admissible [5], can be seen as an a priori rational choice.

Finally, Eve could put no assumption on the environment, but regret not having done so. Formally, the regret value of Eve’s strategy is defined as the maximal difference, over all environments, between the best value Eve could have obtained and the value she actually obtained. Eve can thus be interested in following a strategy that achieves the minimal regret value, aptly called a regret-minimal strategy [10]. This constitutes an a posteriori rational choice [12]. Regret-minimal strategies were explored in several contexts, with applications including competitive online algorithm synthesis [3, 11] and robot-motion planning [13, 14].
 1.
Optipess strategies are not only regret-minimal (a fact established in [13]) but also admissible—note that there are regret-minimal strategies that are not admissible and vice versa. On the way, we show that for any strategy of Eve there is an admissible strategy that performs at least as well; this is a peculiarity of discounted-sum games.
 2.
The regret value of a given time-switching strategy can be computed with an \(\mathsf{NP}\) algorithm (disregarding the cost of numerical analysis). The main technical hurdle is showing that exponentially long paths can be represented succinctly, a result of independent interest.
 3.
The question Can Eve’s regret be bounded by \(x\)? is decidable in \(\mathsf{coNP}^{\mathsf{NP}}\) (again disregarding the cost of numerical analysis, \(\mathsf{PSPACE}\) otherwise), improving on the algorithm implicit in [13]. The algorithm consists in guessing a time-switching strategy and computing its regret value; since optipess strategies are time-switching strategies that are regret-minimal, the algorithm will eventually find the minimal regret value of the input game.
In more detail, in Sect. 4 we provide a crucial lemma that allows us to represent long paths succinctly; in Sect. 5, we argue that the important values of a game (regret, best-case, worst-case) have short witnesses. In Sect. 6, we use these lemmas to devise our algorithms.
2 Preliminaries
We assume familiarity with basic graph and complexity theory. Some more specific definitions and known results are recalled here.
Game, Play, History. A (discounted-sum) game \(\mathcal {G}\) is a tuple \((V,v_0, V_\exists ,E,w,\lambda )\) where V is a finite set of vertices, \(v_0\) is the starting vertex, \(V_\exists \subseteq V\) is the subset of vertices that belong to Eve, \(E \subseteq V \times V\) is a set of directed edges, \(w :E \rightarrow \mathbb {Z}\) is an (edge-)weight function, and \(0< \lambda < 1\) is a rational discount factor. The vertices in \(V \setminus V_\exists \) are said to belong to Adam. Since we consider games played for an infinite number of turns, we will always assume that every vertex has at least one outgoing edge.
A play is an infinite path \(v_1 v_2 \cdots \in V^\omega \) in the digraph (V, E). A history \(h = v_1 \cdots v_n\) is a finite path. The length of \(h\), written \(|h|\), is the number of edges it contains: \(|h| \overset{\mathrm {def}}{=} n - 1\). The set \(\mathbf {Hist}\) consists of all histories that start in \(v_0\) and end in a vertex from \(V_\exists \).
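The valuation function \(\mathbf {Val}\) is not restated in this excerpt; the sketch below assumes the standard discounted-sum valuation, in which the \(i\)-th edge of a path (counting from 0) contributes \(w(e)\cdot \lambda ^i\). The exact indexing convention of the paper may differ, so treat this as illustrative only:

```python
from fractions import Fraction

def discounted_value(history, w, lam):
    """Discounted sum of a finite history (a list of vertices).

    Assumes the common convention that the i-th edge (0-indexed)
    contributes w(e) * lam**i; exact rationals avoid rounding issues.
    """
    lam = Fraction(lam)
    return sum(Fraction(w[(u, v)]) * lam**i
               for i, (u, v) in enumerate(zip(history, history[1:])))

# A history of length 2 (three vertices, two edges of weight 2):
w = {("v0", "v1"): 2, ("v1", "v0"): 2}
val = discounted_value(["v0", "v1", "v0"], w, Fraction(1, 2))  # 2 + 2*(1/2) = 3
```

Using `Fraction` keeps all values exact, which matters later when values that differ by exponentially small amounts must be compared.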
Strategies. A strategy of Eve in \(\mathcal {G}\) is a function \(\sigma \) that maps histories ending in some vertex \(v \in V_\exists \) to a neighbouring vertex \(v'\) (i.e., \((v,v') \in E\)). The strategy \(\sigma \) is positional if for all histories \(h, h'\) ending in the same vertex, \(\sigma (h) = \sigma (h')\). Strategies of Adam are defined similarly.
A history \(h = v_1 \cdots v_n\) is said to be consistent with a strategy \(\sigma \) of Eve if for all \(i \ge 2\) such that \(v_{i-1} \in V_\exists \), we have that \(\sigma (v_1 \cdots v_{i-1}) = v_i\). Consistency with strategies of Adam is defined similarly. We write \(\mathbf {Hist}(\sigma )\) for the set of histories in \(\mathbf {Hist}\) that are consistent with \(\sigma \). A play is consistent with a strategy (of either player) if all its prefixes are consistent with it.
Given a vertex \(v\) and both Adam and Eve’s strategies, \(\tau \) and \(\sigma \) respectively, there is a unique play starting in \(v\) that is consistent with both, called the outcome of \(\tau \) and \(\sigma \) on \(v\). This play is denoted \(\mathbf {out}^{v}(\sigma ,\tau )\).
For a strategy \(\sigma \) of Eve and a history \(h \in \mathbf {Hist}(\sigma )\), we let \(\sigma _h\) be the strategy of Eve that assumes \(h\) has already been played. Formally, \(\sigma _h(h') = \sigma (h \cdot h') \) for any history \(h'\) (we will use this notation only on histories \(h'\) that start with the ending vertex of \(h\)).
Types of Strategies. A strategy \(\sigma \) of Eve is strongly worstcase optimal (SWO) if for every history h we have \(\mathbf {aVal}^{h}(\sigma )= \mathbf {aVal}^{h}\); it is strongly bestcase optimal (SBO) if for every history h we have \(\mathbf {cVal}^{h}(\sigma )= \mathbf {cVal}^{h}\).
Lemma 1

\((\forall v \in V)\left[ \mathbf {aVal}^{v}(\sigma ) = \mathbf {aVal}^{v}\right] \) iff \(\sigma \) is SWO;

\((\forall v \in V)\left[ \mathbf {cVal}^{v}(\sigma ) = \mathbf {cVal}^{v}\right] \) iff \(\sigma \) is SBO;

\((\forall v \in V)\left[ \mathbf {aVal}^v(\sigma ) = \mathbf {aVal}^v \wedge \mathbf {cVal}^v(\sigma ) = \mathbf {acVal}^v\right] \) iff \(\sigma \) is SBWO.
Regret can also be characterized by considering the point in history when Eve should have done things differently. Formally, for any vertices \(u\) and \(v\) let \( \mathbf {cVal}^{u}_{\lnot v}\) be the maximal \(\mathbf {cVal}^{u}(\sigma )\) for strategies \(\sigma \) verifying \(\sigma (u) \ne v.\) Then:
Lemma 2
A strategy \(\sigma \) is perfectly optimistic-then-pessimistic (optipess, for short) if there are positional SBO and SBWO strategies \(\sigma ^{\mathrm {sbo}}\) and \(\sigma ^{\mathrm {sbwo}}\) such that \(\sigma = \sigma ^{\mathrm {sbo}}\overset{t}{\mathrm {\rightarrow }}\sigma ^{\mathrm {sbwo}}\), where \(t\) is the threshold function of the construction of [13].
Theorem 1
([13]). For all optipess strategies \(\sigma \) of Eve, \(\mathbf {Reg}\left( \sigma \right) = \mathbf {Reg}\).
Conventions. As we have done so far, we will assume throughout the paper that a game \(\mathcal {G}\) is fixed—with the notable exception of the results on complexity, in which we assume that the game is given with all numbers in binary. Regarding strategies, we assume that bipositional strategies are given as two positional strategies and a threshold function encoded as a table with binary-encoded entries.
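One plausible reading of this encoding can be sketched as follows (the paper's exact switching rule is not restated in this excerpt, so the switching condition below—switch permanently at the first Eve vertex whose threshold has been reached—is an assumption for illustration):

```python
class SwitchingStrategy:
    """Sketch of a bipositional switching strategy sigma1 -t-> sigma2:
    follow the positional strategy sigma1, and switch permanently to
    sigma2 at the first Eve vertex v whose time threshold t(v) has
    been reached.  Illustrative only; not the paper's formal definition.
    """

    def __init__(self, sigma1, sigma2, t):
        self.sigma1, self.sigma2, self.t = sigma1, sigma2, t
        self.switched = False

    def move(self, history):
        v = history[-1]            # current (Eve) vertex
        n = len(history) - 1       # length of the history = number of edges
        if self.switched or n >= self.t.get(v, float("inf")):
            self.switched = True
            return self.sigma2[v]
        return self.sigma1[v]
```

The representation is small: two tables of size \(|V_\exists |\) plus one table of binary-encoded thresholds, even though the play it induces may switch only after exponentially many steps.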
Example 1
Consider the following game, where round vertices are owned by Eve, and square ones by Adam. The double edges represent Eve’s positional strategy \(\sigma \):
Eve’s strategy has a regret value of \(2\lambda ^2/(1-\lambda )\). This is realized when Adam plays from \(v_0\) to \(v_1\), from \(v''_1\) to \(x\), and from \(v'_1\) to \(y\). Against that strategy, Eve ensures a discounted-sum value of 0 by playing according to \(\sigma \), while regretting not having played to \(v''_1\) to obtain \(2\lambda ^2/(1-\lambda )\). \(\blacksquare \)
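The value \(2\lambda ^2/(1-\lambda )\) is a discounted geometric series: collecting a reward of 2 at every step from the second onward is worth \(\sum _{i\ge 2} 2\lambda ^i = 2\lambda ^2/(1-\lambda )\). A quick numerical sanity check (the choice \(\lambda = 1/2\) is arbitrary; the figure of the example is omitted from this excerpt):

```python
# Check the closed form 2*lam^2/(1-lam) against the truncated series
# sum_{i>=2} 2*lam^i; the truncated tail is negligible at 200 terms.
lam = 0.5
series = sum(2 * lam**i for i in range(2, 200))
closed_form = 2 * lam**2 / (1 - lam)
assert abs(series - closed_form) < 1e-9
```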
3 Admissible Strategies and Regret
There is no reason for Eve to choose a strategy that is consistently worse than another one. This classical idea is formalized using the notions of strategy domination and admissible strategies. In this section, which is independent from the rest of the paper, we study the relation between admissible and regretminimal strategies. Let us start by formally introducing the relevant notions:
Definition 1
Let \(\sigma _1, \sigma _2\) be two strategies of Eve. We say that \(\sigma _1\) is weakly dominated by \(\sigma _2\) if \(\mathbf {Val}(\mathbf {out}^{v_0}(\sigma _1,\tau )) \le \mathbf {Val}(\mathbf {out}^{v_0}(\sigma _2,\tau ))\) for every strategy \(\tau \) of Adam. We say that \(\sigma _1\) is dominated by \(\sigma _2\) if \(\sigma _1\) is weakly dominated by \(\sigma _2\) but not conversely. A strategy \(\sigma \) of Eve is admissible if it is not dominated by any other strategy.
In other words, admissible strategies are maximal elements for the weakdomination preorder.
Example 2
Consider the following game, where the strategy \(\sigma \) of Eve is shown by the double edges:
This strategy guarantees a discounted-sum value of \(6\lambda ^2/(1-\lambda )\) against any strategy of Adam. Furthermore, it is worst-case optimal, since playing to \(v_1\) instead of \(v_2\) would give Adam the opportunity to ensure a strictly smaller value by playing to \(v''_1\). The latter also implies that \(\sigma \) is admissible. Interestingly, playing to \(v_1\) is also an admissible behavior of Eve since, against a strategy of Adam that plays from \(v_1\) to \(v'_1\), it obtains \(10\lambda ^2/(1-\lambda ) > 6\lambda ^2/(1-\lambda )\). \(\blacksquare \)
The two examples above can be used to argue that the sets of regret-minimal strategies and of admissible strategies are in fact incomparable.
Proposition 1
There are regret-optimal strategies that are not admissible and admissible strategies that have suboptimal regret.
Proof
(Sketch). Consider once more the game depicted in Example 1 and recall that the strategy \(\sigma \) of Eve corresponding to the double edges has minimal regret. This strategy is not admissible: it is dominated by the alternative strategy \(\sigma '\) of Eve that behaves like \(\sigma \) from \(v_1\) but plays to \(v'_2\) from \(v_2\). Indeed, if Adam plays to \(v_1\) from \(v_0\) then the outcomes of \(\sigma \) and \(\sigma '\) are the same. However, if Adam plays to \(v_2\) then the value of the outcome of \(\sigma \) is 0 while the value of the outcome of \(\sigma '\) is strictly greater than 0.
Similarly, the strategy \(\sigma \) depicted by double edges in the game from Example 2 is admissible but not regret-minimizing. In fact, the strategy \(\sigma '\) of Eve that consists in playing to \(v_1\) from \(v_0\) has a smaller regret. \(\square \)
In the rest of this section, we show that (1) any strategy is weakly dominated by an admissible strategy; (2) being dominated entails more regret; (3) optipess strategies are both regret-minimal and admissible. We will need the following:
Lemma 3
([6]). A strategy \(\sigma \) of Eve is admissible if and only if for every history \(h \in \mathbf {Hist}(\sigma )\) the following holds: either \(\mathbf {cVal}^{h}(\sigma ) > \mathbf {aVal}^h\) or \(\mathbf {aVal}^h(\sigma )=\mathbf {cVal}^h(\sigma )=\mathbf {aVal}^h = \mathbf {acVal}^h\).
The above characterization of admissible strategies in socalled wellformed games was proved in [6, Theorem 11]. Lemma 3 follows from the fact that discountedsum games are wellformed.
3.1 Any Strategy Is Weakly Dominated by an Admissible Strategy
We show that discountedsum games have the distinctive property that every strategy is weakly dominated by an admissible strategy. This is in stark contrast with most cases where admissibility has been studied previously [6].
Theorem 2
Any strategy of Eve is weakly dominated by an admissible strategy.
Proof
(Sketch). The main idea is to construct, based on \(\sigma \), a strategy \(\sigma '\) that will switch to a SBWO strategy as soon as \(\sigma \) does not satisfy the characterization of Lemma 3. The first part of the argument consists in showing that \(\sigma \) is indeed weakly dominated by \(\sigma '\). This is easily done by comparing, against each strategy \(\tau \) of Adam, the values of \(\sigma \) and \(\sigma '\). The second part consists in verifying that \(\sigma '\) is indeed admissible. This is done by checking that each history h consistent with \(\sigma '\) satisfies the characterization of Lemma 3, that is \(\mathbf {cVal}^h(\sigma ') > \mathbf {aVal}^h\) or \(\mathbf {aVal}^h(\sigma ') = \mathbf {cVal}^h(\sigma ')= \mathbf {aVal}^h = \mathbf {acVal}^h\). \(\square \)
3.2 Being Dominated Is Regretful
Theorem 3
For all strategies \(\sigma ,\sigma '\) of Eve such that \(\sigma \) is weakly dominated by \(\sigma '\), it holds that \( \mathbf {Reg}\left( \sigma '\right) \le \mathbf {Reg}\left( \sigma \right) \).
Proof
It follows from Proposition 1, however, that the converse of the theorem is false.
3.3 Optipess Strategies Are both RegretMinimal and Admissible
Recall that there are admissible strategies that are not regret-minimal and vice versa (Proposition 1). However, as a direct consequence of Theorems 2 and 3, there always exist regret-minimal admissible strategies. It turns out that optipess strategies, which are regret-minimal (Theorem 1), are also admissible:
Theorem 4
All optipess strategies of Eve are admissible.
Proof
Let \(\sigma = \sigma ^{\mathrm {sbo}}\overset{t}{\mathrm {\rightarrow }}\sigma ^{\mathrm {sbwo}}\) be an optipess strategy; we show it is admissible. To this end, let \(h = v_0 \dots v_n \in \mathbf {Hist}(\sigma )\); we show that one of the properties of Lemma 3 holds. There are two cases:
4 Minimal Values Are Witnessed by a Single Iterated Cycle
We start our technical work towards a better algorithm to compute the regret value of a game. Here, we show that there are succinctly presentable histories that witness small values in the game. Our intention is to later use this result to apply a modified version of Lemma 2 to bipositional strategies to argue there are small witnesses of a strategy having too much regret.
More specifically, we show that for any history \(h\), there is another history \(h'\) of the same length that has smaller value and such that \(h' = \alpha \cdot \beta ^k \cdot \gamma \) where \(\alpha \beta \gamma \) is small. This will allow us to find the smallest possible value among exponentially long histories by guessing \(\alpha , \beta , \gamma ,\) and \(k\), all of which will be small. This property holds for a wealth of different valuation functions, hinting at possible further applications. For discounted-sum games, the following suffices to prove that the desired property holds.
Lemma 4
Within the proof of the key lemma of this section, and later on when we use it (Lemma 9), we will rely on the following notion of cycle decomposition:
Definition 2
A simple-cycle decomposition (SCD) is a pair consisting of paths and iterated simple cycles. Formally, an SCD is a pair \(D = \langle (\alpha _i)_{i=0}^n, (\beta _j,k_j)_{j=1}^n\rangle \), where each \(\alpha _i\) is a path, each \(\beta _j\) is a simple cycle, and each \(k_j\) is a positive integer. We write \(D(j) = \beta _j^{k_j}\cdot \alpha _j\) and \(D(\star ) = \alpha _0\cdot D(1)D(2) \cdots D(n)\).
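An SCD is easy to materialize; the sketch below (names hypothetical; paths and cycles are represented abstractly as sequences of edge labels) stores the \(\alpha _i\) and \((\beta _j, k_j)\) and can expand \(D(\star )\)—precisely the operation that the succinctness arguments of this section avoid performing:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SCD:
    alphas: List[List[str]]              # alpha_0 .. alpha_n, as edge sequences
    cycles: List[Tuple[List[str], int]]  # (beta_j, k_j) for j = 1 .. n

    def expand(self):
        """Return D(*) = alpha_0 . D(1) ... D(n), with D(j) = beta_j^{k_j} . alpha_j.
        Exponential in the binary size of the k_j: algorithms should
        manipulate the SCD itself, never this expansion."""
        out = list(self.alphas[0])
        for alpha, (beta, k) in zip(self.alphas[1:], self.cycles):
            out += beta * k + alpha
        return out

    def length(self):
        """|D(*)|, computed succinctly without expanding."""
        return (sum(len(a) for a in self.alphas)
                + sum(len(b) * k for b, k in self.cycles))

# alpha_0 = e1, beta_1 = (e2 e3) iterated 3 times, alpha_1 = e4:
d = SCD(alphas=[["e1"], ["e4"]], cycles=[(["e2", "e3"], 3)])
```

Note that `length` runs in time polynomial in the binary encoding of the \(k_j\), while `expand` does not; this is the gap the SCD representation exploits.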
By carefully iterating Lemma 4, we have:
Lemma 5

\(h\) and \(h'\) have the same starting and ending vertices, and the same length;

\(\mathbf {Val}(h') \le \mathbf {Val}(h)\);

\(|\alpha \beta \gamma | \le 4|V|^3\) and \(\beta \) is a simple cycle.
Proof

\(D(\star )\) and \(D'(\star )\) have the same starting and ending vertices, the same length, and satisfy \(\mathbf {Val}(D'(\star )) \le \mathbf {Val}(D(\star ))\) and \(n' \le n\);

Either \(n' < n\), or \(|\alpha _0'\cdots \alpha _{n'}'| < |\alpha _0\cdots \alpha _n|\), or \(|\{i \mid k_i' \ge |V|\}| < |\{i \mid k_i \ge |V|\}|\).
 1.
Any SCD with \(n\) greater than \(|V|\) has a smaller SCD;
 2.
Any SCD with two \(k_{j}, k_{j'} > |V|\) has a smaller SCD.
Together they imply that for a smallest SCD \(D\), \(D(\star )\) is of the required form. Indeed, let \(j\) be the unique value for which \(k_j > |V|\); then the statement of the lemma is satisfied by letting \(\alpha = \alpha _0\cdot D(1)\cdots D(j-1)\), \(\beta = \beta _j\), \(k = k_j\), and \(\gamma = \alpha _j\cdot D(j+1)\cdots D(n)\).
5 Short Witnesses for Regret, Antagonistic, and Collaborative Values
We continue our technical work towards our algorithm for computing the regret value. In this section, the overarching theme is that of short witnesses. We show that (1) the regret value of a strategy is witnessed by histories of bounded length; (2) the collaborative value of a game is witnessed by a simple path and an iterated cycle; (3) the antagonistic value of a strategy is witnessed by an SCD and an iterated cycle.
5.1 Regret Is Witnessed by Histories of Bounded Length
Lemma 6
Proof
It may seem, from this lemma and the fact that \(t(v)\) may be very large, that we will need to guess histories of considerable length. However, since we will be considering bipositional switching strategies, we will only be interested in guessing some properties of the histories, which are not hard to verify:
Lemma 7
Proof
This is done by guessing multiple flows within the graph \((V, E)\). Here, we call a flow a valuation of the edges \(E\) by integers that describes the number of times a path crosses each edge. Given a vector in \(\mathbb {N}^E\), it is not hard to check whether there is a path that it represents, and to extract the initial and final vertices of that path [17].

\(|h_0| = t(v_1)\) and for all \(1 \le j < i\), \(|h_j| = t(v_{j+1}) - t(v_j)\);

For all \(0 \le j \le i\), \(h_j\) does not contain a vertex \(v_k\) with \(k \le j\).
To confirm the existence of a history with the given parameters, it is thus sufficient to guess the value \(i \le |V_\exists |\), and to guess \(i\) connected flows (rather than paths) with the above properties that are consistent with \(\sigma _1\). Finally, we guess a flow for \(h''\) consistent with \(\sigma _2\) if we need a switched history, and verify that it starts at a switching vertex. The flows must sum to \(n+1\), with the last vertex being \(v'\) and the previous one \(v\). \(\square \)
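The check that a vector in \(\mathbb {N}^E\) represents a path is a classical Euler-path criterion: the support of the flow must be connected, and every vertex must be balanced except possibly one source and one sink. A minimal sketch of this check (illustrative; [17] treats the underlying theory):

```python
from collections import defaultdict

def path_from_flow(flow):
    """Decide whether a flow (dict: edge (u, v) -> multiplicity) is the
    edge-count vector of some path; return its (start, end) or None.
    Criterion: connected support, and degree imbalances of an Euler path
    (one vertex with out-in = +1, one with in-out = +1, rest balanced)."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    adj = defaultdict(set)
    for (u, v), k in flow.items():
        if k <= 0:
            continue
        out_deg[u] += k
        in_deg[v] += k
        adj[u].add(v)
        adj[v].add(u)                      # undirected, for connectivity only
    vertices = set(out_deg) | set(in_deg)
    if not vertices:
        return None
    start = next(iter(vertices))           # connectivity of the support
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    if seen != vertices:
        return None
    sources = [v for v in vertices if out_deg[v] - in_deg[v] == 1]
    sinks = [v for v in vertices if in_deg[v] - out_deg[v] == 1]
    balanced = all(out_deg[v] == in_deg[v]
                   for v in vertices if v not in sources + sinks)
    if len(sources) == 1 and len(sinks) == 1 and balanced:
        return sources[0], sinks[0]
    if not sources and not sinks and balanced:
        return start, start                # closed path: any support vertex
    return None
```

The point of using flows in the proof is that the multiplicities can be binary-encoded, so exponentially long history segments are certified in polynomial space.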
5.2 Short Witnesses for the Collaborative and Antagonistic Values
Lemma 8

\( \mathbf {cVal}^{v_0} = \max \{\mathbf {Val}(\alpha \cdot \beta ^\omega ) \mid (\alpha , \beta ) \in P\} \) and

membership in \(P\) is decidable in polynomial time w.r.t. the game.
Proof
We argue that the set P of all pairs \((\alpha ,\beta )\) with \(\alpha \) a simple path, \(\beta \) a simple cycle, and such that \(\alpha \cdot \beta \) is a path, gives us the result.
The first part of the claim is a consequence of Lemma 1: Consider positional SBO strategies \(\tau \) and \(\sigma \) of Adam and Eve, respectively. Since they are positional, the path \(\mathbf {out}^{v_0}(\sigma ,\tau )\) is of the form \(\alpha \cdot \beta ^\omega \), as required, and its value is \(\mathbf {cVal}^{v_0}\). We can thus let P be the set of all pairs obtained from such SBO strategies.
Moreover, it can be easily checked that for all pairs \((\alpha , \beta )\) such that \(\alpha \cdot \beta \) is a path in the game there exists a pair of strategies with outcome \(\alpha \cdot \beta ^\omega \). (Note that verifying whether \(\alpha \cdot \beta \) is a path can indeed be done in polynomial time given \(\alpha \) and \(\beta \).) Finally, the value \(\mathbf {Val}(\alpha \cdot \beta ^\omega )\) will, by definition, be at most \(\mathbf {cVal}^{v_0}\). \(\square \)
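A pair \((\alpha , \beta )\) is a finite witness because the ultimately periodic play \(\alpha \cdot \beta ^\omega \) has a closed-form value: the discounted sum of \(\alpha \), plus \(\lambda ^{|\alpha |}\) times the value of \(\beta ^\omega \), which is a geometric series. A sketch with exact rationals (weights given directly as sequences along \(\alpha \) and \(\beta \); the indexing convention is assumed, as in the earlier sketch):

```python
from fractions import Fraction

def lasso_value(alpha_w, beta_w, lam):
    """Val(alpha . beta^omega) for edge-weight sequences alpha_w, beta_w,
    the i-th edge overall being discounted by lam**i.  One period of beta
    contributes v_beta, and the full beta^omega contributes
    v_beta / (1 - lam^|beta|), shifted by lam^|alpha|."""
    lam = Fraction(lam)
    v_alpha = sum(Fraction(w) * lam**i for i, w in enumerate(alpha_w))
    v_beta = sum(Fraction(w) * lam**i for i, w in enumerate(beta_w))
    return v_alpha + lam**len(alpha_w) * v_beta / (1 - lam**len(beta_w))
```

Since \(\alpha \) is a simple path and \(\beta \) a simple cycle, both sequences have length at most \(|V|\), so this value is computable in polynomial time, as the lemma requires.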
Lemma 9

\(\mathbf {aVal}^{v_0}(\sigma ) = \min \{\mathbf {Val}(D(\star )\cdot \beta ^\omega ) \mid (D, \beta ) \in K\} \) and

the size of each pair is polynomially bounded, and membership in \(K\) is decidable in polynomial time w.r.t. \(\sigma \) and the game.
Proof
We will prove that the set K of all pairs \((D,\beta )\) with D an SCD of polynomial length (which will be specified below), \(\beta \) a simple cycle, and such that \(D(\star ) \cdot \beta \) is a path, satisfies our claims.
We now argue that \(\mathbf {Val}(h)\) is witnessed by an SCD of polynomial size. This bears similarity to the proof of Lemma 7. Specifically, we will reuse the fact that histories consistent with \(\sigma \) can be split into histories played “between thresholds.”

\(|h_0| = t(v_1)\) and for all \(1 \le j < i\), \(|h_j| = t(v_{j+1}) - t(v_j)\);

For all \(0 \le j \le i\), \(h_j\) does not contain a vertex \(v_k\) with \(k \le j\).
We now diverge from the proof of Lemma 7. We apply Lemma 5 on each \(h_j\) in the game where the strategy \(\sigma _1\) is hardcoded (that is, we first remove every edge \((u, v) \in V_\exists \times V\) that does not satisfy \(\sigma _1(u) = v\)). We obtain a history \(h_0'h_1'\cdots h_i'\) that is still in \(\mathbf {Hist}(\sigma )\), thanks to the previous splitting of \(h\). We also apply Lemma 5 to \(h'\), this time in the game where \(\sigma _2\) is hardcoded, obtaining \(h''\). Since each \(h_j'\) and \(h''\) is expressed as \(\alpha \cdot \beta ^k\cdot \gamma \), there is an SCD \(D\) with no more than \(|V_\exists |\) elements that satisfies \(\mathbf {Val}(D(\star )) \le \mathbf {Val}(h)\)—naturally, since \(\mathbf {Val}(h)\) is minimal and \(D(\star ) \in \mathbf {Hist}(\sigma )\), this means that the two values are equal. Note that it is not hard, given an SCD \(D\), to check whether \(D(\star ) \in \mathbf {Hist}(\sigma )\), and that SCDs that are not valued \(\mathbf {Val}(h)\) have a larger value. \(\square \)
6 The Complexity of Regret
We are finally equipped to present our algorithms. To account for the cost of numerical analysis, we rely on the problem \(\mathsf{PosSLP}\) [2]. This problem consists in determining whether an arithmetic circuit with addition, subtraction, and multiplication gates, together with input values, evaluates to a positive integer. \(\mathsf{PosSLP}\) is known to be decidable in the so-called counting hierarchy, itself contained in the set of problems decidable using polynomial space.
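Evaluating such a circuit is straightforward, but the resulting integers can have exponentially many bits (repeated squaring doubles the bit-length at each gate), which is why the sign question is delicate. A toy evaluator, assuming a circuit given as a topologically ordered gate list (the encoding is hypothetical):

```python
def eval_circuit(gates):
    """Evaluate an arithmetic circuit given as a topologically ordered
    list of gates.  Each gate is ('const', n) or (op, i, j) with op in
    '+', '-', '*' and i, j indices of earlier gates.  Python integers
    are unbounded, but bit-lengths can grow exponentially in the number
    of gates, so this brute force is no PosSLP algorithm.
    """
    vals = []
    for g in gates:
        if g[0] == "const":
            vals.append(g[1])
        else:
            op, i, j = g
            a, b = vals[i], vals[j]
            vals.append(a + b if op == "+" else a - b if op == "-" else a * b)
    return vals[-1]

# ((2^2)^2)^2 = 256 with only three multiplication gates:
gates = [("const", 2), ("*", 0, 0), ("*", 1, 1), ("*", 2, 2)]
```

A \(\mathsf{PosSLP}\) instance asks whether `eval_circuit(gates) > 0` without paying for this explicit evaluation.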
Theorem 5
Proof
Let us now concentrate on the collaborative value that we need to evaluate in Eq. 2. To compute \(\mathbf {cVal}\), we rely on Lemma 8, which we apply in the game where \(v_n\) is set as the initial vertex and its successor is forced not to be \(v\). We guess a pair \((\alpha _{\text {c}}, \beta _{\text {c}}) \in P\); we thus have \(\mathbf {Val}(\alpha _{\text {c}}\cdot \beta _{\text {c}}^\omega ) \le \mathbf {cVal}^{v_n}_{\lnot \sigma (h)}\), with at least one guessed pair \((\alpha _{\text {c}}, \beta _{\text {c}})\) reaching that latter value.
Let us now focus on computing \(\mathbf {aVal}^{v_n}(\sigma _h)\). Since \(\sigma \) is a bipositional switching strategy, \(\sigma _h\) is simply \(\sigma \) where \(t(v)\) is changed to \(\max \{0, t(v)  n\}\). Lemma 9 can thus be used to compute our value. To do so, we guess a pair \((D, \beta _{\text {a}}) \in K\); we thus have \(\mathbf {Val}(D(\star )\cdot \beta _{\text {a}}^\omega ) \ge \mathbf {aVal}^{v_n}(\sigma _h)\), and at least one pair \((D, \beta _{\text {a}})\) reaches that latter value.
Theorem 6
Proof
To decide the problem at hand, we ought to check that every strategy has a regret value greater than \(r\). However, optipess strategies being regretminimal, we need only check this for a class of strategies that contains optipess strategies: bipositional switching strategies form one such class.
What is left to show is that optipess strategies can be encoded in polynomial space. Naturally, the two positional strategies contained in an optipess strategy can be encoded succinctly. We thus only need to show that, with \(t\) as in the definition of optipess strategies (page 5), \(t(v)\) is at most exponential for every \(v \in V_\exists \) with \(t(v) \in \mathbb {N}\). This is shown in the long version of this paper. \(\square \)
Theorem 7
Proof
A consequence of the proof of Theorem 5 and the existence of optipess strategies is that the value \(\mathbf {Reg}\) of a game can be computed by a polynomial-size arithmetic circuit. Moreover, our reliance on \(\mathsf{PosSLP}\) allows the input \(r\) in Theorem 5 to be represented as an arithmetic circuit without impacting the complexity. We can thus verify that for all bipositional switching strategies \(\sigma '\) (with sufficiently large threshold functions) and all possible polynomial-size arithmetic circuits, \(\mathbf {Reg}(\sigma ) > r\) implies that \(\mathbf {Reg}(\sigma ') > r\). The latter holds if and only if \(\sigma \) is regret optimal since, as we argued in the proof of Theorem 6, such strategies \(\sigma '\) include optipess strategies and thus regret-minimal strategies. \(\square \)
7 Conclusion
We studied regret, a notion of interest for an agent that does not want to assume that the environment she plays in is simply adversarial. We showed that there are strategies that both minimize regret and are not consistently worse than any other strategy. The problem of computing the minimum regret value of a game was then explored, and a better algorithm was provided for it.
The exact complexity of this problem remains however open. The only known lower bound, a straightforward adaptation of [14, Lemma 3] for discountedsum games, shows that it is at least as hard as solving parity games [15].
Our upper bound could be significantly improved if we could efficiently solve the following problem:
This can be seen as the problem of comparing succinctly represented numbers in a rational base. The \(\mathsf{PosSLP}\) oracle in Theorem 5 can be replaced by an oracle for this seemingly simpler arithmetic problem. The variant of this problem in which \(r\) is an integer was shown to be in \(\mathsf{P}\) by Cucker, Koiran, and Smale [8], and they mention that the complexity is open for rational values. To the best of our knowledge, the exact complexity of the problem is open even for \(n = 3\).
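Concretely, such comparisons can always be done exactly, if inefficiently, with arbitrary-precision rationals; the open question is doing so in time polynomial in the bit-size of the exponents. The precise statement of the problem is not restated in this excerpt, so the sketch below assumes one natural formulation, comparing \(\sum _i a_i \lambda ^i\) against a rational \(r\):

```python
from fractions import Fraction

def discounted_ge(coeffs, lam, r):
    """Decide sum_i coeffs[i] * lam**i >= r exactly, with lam rational.
    Naive: running time grows with the exponents themselves rather than
    with their bit-size, which is exactly what a polynomial-time
    procedure for the problem above would have to avoid."""
    lam, r = Fraction(lam), Fraction(r)
    return sum(Fraction(c) * lam**i for i, c in enumerate(coeffs)) >= r
```

For \(\lambda = p/q\), the comparison amounts to comparing integers written in base \(q/p\) with succinctly given digit positions, which is the formulation used above.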
Footnotes
 1.
Technically, \(\sigma _h\) is positional in the game that records whether the switch was made.
Acknowledgements
We thank Raphaël Berthon and Ismaël Jecker for helpful conversations on the length of maximal (and minimal) histories in discountedsum games, James Worrell and Joël Ouaknine for pointers on the complexity of comparing succinctly represented integers, and George Kenison for his writing help.
References
1. de Alfaro, L., Henzinger, T.A., Majumdar, R.: Discounting the future in systems theory. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 1022–1037. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45061-0_79
2. Allender, E., Bürgisser, P., Kjeldgaard-Pedersen, J., Miltersen, P.B.: On the complexity of numerical analysis. SIAM J. Comput. 38(5), 1987–2006 (2009). https://doi.org/10.1137/070697926
3. Aminof, B., Kupferman, O., Lampert, R.: Reasoning about online algorithms with weighted automata. ACM Trans. Algorithms 6(2), 28:1–28:36 (2010). https://doi.org/10.1145/1721837.1721844
4. Apt, K.R., Grädel, E.: Lectures in Game Theory for Computer Scientists. Cambridge University Press, New York (2011)
5. Brenguier, R., et al.: Non-zero sum games for reactive synthesis. In: Dediu, A.H., Janoušek, J., Martín-Vide, C., Truthe, B. (eds.) LATA 2016. LNCS, vol. 9618, pp. 3–23. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30000-9_1
6. Brenguier, R., Pérez, G.A., Raskin, J.F., Sankur, O.: Admissibility in quantitative graph games. In: Lal, A., Akshay, S., Saurabh, S., Sen, S. (eds.) FSTTCS 2016. LIPIcs, vol. 65, pp. 42:1–42:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016). https://doi.org/10.4230/LIPIcs.FSTTCS.2016.42
7. Chatterjee, K., Goharshady, A.K., Ibsen-Jensen, R., Velner, Y.: Ergodic mean-payoff games for the analysis of attacks in cryptocurrencies. In: Schewe, S., Zhang, L. (eds.) CONCUR 2018. LIPIcs, vol. 118, pp. 11:1–11:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2018). https://doi.org/10.4230/LIPIcs.CONCUR.2018.11
8. Cucker, F., Koiran, P., Smale, S.: A polynomial time algorithm for diophantine equations in one variable. J. Symb. Comput. 27(1), 21–29 (1999). https://doi.org/10.1006/jsco.1998.0242
9. Filar, J., Vrieze, K.: Competitive Markov Decision Processes. Springer, Heidelberg (2012). https://doi.org/10.1007/978-1-4612-4054-9
10. Filiot, E., Le Gall, T., Raskin, J.F.: Iterated regret minimization in game graphs. In: Hliněný, P., Kučera, A. (eds.) MFCS 2010. LNCS, vol. 6281, pp. 342–354. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15155-2_31
11. Filiot, E., Jecker, I., Lhote, N., Pérez, G.A., Raskin, J.F.: On delay and regret determinization of max-plus automata. In: LICS 2017, pp. 1–12. IEEE Computer Society (2017). https://doi.org/10.1109/LICS.2017.8005096
12. Halpern, J.Y., Pass, R.: Iterated regret minimization: a new solution concept. Games Econ. Behav. 74(1), 184–207 (2012). https://doi.org/10.1016/j.geb.2011.05.012
13. Hunter, P., Pérez, G.A., Raskin, J.F.: Minimizing regret in discounted-sum games. In: Talbot, J.M., Regnier, L. (eds.) CSL 2016. LIPIcs, vol. 62, pp. 30:1–30:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016). https://doi.org/10.4230/LIPIcs.CSL.2016.30
14. Hunter, P., Pérez, G.A., Raskin, J.F.: Reactive synthesis without regret. Acta Inf. 54(1), 3–39 (2017). https://doi.org/10.1007/s00236-016-0268-z
15. Jurdziński, M.: Deciding the winner in parity games is in UP \(\cap \) co-UP. Inf. Process. Lett. 68(3), 119–124 (1998). https://doi.org/10.1016/S0020-0190(98)00150-1
16. Puterman, M.L.: Markov Decision Processes. Wiley-Interscience, New York (2005)
17. Reutenauer, C.: The Mathematics of Petri Nets. Prentice-Hall Inc., Upper Saddle River (1990)
18. Shapley, L.S.: Stochastic games. Proc. Natl. Acad. Sci. 39(10), 1095–1100 (1953)
19. Watkins, C.J.C.H., Dayan, P.: Technical note: Q-learning. Mach. Learn. 8, 279–292 (1992). https://doi.org/10.1007/BF00992698
20. Zwick, U., Paterson, M.: The complexity of mean payoff games on graphs. Theor. Comput. Sci. 158(1–2), 343–359 (1996). https://doi.org/10.1016/0304-3975(95)00188-3
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.