
1 Introduction

Parity Games. A parity game is a zero-sum game played on a finite graph between two players called Even and Odd. Each vertex of the graph is labelled with an integer priority. The players move a token around the graph to form an infinite path, and the winner is determined by the parity of the largest priority that is visited infinitely often: Even wins if and only if it is even.

Parity games have attracted much attention in the verification community, because they capture the expressive power of nested least and greatest fixpoint operators, as formalized in the modal \(\mu \)-calculus and other fixpoint logics [11]. In particular, deciding the winner in parity games is polynomial-time equivalent to checking non-emptiness of non-deterministic parity tree automata, and to the modal \(\mu \)-calculus model checking, two fundamental algorithmic problems in automata theory, logic, and verification [7, 11, 27].

Strategy Improvement. We study strategy improvement for solving parity games, which is a local search technique that iteratively improves the strategy of one of the two players until an optimal strategy is found. Much like the simplex method for linear programming, and policy iteration algorithms for MDPs, strategy improvement algorithms can solve large parity games in a very small number of iterations in practice. The first strategy improvement algorithm devised specifically for parity games was given by Vöge and Jurdziński [30], and since then several further algorithms have been proposed [2, 20, 26].

Every strategy improvement algorithm uses a switching rule to decide how to proceed in each step. Theoretically, the best known switching rule is the random-facet rule, which provides a \(2^{O(\sqrt{n \log n})}\) upper bound on the number of strategy improvement iterations [21]. However, this is a single-switch rule, which switches only one edge in each iteration. In practice, we would expect an arbitrarily chosen initial strategy to differ from an optimal strategy on a linear number of edges, and so a single-switch rule will necessarily force the strategy improvement algorithm to take at least a linear number of iterations.

In this paper, we focus on the greedy all-switches switching rule, which switches every vertex that can be switched in each iteration. This rule has been found to perform very well in practice, and as our experimental results confirm, greedy all-switches strategy improvement can solve games with more than ten million vertices in under one-hundred iterations.

Practical Aspects of Strategy Improvement. Although strategy improvement can solve large games using only a handful of iterations, experimental work has found that it performs very poorly in practice. For example, Friedmann and Lange performed an experimental study [10] in which the all-switches variant of the Vöge-Jurdziński algorithm was compared with Jurdziński’s small-progress measures algorithm [16] and Zielonka’s recursive algorithm [32]. They found that, in some cases, the Vöge-Jurdziński algorithm takes longer than an hour to solve games with under one-hundred thousand vertices, whereas the recursive algorithm can scale to problems that are an order of magnitude larger.

The reason for this is that, although the algorithm uses a very small number of iterations, the cost of performing each step is very high. In particular, the Vöge-Jurdziński algorithm, which has served as the standard benchmark for strategy improvement algorithms, has a step complexity of \(O(n^2)\), even in games with a small number of priorities.

In fact, there are existing algorithms that avoid this high step complexity. Björklund, Sandberg and Vorobyov present an algorithm whose step complexity is \(O(n \cdot d)\), where d is the number of distinct priorities used in the game. While d can be as large as n in the case where every vertex has a distinct priority, in practice d is often a very small constant such as 2 or 4. Luttenberger observed [20] that a particularly simple algorithm is obtained if one combines the Björklund-Vorobyov strategy improvement algorithm for mean-payoff games [2], with the discrete valuation used by the Vöge-Jurdziński algorithm. This algorithm also has \(O(n \cdot d)\) step complexity, and is the one that we will focus on in this paper.

Our Contribution. Our goal in this paper is to show that all-switches strategy improvement can be used in practice to solve large parity games efficiently. As we have mentioned above, the number of iterations needed by the algorithm is usually tiny, and so our effort is dedicated towards improving the cost of computing each step. The main contributions of this paper are:

Best Response Computation. In each iteration of strategy improvement, the algorithm has a strategy for one of the two players, and must compute a best response strategy for the opponent. This can be a very expensive operation in practice. For the algorithm studied in this paper, it boils down to solving a shortest paths problem that can contain negative weights. The natural approach is to apply the Bellman-Ford algorithm. However, the first contribution of this paper is to show that there is a better approach: best responses can be computed using a one-player version of strategy improvement.

The performance of strategy improvement algorithms on shortest paths problems was studied by Cochet-Terrasson and Gaubert [3]. While they showed that the number of improvement iterations is at most O(|V||E|), their experimental results on random graphs found that strategy improvement was outperformed by the Bellman-Ford algorithm. They found that, while they typically both take the same number of iterations, one iteration of strategy improvement is more expensive than one iteration of Bellman-Ford.

Nevertheless, for the case of parity games, we give experimental evidence to show that one-player strategy improvement outperforms the Bellman-Ford algorithm when computing best responses. The experimental data shows that part, but not all, of this improvement is due to the fact that we can initialize the algorithm with the best response from the previous iteration.

A Parallel Algorithm for Strategy Improvement. Once we fix the decision to use one-player strategy improvement to compute best responses, we turn our attention towards the best way to implement this. In recent years, hardware manufacturers have made little progress in speeding up single-core CPU workloads, but progress continues to be made by adding more cores to CPUs. Moreover, GPUs continue to be made more powerful, and the rise of general purpose computing on GPUs has found many prominent applications, for example, in the training of deep neural networks. For this reason, we argue that good parallel implementations are required if we are to use an algorithm in practice.

The second contribution of this paper is to develop an efficient parallel algorithm for computing a strategy improvement iteration. The decision to use one-player strategy improvement to compute best responses means that the only non-trivial task is to compute the valuation of a pair of strategies. We show that this task can be reduced to an instance of list ranking, a well-studied problem that requires us to compute the prefix-sum of a linked list. The first work optimal parallel algorithm for list ranking was given by Cole and Vishkin [4]. However, their algorithm is complex and difficult to implement in practice. Helman and Jájá give a simpler randomized algorithm that is work efficient with high probability [12], and in particular it has been shown to work well on modern GPU hardware [31]. We give a modification of the Helman-Jájá algorithm that can be used to compute a valuation in a parity game.

Experimental Results. We have produced CPU and GPU implementations of the aforementioned parallel algorithm. The third contribution of this paper is to provide experimental results. We use the recently developed benchmark suite of Keiren [18], which unlike previous benchmarks from PGSolver [10], contains large parity games derived from real verification tasks.

We find that our implementation scales to parity games with tens of millions of vertices, and that the limiting factor is memory rather than run time. We also compare a single-threaded sequential CPU implementation with a multi-threaded parallel CPU implementation and a GPU implementation, both of which use the list-ranking algorithm described above. While the parallel CPU implementation fails to deliver a meaningful speedup, the GPU implementation delivers an average speedup of 10.37.

Related Work. Strategy improvement originated from the policy iteration algorithms that are used to solve Markov decision processes [25], and can be seen as a generalisation of this method to the two-player setting. The method was first proposed by Hoffman and Karp in order to solve two-player concurrent stochastic games [1]. It was then adapted by Condon [5] to solve simple-stochastic games, and by Puri to solve discounted games [24]. Parity games can be reduced in polynomial time to discounted and simple-stochastic games [15, 33], so both of these algorithms could, in principle, be used to solve parity games, but both reductions require the use of large rational numbers, which makes doing so impractical.

The greedy all-switches switching rule has received much attention in the past. Its good experimental performance inspired research into whether it always terminates after polynomially many iterations. However, Friedmann showed that this was not the case [9], by giving an example upon which the algorithm takes exponential time. Recently, it has even been shown that deciding whether a given strategy is visited by the algorithm is actually a PSPACE-complete problem [8].

There has been much previous work on solving parity games in parallel. Most of the work so far has focused on the small progress measures algorithm [16], because it can be implemented in parallel in a straightforward way. In the first paper on this topic, van de Pol and Weber presented a multi-core implementation of the algorithm [28], and Huth et al. presented further optimizations to that algorithm [14]. Two papers have reported on implementations on the parallel Cell processor used by the PlayStation 3 [17, 29].

For parallel implementations of strategy improvement, there are two relevant papers. Hoffman and Luttenberger have given GPU implementations of various algorithms for solving parity games [13]. In particular, they implemented the strategy improvement algorithm that is studied in this paper, but they used the Bellman-Ford algorithm to compute best responses. Meyer and Luttenberger have reported on a GPU implementation of the Björklund-Vorobyov strategy improvement algorithm for mean-payoff games [22].

2 Preliminaries

Parity Games. A parity game is played between two players called Even and Odd. Formally, it is a tuple \(\mathcal {G} = (V, V_{\text {Even}}, V_{\text {Odd}}, E, {{\mathrm{pri}}})\), where (VE) is a directed graph. The sets \(V_{\text {Even}}\) and \(V_{\text {Odd}}\) partition V into the vertices belonging to player Even, and the vertices belonging to player Odd, respectively. The priority function \({{\mathrm{pri}}}: V \rightarrow \mathbb N\) assigns a positive natural number to each vertex. We define \(D_\mathcal {G} = \{p \in \mathbb N:\,{{\mathrm{pri}}}(v) = p \text { for some } v \in V\}\) to be the set of priorities that are used in \(\mathcal {G}\). We make the standard assumption that there are no terminal vertices, which means that every vertex is required to have at least one outgoing edge.

A positional strategy for player Even is a function that picks one outgoing edge for each Even vertex. More formally, a positional strategy for Even is a function \(\sigma : V_{\text {Even}}\rightarrow V\) such that, for each \(v \in V_{\text {Even}}\) we have that \((v, \sigma (v)) \in E\). Positional strategies for player Odd are defined analogously. We use \(\varSigma _{\text {Even}}\) and \(\varSigma _{\text {Odd}}\) to denote the set of positional strategies for players Even and Odd, respectively. Every strategy that we consider in this paper will be positional, so from now on, we shall refer to positional strategies as strategies.

A play of the game is an infinite path through the game. More precisely, a play is a sequence \(v_0, v_1, \dots \) such that for all \(i\in \mathbb N\) we have \(v_i \in V\) and \((v_i, v_{i+1}) \in E\). Given a pair of strategies \(\sigma \in \varSigma _{\text {Even}}\) and \(\tau \in \varSigma _{\text {Odd}}\), and a starting vertex \(v_0\), there is a unique play that occurs when the game starts at \(v_0\) and both players follow their respective strategies. So, we define \({{\mathrm{Play}}}(v_0, \sigma , \tau ) = v_0, v_1, \dots \), where for each \(i \in \mathbb N\) we have \(v_{i+1} = \sigma (v_i)\) if \(v_i \in V_{\text {Even}}\), and \(v_{i+1} = \tau (v_i)\) if \(v_i \in V_{\text {Odd}}\).
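For concreteness, the following C++ sketch shows one possible in-memory representation of a parity game and how a prefix of \({{\mathrm{Play}}}(v_0, \sigma , \tau )\) is traced; the types and names are illustrative and do not correspond to any particular implementation.

```cpp
#include <cstddef>
#include <vector>

enum class Owner { Even, Odd };

// An illustrative representation: vertices are 0..n-1, each with an owner,
// a priority, and a list of successors (every vertex has at least one).
struct ParityGame {
    std::vector<Owner> owner;
    std::vector<int> priority;
    std::vector<std::vector<int>> successors;
};

// A positional strategy maps each vertex of the corresponding player to a
// chosen successor; entries for the other player's vertices are ignored.
using Strategy = std::vector<int>;

// Trace the first `steps` moves of Play(v0, sigma, tau). (An infinite play
// eventually enters a cycle, which a real implementation would detect.)
std::vector<int> playPrefix(const ParityGame& g, int v0,
                            const Strategy& sigma, const Strategy& tau,
                            std::size_t steps) {
    std::vector<int> path{v0};
    int v = v0;
    for (std::size_t i = 0; i < steps; ++i) {
        v = (g.owner[v] == Owner::Even) ? sigma[v] : tau[v];
        path.push_back(v);
    }
    return path;
}
```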

Given a play \(\pi = v_0, v_1, \dots \) we define:

$$\begin{aligned} {{\mathrm{MaxIo}}}(\pi ) = \max \{p:\,\exists \text { infinitely many } i \in \mathbb N\text { s.t. } {{\mathrm{pri}}}(v_i) = p\}, \end{aligned}$$

to be the largest priority that occurs infinitely often along \(\pi \). The winner is determined by the parity of this priority: a play \(\pi \) is winning for player Even if \({{\mathrm{MaxIo}}}(\pi )\) is even, and we say that \(\pi \) is winning for Odd if \({{\mathrm{MaxIo}}}(\pi )\) is odd.

A strategy \(\sigma \in \varSigma _{\text {Even}}\) is a winning strategy for a vertex \(v \in V\) if, for every (not necessarily positional) strategy \(\tau \) for player Odd, we have that \({{\mathrm{Play}}}(v, \sigma , \tau )\) is winning for player Even. Likewise, a strategy \(\tau \in \varSigma _{\text {Odd}}\) is a winning strategy for v if, for every (not necessarily positional) strategy \(\sigma \) for player Even, we have that \({{\mathrm{Play}}}(v, \sigma , \tau )\) is winning for player Odd. The following fundamental theorem states that parity games are positionally determined.

Theorem 1

[6, 23]. The set of vertices V can be partitioned into winning sets \((W_{\text {Even}}, W_{\text {Odd}})\), where Even has a positional winning strategy for all \(v \in W_{\text {Even}}\), and Odd has a positional winning strategy for all \(v \in W_{\text {Odd}}\).

The computational problem that we are interested in is, given a parity game, to determine the partition \((W_{\text {Even}}, W_{\text {Odd}})\).

3 Strategy Improvement

In this section, we describe the strategy improvement algorithm that we will consider in this paper. The algorithm, originally studied by Luttenberger [20], is a combination of the Björklund-Vorobyov strategy improvement algorithm for mean-payoff games [2], with the discrete strategy improvement valuation of Vöge and Jurdziński [30]. Strategy improvement algorithms select one of the two players to be the strategy improver. In this description, and throughout the rest of the paper, we will select player Even to take this role.

A Modified Game. At the start of the algorithm, we modify the game by introducing a new sink vertex s into the graph. For each vertex v of the Even player, we add a new edge from v to the sink. The idea is that, at any point player Even can choose to take the edge to s and terminate the game. The owner and priority of s are irrelevant, since the game stops once s is reached.

Admissible Strategies. A strategy \(\sigma \in \varSigma _{\text {Even}}\) is said to be admissible if player Odd cannot force an odd cycle when playing against \(\sigma \). More formally, \(\sigma \) is admissible if, for every strategy \(\tau \in \varSigma _{\text {Odd}}\) and every vertex v, the play \({{\mathrm{Play}}}(v, \sigma , \tau )\) either arrives at the sink s, or \({{\mathrm{MaxIo}}}({{\mathrm{Play}}}(v, \sigma , \tau ))\) is even. The strategy improvement algorithm will only consider admissible strategies for player Even.

Valuations. The core of a strategy improvement algorithm is a valuation, which measures how good a given pair of strategies is from a given starting vertex. For our algorithm, the valuation will count how many times each priority occurs on a given path, so formally a valuation will be a function of the form \(D_\mathcal {G} \rightarrow \mathbb Z\), and we define \(\text {Vals}_\mathcal {G}\) to be the set of all functions of this form.

Given an admissible strategy \(\sigma \in \varSigma _{\text {Even}}\) for Even, a strategy \(\tau \in \varSigma _{\text {Odd}}\) for Odd, and a vertex \(v \in V\), we define the valuation function \({{\mathrm{Val}}}^{\sigma , \tau }: V \rightarrow \text {Vals}_\mathcal {G} \cup \{\top \}\) as follows.

  • If \(\pi = {{\mathrm{Play}}}(v, \sigma , \tau )\) is infinite, then we define \({{\mathrm{Val}}}^{\sigma , \tau }(v) = \top \)

  • If \(\pi = {{\mathrm{Play}}}(v, \sigma , \tau )\) is finite, then it must end at the sink s. The valuation of v will count the number of times that each priority appears along \(\pi \). Formally, if \(\pi = v_0, v_1, \dots , v_k, s\), then for each \(p \in D_\mathcal {G}\) we define a valuation \(L \in \text {Vals}_\mathcal {G}\) as follows:

    $$\begin{aligned} L(p) = \bigl | \{i \in \mathbb N: {{\mathrm{pri}}}(v_i) = p\} \bigr |. \end{aligned}$$

    We set \({{\mathrm{Val}}}^{\sigma , \tau }(v) = L\).

Observe that, since \(\sigma \) is an admissible strategy, \({{\mathrm{Val}}}^{\sigma , \tau }(v) = \top \) implies that \({{\mathrm{Play}}}(v, \sigma , \tau )\) is winning for Even.

Next, we introduce the operator \(\sqsubseteq \), which will be used to compare valuations. We define \(L \sqsubseteq \top \) for every \(L \in \text {Vals}_\mathcal {G}\). When we compare two valuations, however, the procedure is more involved. Let \(L_1, L_2 \in \text {Vals}_\mathcal {G}\) be two valuations. If \(L_1 = L_2\) then \(L_1 \sqsubseteq L_2\) and \(L_2 \sqsubseteq L_1\). Otherwise, we define \({{\mathrm{Maxdiff}}}(L_1, L_2)\) to be the largest priority p such that \(L_1(p) \ne L_2(p)\). Then, we have that \(L_1 \sqsubseteq L_2\) if and only if one of the following is true: either \(p = {{\mathrm{Maxdiff}}}(L_1, L_2)\) is even and \(L_1(p) < L_2(p)\), or \(p = {{\mathrm{Maxdiff}}}(L_1, L_2)\) is odd and \(L_1(p) > L_2(p)\).
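To make the comparison concrete, the following C++ sketch shows one possible implementation of the \(\sqsubseteq \) test on finite valuations; the representation of a valuation as a map from priorities to counts, and all names, are illustrative rather than taken from any particular implementation (\(\top \) is assumed to be handled separately by the caller).

```cpp
#include <map>
#include <set>

// A finite valuation in Vals_G: for each priority, the number of times it
// occurs on the path to the sink. Missing priorities count as zero.
using Valuation = std::map<int, int>;

// Returns true iff L1 ⊑ L2. The special element ⊤ (with L ⊑ ⊤ for every
// finite L) is handled by the caller.
bool leq(const Valuation& L1, const Valuation& L2) {
    // Collect every priority mentioned by either valuation.
    std::set<int> prios;
    for (const auto& kv : L1) prios.insert(kv.first);
    for (const auto& kv : L2) prios.insert(kv.first);
    auto count = [](const Valuation& L, int p) {
        auto it = L.find(p);
        return it == L.end() ? 0 : it->second;
    };
    // Maxdiff(L1, L2): scan priorities from largest to smallest and stop at
    // the first priority on which the two valuations differ.
    for (auto it = prios.rbegin(); it != prios.rend(); ++it) {
        int p = *it, a = count(L1, p), b = count(L2, p);
        if (a == b) continue;
        // More occurrences of an even priority are better for Even; more
        // occurrences of an odd priority are worse.
        return (p % 2 == 0) ? a < b : a > b;
    }
    return true;  // L1 = L2
}
```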

Best Responses. Given an admissible strategy \(\sigma \in \varSigma _{\text {Even}}\), a best response is a strategy \(\tau \in \varSigma _{\text {Odd}}\) that minimizes the valuation of each vertex. More formally, we define \({{\mathrm{br}}}(\sigma ) \in \varSigma _{\text {Odd}}\) to be a strategy with the property that \({{\mathrm{Val}}}^{\sigma , {{\mathrm{br}}}(\sigma )}(v) \sqsubseteq {{\mathrm{Val}}}^{\sigma , \tau }(v)\) for every strategy \(\tau \in \varSigma _{\text {Odd}}\) and every vertex v. If there is more than one such strategy, then we pick one arbitrarily. Although it is not immediately clear, it can be shown that there is a single strategy \(\tau \in \varSigma _{\text {Odd}}\) that simultaneously minimises the valuation of all vertices. Strategy improvement only ever considers an admissible strategy \(\sigma \) played against its best response, so we define the shorthand \({{\mathrm{Val}}}^{\sigma } = {{\mathrm{Val}}}^{\sigma , {{\mathrm{br}}}(\sigma )}\).

The Algorithm. We are now ready to describe the strategy improvement algorithm. It begins by selecting the following initial strategy for Even. We define \(\sigma _{\text {init}}\in \varSigma _{\text {Even}}\) so that \(\sigma _{\text {init}}(v) = s\) for all \(v \in V_{\text {Even}}\). Note that there is no guarantee that \(\sigma _{\text {init}}\) is admissible, because there may be a cycle with odd parity that contains only vertices belonging to player Odd. So, a preprocessing step must be performed to eliminate this possibility. One simple preprocessing procedure is to determine the set of vertices from which Odd can avoid visiting an Even vertex, and to insert enough dummy Even vertices into this subgame to prevent Odd from forming a cycle. As it happens, none of the games considered in our experimental study require preprocessing, so this is not a major issue in practice.

In each iteration, strategy improvement has a strategy for the improver. The first step is to compute the set of switchable edges for this strategy. An edge \((v, u)\) is switchable in strategy \(\sigma \) if \(u \ne \sigma (v)\) and \({{\mathrm{Val}}}^{\sigma }(\sigma (v)) \sqsubset {{\mathrm{Val}}}^{\sigma }(u)\). We define \(\mathcal {S}^{\sigma }\) to be the set of edges that are switchable in \(\sigma \).

The algorithm selects a non-empty subset of the switchable edges and switches them. We say that a set of edges \(S \subseteq E\) is a switchable set if, for every pair of edges \((v, u), (v', u') \in S\), we have \(v \ne v'\), that is, S does not contain two outgoing edges for a single vertex. If S is a switchable set and \(\sigma \) is a strategy, then we can switch S in \(\sigma \) to create the new strategy \(\sigma [S]\) where, for every vertex v:

$$\begin{aligned} \sigma [S](v) = {\left\{ \begin{array}{ll} u &{} (v, u) \in S, \\ \sigma (v) &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

The key property of strategy improvement is that, if \(S \subseteq \mathcal {S}^{\sigma }\) is a switchable set that contains only switchable edges, then we have that \(\sigma [S]\) is better than \(\sigma \) in the \(\sqsubseteq \) ordering. Formally, this means that \({{\mathrm{Val}}}^{\sigma }(v) \sqsubseteq {{\mathrm{Val}}}^{\sigma [S]}(v)\) for all vertices v, and there exists at least one vertex for which we have \({{\mathrm{Val}}}^{\sigma }(v) \sqsubset {{\mathrm{Val}}}^{\sigma [S]}(v)\).

The strict improvement property mentioned above implies that the algorithm cannot visit the same strategy twice, so it must eventually terminate. The algorithm can only terminate once it has reached a strategy with no switchable edges. We can use this strategy to determine winning sets for both players. That is, if \(\sigma ^*\) is a strategy with no switchable edges, then we can prove that: \(W_{\text {Even}}= \{v \in V:\,{{\mathrm{Val}}}^{\sigma ^*}(v) = \top \}\), and \(W_{\text {Odd}}= \{v \in V:\,{{\mathrm{Val}}}^{\sigma ^*}(v) \ne \top \}\).

Luttenberger has given a direct proof that the algorithm is correct [20]. In fact, a simple proof of correctness can be obtained directly from the correctness of the Björklund-Vorobyov (BV) algorithm. It is not difficult to show that if we turn the parity game into a mean-payoff game using the standard reduction, and then apply the BV algorithm to the resulting mean-payoff game, then the BV algorithm and this algorithm will pass through exactly the same sequence of strategies.

Theorem 2

The following statements are true.

  • For every strategy \(\sigma \in \varSigma _{\text {Even}}\) there is at least one best response \(\tau \in \varSigma _{\text {Odd}}\).

  • Let \(\sigma \) be a strategy, and let \(S \subseteq \mathcal {S}^{\sigma }\) be a switchable set that contains only switchable edges. We have \({{\mathrm{Val}}}^{\sigma }(v) \sqsubseteq {{\mathrm{Val}}}^{\sigma [S]}(v)\) for all vertices v, and there exists at least one vertex for which we have \({{\mathrm{Val}}}^{\sigma }(v) \sqsubset {{\mathrm{Val}}}^{\sigma [S]}(v)\).

  • Let \(\sigma \) be a strategy that has no switchable edges. We have \(W_{\text {Even}}= \{v \in V:\,{{\mathrm{Val}}}^{\sigma }(v) = \top \}\), and \(W_{\text {Odd}}= \{v \in V:\,{{\mathrm{Val}}}^{\sigma }(v) \ne \top \}\).

Switching Rules. Strategy improvement always switches a subset of switchable edges, but we have not discussed which set should be chosen. This decision is delegated to a switching rule, which for each strategy picks a subset of the switchable edges. In this paper we will focus on the greedy all-switches rule, which always switches every vertex that has a switchable edge. If a vertex has more than one switchable edge, then it picks an edge \((v, u)\) that maximizes \({{\mathrm{Val}}}^{\sigma }(u)\) under the \(\sqsubseteq \) ordering (arbitrarily if there is more than one such edge).
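Since switchability is a local property, the greedy all-switches rule is simple to state in code. The C++ sketch below assumes that a valuation \({{\mathrm{Val}}}^{\sigma }\) has already been computed for every vertex, and that a strict comparator implementing \(\sqsubset \) (with \(\top \) above every finite valuation) is supplied; all names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// An illustrative vertex record: its outgoing edges and its owner.
struct Vertex { std::vector<int> successors; bool ownedByEven; };

// Apply the greedy all-switches rule to Even's strategy sigma, given the
// valuation val[v] = Val^sigma(v) and a strict comparator lt(a, b) for ⊏.
template <typename Val, typename Less>
std::vector<int> greedyAllSwitches(const std::vector<Vertex>& vertices,
                                   std::vector<int> sigma,
                                   const std::vector<Val>& val, Less lt) {
    // Each Even vertex is treated independently, which is what makes this
    // step easy to parallelise.
    for (std::size_t v = 0; v < vertices.size(); ++v) {
        if (!vertices[v].ownedByEven) continue;
        int best = sigma[v];
        for (int u : vertices[v].successors)
            if (lt(val[best], val[u]))   // (v, u) is switchable and better
                best = u;                // keep the successor of highest value
        sigma[v] = best;
    }
    return sigma;
}
```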

4 Computing Best Responses

To implement strategy improvement, we need a method for computing best responses. Since we only consider admissible strategies for Even, we know that Odd cannot create a cycle with odd parity, and so computing a best response simply requires us to find a shortest path from each vertex to the sink, where path lengths are compared using the \(\sqsubseteq \) ordering. Any vertex that has no path to the sink is winning for Even. The obvious way to do this is to apply a shortest-paths algorithm. Note that odd priorities correspond to negative edge weights, so a general algorithm, such as the Bellman-Ford algorithm, must be applied.

One-Player Strategy Improvement. In this paper, we propose an alternative: we will use one-player strategy improvement equipped with the greedy all-switches rule. We say that an edge \((v, u)\) is Odd-switchable if \(v \in V_{\text {Odd}}\) and \({{\mathrm{Val}}}^{\sigma , \tau }(\tau (v)) \sqsupset {{\mathrm{Val}}}^{\sigma , \tau }(u)\). To find a best response against a fixed admissible strategy \(\sigma \in \varSigma _{\text {Even}}\), the algorithm starts with an arbitrary Odd strategy \(\tau \in \varSigma _{\text {Odd}}\), and repeatedly switches Odd-switchable edges until it arrives at an Odd strategy in which there are no Odd-switchable edges.

It is not difficult to see that if \(\tau \) has no Odd-switchable edges when played against \(\sigma \), then it is a best response against \(\sigma \), because a strategy with no Odd-switchable edges satisfies the Bellman optimality equations for shortest paths.

One-player strategy improvement algorithms for solving shortest paths problems were studied by Cochet-Terrasson and Gaubert [3]. In particular, they proved that the all-switches variant of the algorithm always terminates after at most O(|V||E|) steps. Hence, we have the following lemma.

Lemma 3

Let \(\sigma \) be an admissible strategy. One-player strategy improvement will find a best-response against \(\sigma \) after at most O(|V||E|) iterations.

The Algorithm. We can now formally state the algorithm that we will study. Given a strategy \(\sigma \in \varSigma _{\text {Even}}\), let \(\text {All}_\text {Even}(\sigma )\) be the function that implements the greedy all-switches switching rule as described earlier. Moreover, given a pair of strategies \(\sigma \in \varSigma _{\text {Even}}\) and \(\tau \in \varSigma _{\text {Odd}}\), let \(\text {All}_\text {Odd}(\sigma , \tau )\) be a set S of Odd-switchable edges \((v, u)\) such that there is no edge \((v, w) \in E\) with \({{\mathrm{Val}}}^{\sigma , \tau }(u) \sqsupset {{\mathrm{Val}}}^{\sigma , \tau }(w)\), and such that each vertex has at most one outgoing edge in S.

[Figure: pseudocode of the algorithm]

The inner loop computes best responses using one-player strategy improvement, while the outer loop performs the two-player strategy improvement algorithm. Note, in particular, that after switching edges in \(\sigma \), the first Odd strategy considered by the inner loop is the best response to the previous strategy.
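The following C++ skeleton illustrates the structure just described; the callbacks that compute the sets \(\text {All}_\text {Odd}\) and \(\text {All}_\text {Even}\) (and the valuations they rely on) are deliberately left abstract, and all names are illustrative rather than taken from any particular implementation.

```cpp
#include <functional>
#include <utility>
#include <vector>

using Strategy = std::vector<int>;                   // chosen successor per vertex
using EdgeSet  = std::vector<std::pair<int, int>>;   // switched edges (v, u)

static void applySwitches(Strategy& s, const EdgeSet& S) {
    for (const auto& e : S) s[e.first] = e.second;
}

// Outer loop: two-player strategy improvement for Even.
// Inner loop: one-player strategy improvement computing Odd's best response,
// warm-started with the best response to the previous Even strategy.
void strategyImprovement(
    Strategy& sigma, Strategy& tau,
    const std::function<EdgeSet(const Strategy&, const Strategy&)>& allOdd,
    const std::function<EdgeSet(const Strategy&, const Strategy&)>& allEven) {
    for (;;) {
        // Improve tau against the fixed sigma until no Odd-switchable edge
        // remains, at which point tau = br(sigma).
        for (EdgeSet S = allOdd(sigma, tau); !S.empty(); S = allOdd(sigma, tau))
            applySwitches(tau, S);
        // One greedy all-switches improvement step for Even, evaluated
        // against the best response just computed.
        EdgeSet S = allEven(sigma, tau);
        if (S.empty()) break;            // no switchable edges: sigma is optimal
        applySwitches(sigma, S);
        // tau is kept as-is, so the next inner loop starts from the best
        // response to the previous strategy.
    }
}
```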

5 Parallel Computation of Valuations

Most operations used by strategy improvement can naturally be carried out in parallel. In particular, if we have already computed a valuation, then deciding whether an edge is switchable at a particular vertex v, and finding the switchable edge that has the highest valuation at v, are both local properties that only depend on the outgoing edges of v. So these operations can trivially be carried out in parallel. This leaves the task of computing a valuation as the only task that does not have an obvious parallel algorithm.

In this section, we give an efficient parallel algorithm for computing a valuation. Given two strategies \(\sigma \in \varSigma _{\text {Even}}\) and \(\tau \in \varSigma _{\text {Odd}}\) in a game \(\mathcal {G}\), we show how computing \({{\mathrm{Val}}}^{\sigma , \tau }(v)\) can be parallelized in a work efficient manner. There is an obvious sequential algorithm for this task that runs in time \(O(|V| \cdot |D_\mathcal {G}|)\) which works backwards on the tree defined by \(\sigma \) and \(\tau \) and counts how many times each priority appears on each path to s. Every vertex not found by this procedure must have valuation \(\top \).
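For reference, a sequential sketch of this computation is given below; it assumes that the restricted graph is given by a successor array, and represents a valuation as a map from priorities to counts. All names are illustrative.

```cpp
#include <map>
#include <queue>
#include <vector>

using Valuation = std::map<int, int>;   // priority -> occurrence count

struct ValuationResult {
    std::vector<Valuation> val;         // meaningful only where isTop[v] is false
    std::vector<bool> isTop;            // true iff Val(v) = ⊤
};

// next[v] is the unique strategy edge of v in G^{sigma,tau} (ignored for the
// sink), and priority[v] is pri(v). We walk backwards from the sink, so each
// vertex's valuation is its successor's valuation plus its own priority.
ValuationResult sequentialValuation(const std::vector<int>& next,
                                    const std::vector<int>& priority,
                                    int sink) {
    const int n = static_cast<int>(next.size());
    std::vector<std::vector<int>> preds(n);   // reversed strategy edges
    for (int v = 0; v < n; ++v)
        if (v != sink) preds[next[v]].push_back(v);

    ValuationResult r{std::vector<Valuation>(n), std::vector<bool>(n, true)};
    r.isTop[sink] = false;
    std::queue<int> work;
    work.push(sink);
    while (!work.empty()) {
        int u = work.front(); work.pop();
        for (int v : preds[u]) {
            r.val[v] = r.val[u];           // copy the successor's valuation...
            r.val[v][priority[v]] += 1;    // ...and count v's own priority
            r.isTop[v] = false;            // v reaches the sink, so not ⊤
            work.push(v);
        }
    }
    return r;                              // unreached vertices keep Val = ⊤
}
```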

List Ranking. The idea of our algorithm is to convert the problem of computing a valuation, into the well-known problem of computing prefix-sums on a linked list, which is known as list ranking. We will then adapt the efficient parallel algorithms that have been developed for this problem.

Given a sequence of integers \(x_0, x_1, x_2, \dots , x_k\), and a binary associative operator \(\oplus \), the prefix-sum problem requires us to compute a sequence of integers \(y_0, y_1, y_2, \dots , y_k\) such that \(y_i = x_0 \oplus x_1 \oplus \dots \oplus x_{i-1}\). If the input sequence is given as an array, then efficient parallel algorithms have long been known [19].

If the input sequence is presented as a linked-list, then the problem is called the list ranking problem, and is more challenging. The first work optimal parallel algorithm for list ranking was given by Cole and Vishkin [4]. However, their algorithm is complex and difficult to implement in practice. Helman and Jájá give a simpler randomized algorithm that is work efficient with high probability [12].

Theorem 4

[12]. There is a randomized algorithm for list ranking that, with high probability, runs in time O(n/p) whenever \(n > p^2 \ln n\), where n denotes the length of the list, and p denotes the number of processors.

We now give a brief overview of the algorithm, as we will later modify it slightly. A full and detailed description can be found in [12]. The algorithm works by randomly choosing \(s = \frac{n}{p \log n}\) elements of the list to be splitters. Intuitively, each splitter defines a sublist that begins at the splitter, and ends at the next splitter that is encountered in the list (or at the end of the list). These sublists are divided among the processors, and are ranked using the standard sequential algorithm. Once this has been completed, we can create a reduced list, in which each element is a splitter, and the value of each element is the prefix-sum of the corresponding sublist. The reduced list is ranked by a single processor, again using the standard sequential algorithm. Finally, we can complete the list ranking task as follows: if an element e of the list has rank \(x_r\) in its sublist, and the splitter at the start of its sublist has rank \(x_s\) in the reduced list, then the rank of e is \(x_s \oplus x_r\).
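The following sequential C++ sketch of the basic scheme (before our modifications for pseudotrees) may help to fix the details; it uses the exclusive prefix-sum convention above, integers as values, and \(+\) as \(\oplus \). In a real implementation the loop in step 2 runs with one sublist per processor; all names are illustrative.

```cpp
#include <random>
#include <vector>

// List ranking in the style of Helman and JáJá. The list is given by succ[i]
// (-1 marks the end) and its head; value[i] is the weight of element i.
std::vector<long long> listRank(const std::vector<int>& succ,
                                const std::vector<long long>& value,
                                int head, int numSplitters) {
    const int n = static_cast<int>(succ.size());

    // Step 1: choose splitters at random; the head is always a splitter.
    std::vector<int> splitterId(n, -1), splitters;
    std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> pick(0, n - 1);
    auto addSplitter = [&](int i) {
        if (splitterId[i] == -1) {
            splitterId[i] = static_cast<int>(splitters.size());
            splitters.push_back(i);
        }
    };
    addSplitter(head);
    for (int k = 1; k < numSplitters; ++k) addSplitter(pick(gen));
    const int s = static_cast<int>(splitters.size());

    // Step 2: rank each sublist (one per processor in the parallel version)
    // by walking from its splitter until the next splitter or the tail.
    std::vector<long long> localRank(n, 0), sublistTotal(s, 0);
    std::vector<int> sublistOf(n, -1), nextSublist(s, -1);
    for (int j = 0; j < s; ++j) {
        long long acc = 0;
        for (int i = splitters[j]; ; i = succ[i]) {
            if (i != splitters[j] && (i == -1 || splitterId[i] != -1)) {
                nextSublist[j] = (i == -1) ? -1 : splitterId[i];
                break;
            }
            localRank[i] = acc;        // exclusive rank within the sublist
            acc += value[i];
            sublistOf[i] = j;
        }
        sublistTotal[j] = acc;
    }

    // Step 3: rank the reduced list of sublists on a single processor.
    std::vector<long long> prefix(s, 0);
    long long acc = 0;
    for (int j = splitterId[head]; j != -1; j = nextSublist[j]) {
        prefix[j] = acc;
        acc += sublistTotal[j];
    }

    // Step 4: the rank of an element is the rank of its sublist's splitter
    // in the reduced list plus its rank within the sublist.
    std::vector<long long> rank(n, 0);
    for (int i = 0; i < n; ++i)
        if (sublistOf[i] != -1) rank[i] = prefix[sublistOf[i]] + localRank[i];
    return rank;
}
```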

Pseudoforests and Euler Tours. We now show how the problem of computing a valuation can be reduced to list ranking. Let \(\mathcal {G}^{\sigma , \tau } = (V, V_{\text {Even}}, V_{\text {Odd}}, E^{\sigma , \tau }, {{\mathrm{pri}}})\) be the game \(\mathcal {G}\) in which every edge not used by \(\sigma \) and \(\tau \) is deleted. Since each vertex has exactly one outgoing edge in this game, the partition of V into \(V_{\text {Even}}\) and \(V_{\text {Odd}}\) is irrelevant, and we shall treat \(\mathcal {G}^{\sigma , \tau }\) as a graph labelled by priorities.

First, we observe that \(\mathcal {G}^{\sigma , \tau }\) is a directed pseudoforest. The set of vertices whose valuation is not \(\top \) form a directed tree rooted at s. For these vertices, our task is to count the number of times each priority occurs on each path to the sink, and hence compute a valuation. Each other vertex is part of a directed pseudotree, which is a directed tree in which the root also has exactly one outgoing edge that leads back into the tree. Since we deal only with admissible strategies, every vertex in a pseudotree has valuation \(\top \).

Fig. 1. Converting a tree into a linked list using the Euler tour technique. Left: the original tree. Right: the corresponding linked list.

A standard technique for reducing problems on trees to list ranking is the Euler tour technique. We will describe this technique for the tree rooted at s, and show that it can be used to compute a valuation. We will also use the same technique for the other pseudotrees in the graph, but since this portion of the algorithm is not standard, we defer the description until later.

In order to compute a valuation for every vertex v in the tree rooted at s, we need to count the number of times that a given priority p occurs on the path from v to the root. We create a linked list as follows. First we replace each directed edge \((v, u)\) with two edges \((v, u)\) and \((u, v)\). Then we select an Euler tour of this modified graph that starts and ends at the root. We use this tour to create a linked list, in which each element of the list is an edge of the modified graph, and the successor of each element is determined by the Euler tour. The value associated with each element e is defined as follows:

  • If \(e = (u, v)\), then the value of e is 1 if \({{\mathrm{pri}}}(v) = p\), and 0 otherwise.

  • If \(e = (v, u)\), then the value of e is \(-1\) if \({{\mathrm{pri}}}(v) = p\), and 0 otherwise.

If we then compute a list ranking on this list using \(+\) as the operator \(\oplus \), then the ranking of \((v, u)\) gives the number of times p appears on the path from v to the sink. Obviously, to compute a valuation we must do the above procedure in parallel for each priority in the game.

Formal Reduction to List Ranking. We now give a formal definition of the technique that we just described. Recall that \(E^{\sigma , \tau }\) gives the edges chosen by \(\sigma \) and \(\tau \). We define

$$\begin{aligned} \overleftarrow{E}^{\sigma ,\tau } = \{(u, v):\,(v, u) \in E^{\sigma , \tau }\}, \end{aligned}$$

to be the set of reversed edges. We call each edge in \(E^{\sigma , \tau }\) an up edge, since it moves towards the root, and correspondingly we call each edge in \(\overleftarrow{E}^{\sigma ,\tau }\) a down edge. The set of elements in our linked list will be \(L = E^{\sigma , \tau } \cup \overleftarrow{E}^{\sigma ,\tau }\).

Next we define the successor function \({{\mathrm{succ}}}: L \rightarrow L \cup \{\epsilon \}\), which gives the structure of the list, and where \(\epsilon \) is used to denote the end of the list. To do this, we take an arbitrary Euler tour of the tree, and define \({{\mathrm{succ}}}\) to be the function that follows this tour. Figure 1 gives an example of this construction.

In our overview, we described how to use list ranking to compute the number of times a given priority p appears on the path to the sink. In our formal definition, we will in fact compute a full valuation with a single call to a list ranking algorithm. To achieve this, we define the weight function \(w: L \rightarrow \text {Vals}_\mathcal {G}\) as follows. For each priority \(p \in D_\mathcal {G}\), we first define two valuations \(A_p, A_{-p} \in \text {Vals}_\mathcal {G}\) so that, for every \(q \in D_\mathcal {G}\):

$$\begin{aligned} A_p(q) = {\left\{ \begin{array}{ll} 1 &{} \text {if}\ \ q = p,\\ 0 &{} \text {otherwise.} \end{array}\right. }&A_{-p}(q) = {\left\{ \begin{array}{ll} -1 &{} \text {if}\ \ q = p,\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Then, for every list element \(e = (u, v) \in L\): if e is an up edge then we set \(w(e) = A_{-{{\mathrm{pri}}}(u)}\), and if e is a down edge then we set \(w(e) = A_{{{\mathrm{pri}}}(v)}\). Moreover, we define the binary operator \(\oplus \) as follows. Given two valuations \(A_1, A_2\), we define \(A_1 \oplus A_2 = A_3\) where for every priority \(p \in D_\mathcal {G}\) we have \(A_3(p) = A_1(p) + A_2(p)\).
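As an illustration, the weight function and the operator \(\oplus \) might be realised as follows, with valuations stored as dense arrays of counters under the assumption that the priorities in \(D_\mathcal {G}\) have been remapped to the indices \(0, \dots , d-1\); all names are illustrative.

```cpp
#include <cstddef>
#include <vector>

using Valuation = std::vector<int>;   // one counter per (remapped) priority

// A_p (sign = +1) and A_{-p} (sign = -1): unit valuations for one priority.
Valuation unitValuation(std::size_t d, int p, int sign) {
    Valuation a(d, 0);
    a[p] = sign;
    return a;
}

// The operator ⊕ adds valuations component-wise.
Valuation oplus(const Valuation& a, const Valuation& b) {
    Valuation c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
    return c;
}

// Weight of a list element e = (u, v): an up edge carries A_{-pri(u)} and a
// down edge carries A_{pri(v)}, i.e. we count the child vertex's priority
// positively on the way down and negatively on the way up.
Valuation weight(std::size_t d, const std::vector<int>& priority,
                 int u, int v, bool isUpEdge) {
    return isUpEdge ? unitValuation(d, priority[u], -1)
                    : unitValuation(d, priority[v], +1);
}
```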

Modifications to the Helman-Jájá Algorithm. We must also handle the vertices that lie in pseudotrees. Our reduction turns every pseudotree into a pair of cycles. The Helman-Jájá algorithm can be adapted to deal with these, by ensuring that if a cycle is found in the reduced list, then all vertices on it are given a valuation of \(\top \). Moreover, some vertices may not be covered by any sublist, because they may be part of a small pseudotree, and none of the random splitters were chosen from that pseudotree. Since the Helman-Jájá algorithm always picks the head of the list to be a splitter (in our case this would be an edge leaving the sink at the start of the Euler tour), every vertex in the tree rooted at s is covered by some sublist. So any vertex not covered by a sublist can be assigned valuation \(\top \).

Constructing the List Ranking Instance in Parallel. Since at least one of \(\sigma \) and \(\tau \) will change between every iteration, we must construct a new list ranking instance in every iteration of our algorithm. Thus, in order to have a true parallel algorithm, we must be able to carry out the reduction in parallel as well.

We start by describing a sequential algorithm for the task. Each vertex in the tree maintains two pointers \(\mathsf {start}_{v}\) and \(\mathsf {end}_{v}\). Initially, \(\mathsf {start}_{v}\) points to the down edge of v, and \(\mathsf {end}_{v}\) points to the up edge of v. Then, in an arbitrary order, we process each vertex v, and do the following:

  1. Determine the parent of v in the tree, and call it u.

  2. Connect the list element pointed to by \(\mathsf {start}_{u}\) to the element pointed to by \(\mathsf {start}_{v}\).

  3. Set \(\mathsf {start}_{u}= \mathsf {end}_{v}\).

Once this has been completed, we then join the list element pointed to by \(\mathsf {start}_{v}\) to the list element pointed to by \(\mathsf {end}_{v}\), for all vertices v.

Intuitively, this algorithm builds the tour of each subtree incrementally. The second step adds the tour of the subtree starting at v to the linked list associated with u. The third step ensures that any further children of u will place their tours after the tour of the subtree of v.

For example, let us consider the tree and corresponding Euler tour given in Fig. 1, and let us focus on the vertex b. Initially, \(\mathsf {start}_{b}\) points to \((a, b)\), while \(\mathsf {end}_{b}\) points to \((b, a)\), which are the down and up edges of b, respectively. Let us suppose that d is processed before e. When d is processed, \((a, b)\) is connected to \((b, d)\) and \(\mathsf {start}_{b}\) is updated to point to \((d, b)\). Subsequently, when e is processed, \((d, b)\) is connected to \((b, e)\), and \(\mathsf {start}_{b}\) is updated to point to \((e, b)\). Then, in the final step of the algorithm, \((b, d)\) is connected to \((d, b)\) and \((e, b)\) is connected to \((b, a)\). So, the linked list corresponding to the subtree of b (shown on the right in Fig. 1) is created. Note that if e were processed before d, then a different linked list would be created, which would correspond to a different Euler tour of the tree. From the point of view of the algorithm, it is not relevant which Euler tour is used to construct the linked list.
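One way to realise this construction in code is sketched below. We index the down and up edges of a vertex v as \(2v\) and \(2v+1\), and attach \(\mathsf {start}_{u}\) to the down edge of v, which is where \(\mathsf {start}_{v}\) points before any of v's own children have been processed; the names are illustrative.

```cpp
#include <vector>

// The Euler-tour linked list for the tree rooted at the sink. Element 2v is
// the down edge (parent(v), v) and element 2v+1 is the up edge (v, parent(v)).
// succ[e] = -1 means "no successor yet" and, once construction finishes,
// marks the tail of the list.
struct EulerList { std::vector<int> succ; int head; };

EulerList buildEulerTour(const std::vector<int>& parent, int root) {
    const int n = static_cast<int>(parent.size());
    EulerList L{std::vector<int>(2 * n, -1), -1};
    // startOf[v] and endOf[v] play the role of start_v and end_v above.
    std::vector<int> startOf(n), endOf(n);
    for (int v = 0; v < n; ++v) { startOf[v] = 2 * v; endOf[v] = 2 * v + 1; }
    startOf[root] = -1;                        // the sink has no edges of its own

    // Process every non-root vertex, in an arbitrary order.
    for (int v = 0; v < n; ++v) {
        if (v == root) continue;
        const int u = parent[v];
        if (startOf[u] == -1) L.head = 2 * v;  // first edge leaving the sink
        else L.succ[startOf[u]] = 2 * v;       // connect start_u to v's down edge
        startOf[u] = endOf[v];                 // start_u := end_v
    }
    // Finally, join start_v to end_v for every non-root vertex.
    for (int v = 0; v < n; ++v)
        if (v != root) L.succ[startOf[v]] = endOf[v];
    return L;
}
```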

In theory, this algorithm can be carried out in parallel in O(n / p) time and O(np) space by having each processor maintain its own copy of the pointers \(\mathsf {start}_{v}\) and \(\mathsf {end}_{v}\), and then after the algorithm has been completed, merging the p different sublists that were created.

In practice, the space blow up can be avoided by using atomic exchange operations, which are available on both CPU and GPU platforms. More precisely, we can use an atomic exchange instruction to set \(\mathsf {start}_{u}= \mathsf {end}_{v}\), while copying the previous value of \(\mathsf {start}_{u}\) to a temporary variable, and then connect the list element that was pointed to by \(\mathsf {start}_{u}\) to \(\mathsf {start}_{v}\).
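On the CPU this can be expressed with std::atomic; on the GPU the same pattern uses CUDA's atomicExch. The fragment below, with illustrative names and the same edge indexing as in the previous sketch, shows the processing of a single vertex.

```cpp
#include <atomic>
#include <vector>

// Process one non-root vertex v. All threads share the same startOf array;
// the atomic exchange installs end_v as the new start_u and returns the old
// value in one indivisible step, so no intermediate state is ever observed.
void processVertex(int v, const std::vector<int>& parent,
                   std::vector<std::atomic<int>>& startOf,   // start_u pointers
                   const std::vector<int>& endOf,            // end_v = 2v + 1
                   std::vector<int>& succ, std::atomic<int>& head) {
    const int u = parent[v];
    const int previous = startOf[u].exchange(endOf[v]);       // start_u := end_v
    if (previous >= 0)
        succ[previous] = 2 * v;   // connect the old start_u to v's down edge
    else
        head.store(2 * v);        // v's down edge is the first edge off the sink
}
```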

6 Experimental Results

Experimental Setup. Our experimental study uses four implementations.

  • GPU-LR: a GPU implementation that uses the list-ranking algorithm to compute valuations. The GPU is responsible for ranking the sublists, while ranking the reduced list is carried out sequentially on the CPU.

  • CPU-Seq: a single-threaded implementation that uses the natural sequential algorithm for computing valuations.

  • CPU-LR: a multi-threaded CPU implementation that uses the list-ranking algorithm to compute valuations. The sublists are ranked in parallel, while the reduced list is ranked by a single thread.

  • Bellman-Ford: a single-threaded CPU implementation that uses the Bellman-Ford algorithm to compute best responses.

All implementations are in C++, and the GPU portions are implemented using NVIDIA CUDA. The code is publicly available. We also compare our results to PGSolver's recursive algorithm, with all of PGSolver's heuristics disabled in order to deliver a fair comparison. We chose the recursive algorithm because it was found to be the most competitive in the previous experimental study of Friedmann and Lange [10].

Table 1. The games that we consider in our experimental study. The table displays the number of vertices, player Even vertices, player Odd vertices, edges, and distinct priorities.

For our benchmark games we utilise the suite that was recently developed by Keiren [18]. This provides a wide array of parity games that have been used throughout the literature for model-checking and equivalence checking. Since there are over 1000 games, we have selected a subset of those games to use here, and these are shown in Table 1. In particular, we have chosen a set of games that spans a variety of sizes, and that covers a variety of tasks from verification. We found that strategy improvement solves many of the games in the suite in a very small number of iterations, so the results that we present here focus on the games upon which strategy improvement takes the largest number of iterations. The vast majority of the games in the suite have between 2 and 4 priorities, and the ones that do not are artificially constructed (e.g. random games), so we believe that our sample is representative of real world verification tasks.

The test machine has an Intel Core i7-4770K CPU, clocked at 3.50 GHz (3.90 GHz boost), with 4 physical cores, and 16 GB of RAM. The GPU is an NVIDIA GeForce GTX 780, which has 2304 CUDA cores clocked at 1.05 GHz and 3 GB of RAM. At the time of purchase in 2013, the CPU cost £248.20 and the GPU cost £444.94. Since the CPU has 8 logical cores with hyper-threading enabled, we use 8 threads in our CPU multi-threaded implementations. When benchmarking for time, we ran each instance three times, and the reported results are the average of the three. We implemented a time limit of 10 min. We only report the amount of time needed to solve the game, discarding the time taken to parse the game.

Table 2. Experimental results comparing the algorithm used to compute a best response. The algorithms are (1) SI: one-player strategy improvement (2) SI (Reset): one-player strategy improvement starting from an arbitrary strategy (3) Bellman-Ford. For each algorithm we report the total time and the total number of iterations used by the best response algorithm.

Best Response Algorithms. Our first experiment is to determine which method for computing best responses is faster in practice. In this experiment we compare the single-core sequential implementation of one-player strategy improvement (SI) against a single-core sequential implementation of the Bellman-Ford algorithm.

As we have mentioned, our one-player strategy improvement starts with the optimal strategy against the previous strategy of the improver. To quantify the benefit of this, we have also included results for a version of the one-player strategy improvement algorithm that, at the start of each best response computation, resets to the initial arbitrarily chosen strategy. We refer to this as SI-Reset.

The results are displayed in Table 2. We only report results for games that Bellman-Ford solved within the 10 min time limit. We report the total number of major iterations, which are the iterations in which the improver’s strategy is switched. The number of major iterations does not depend on the algorithm used to compute best responses. For each algorithm we report the overall time and the total number of iterations used computing best responses.

Before discussing the results in detail we should first note that these results paint a very positive picture for strategy improvement. All games were solved in at most 77 major iterations, with most being solved with significantly fewer major iterations. The number of iterations used on the Election instance was the most that we saw over any instance in our study, including those that we do not report here. This clearly shows that strategy improvement can scale to very large instances.

Moving on to the choice of best response algorithm, the most striking feature is that Bellman-Ford is on average 8.43 times slower than one-player strategy improvement (min 3.80, max 16.42). Some of this difference can be explained by the fact that Bellman-Ford is on average 1.72 times slower per iteration than one-player strategy improvement (min 1.11, max 2.38), which may be due to implementation inefficiencies. But most of the difference is due to the fact that Bellman-Ford uses on average 5.30 times more iterations than one-player strategy improvement (min 1.80, max 10.60).

The results with SI-Reset show that only some of this difference can be attributed to reusing the previous best response, as SI-Reset uses on average 2.05 times more iterations than SI (min 1.00, max 4.11). Overall we found that SI used an average of 5.49 iterations to compute each best response (min 1.0, max 9.9), which again indicates that this method can scale to very large games.

Table 3. Experimental results comparing the running time of (1) GPU-LR: list ranking on the GPU (2) CPU-Seq: a sequential CPU implementation (3) CPU-LR: list ranking on a 4-core CPU (4) PGSolver: the recursive algorithm from PGSolver. \(\dagger \) indicates a failure due to lack of memory.

Parallel Implementations. Our second set of experimental results concerns our parallel implementation of strategy improvement when best responses are computed by one-player strategy improvement. The results are displayed in Table 3.

The first thing to note is that the parallel algorithm does not deliver good performance when implemented on a CPU. On average the multi-threaded CPU list ranking algorithm was 1.25 times slower than the single-threaded sequential algorithm (min 0.88, max 1.77). This can be partially explained by the fact that the total amount of work done by the parallel algorithm is at least twice the amount of work performed by the sequential algorithm, since turning the strategy into a linked list doubles the number of vertices.

On the other hand, the GPU implementation delivers a significant speedup. To give a fair comparison between the GPU implementation and the CPU implementations, we compute the ratio between the time taken by GPU-LR, and the minimum of the times taken by CPU-Seq and CPU-LR. Using this metric we find that the average speedup is 10.37 (min 5.54, max 14.21). The average speedup increases to 12.17 if we discard instances with fewer than 1 million edges, where setup overhead makes the GPU algorithm less competitive.

The downside to the GPU implementation is that games with more than about 32 million edges are too large to fit within the 3 GB of memory on our test GPU. Obviously, there is a cost trade off between the extra speed delivered by a GPU and the cost of purchasing a GPU with enough memory. At the time of writing, relatively cheap consumer graphics cards can be bought with up to 8 GB of memory, while expensive dedicated compute cards are available with up to 24 GB of memory.

Finally, we compare our results to PGSolver’s recursive algorithm. Here, to have a fair comparison, we should compare with the sequential CPU algorithm, as both algorithms are single-threaded. Unfortunately PGSolver ran out of memory for the very large games in our test set, but for the smaller games it can be seen that CPU-Seq is always competitive, and in many cases significantly faster than PGSolver.