1 Introduction

Automatic verification and strategy synthesis are techniques for analysing probabilistic systems. They can be used to produce formal guarantees with respect to quantitative properties such as safety, reliability and efficiency. For example, they can be employed to synthesise controllers in applications such as autonomous vehicles, network protocols and robotic systems. These often operate in uncertain and adverse environments, models of which require both stochasticity, e.g., to represent noise, failures or delays, and game-theoretic aspects, to model non-cooperative agents or uncontrollable events.

PRISM-games is a tool for verification and strategy synthesis for turn-based stochastic multi-player games. The original version focused on model checking for the temporal logic rPATL [7], used to express zero-sum properties in which two opposing sets of players aim to minimise or maximise a single objective: either the probability of an event or the expected reward accumulated before it occurs. It has been successfully applied to, for example, autonomous driving, self-adaptive systems, computer security and user-centric networks [16].

In this paper, we present PRISM-games 2.0, which significantly extends functionality in several directions. First, it supports strategy synthesis for long-run properties, such as average (mean-payoff) and ratio rewards. This provides the ability to express properties of systems that run autonomously for long periods of time, and to specify measures such as energy consumption per time unit.

Secondly, a key new area of functionality is support for multi-objective properties, which enables the exploration of trade-offs, such as between performance and resource requirements. Specifically, we allow Boolean combinations of objectives expressed as expected total rewards (for stopping games), and expected mean-payoffs or ratios of expected mean-payoffs (in so-called controllable multichain games), as well as conjunctions of almost sure satisfaction for mean-payoffs and ratio rewards (in general games). The tool also performs computation and visualisation of the Pareto sets representing the optimal achievable trade-offs.

Thirdly, PRISM-games 2.0 facilitates compositional system development. This is done through assume-guarantee strategy synthesis, based on contracts over component interfaces that ensure cooperation between the components to achieve a common goal. For example, if one component satisfies the goal B under an assumption A on its environment (i.e. \(A \rightarrow B\)), while the other component ensures that the assumption A is satisfied, we can compose strategies for the components into a strategy for the full system achieving B. Multi-objective strategy synthesis, e.g., for an implication \(A \rightarrow B\), can be conveniently employed to realise such assume-guarantee contracts. Again, Pareto set computation can be performed to visualise the relationship between properties and across interfaces.

In this paper, we summarise the algorithms and implementation [2, 3, 8, 9, 15] behind the new functionality in PRISM-games 2.0, describe its usage in the tool, and illustrate the benefits it brings with results from four case studies drawn from autonomous systems and energy management.

Related Tools. For stochastic games, there is support for qualitative verification in GIST [6] and partial support in the general-purpose game solver GAVS+ [10], but there are no tools for multi-objective or compositional analysis. Multi-objective verification for the simpler model of Markov decision processes is available in PRISM [13] (for LTL and expected total reward objectives) and MultiGain [4] (for mean-payoff objectives), but not for stochastic games. Analysis of Nash equilibria (which also balance contrasting objectives for different players) can be performed with EAGLE [14] or PRALINE [5], but only for non-stochastic games. Lastly, Uppaal Stratego [11] performs strategy synthesis against quantitative properties, but with a focus on real-time systems.

2 Modelling and Property Specification Languages

Compositional Modelling. PRISM-games supports action-labelled turn-based stochastic games (henceforth often simply called games), which are specified in an extension of the native PRISM modelling language [13]. Version 2.0 adds a compositional modelling approach to facilitate assume-guarantee strategy synthesis for 2-player stochastic games. A top-level system consists of several subsystems (component games), which are combined using the game composition operator introduced in [3]. This composition synchronises on shared actions, and actions controlled by Player 1 in subsystems are controlled by Player 1 in the composition, thus enabling composition of the synthesised Player 1 strategies.

Each subsystem consists of a set of modules, which are combined using the original parallel composition of PRISM-games (which ignores player identity). Transitions of modules are specified using guarded commands, optionally labelled with action names (but omitted for non-synchronising transitions). Transitions may be assigned to different players in different subsystems. This is done by tagging an action name \({\texttt {a}}\) with ! or ?, where \([{\texttt {a}}!]\) assigns the \({\texttt {a}}\)-transition to Player 1 and \([{\texttt {a}}?]\) to Player 2. No state can have outgoing transitions labelled by both ! and ? since we work with turn-based games. Figure 1 shows a model for a system consisting of two subsystems.
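The player-tagging scheme can be sketched in the PRISM modelling language as follows. This fragment is illustrative only: the module name, variable, probabilities and action names are our own assumptions (not the model of Fig. 1), and the exact top-level declarations for declaring subsystems may differ in the tool.

```
// Illustrative sketch of !/? player tagging (hypothetical names and values)
smg

module sender
  s : [0..2] init 0;
  // Player 1 chooses to transmit; transmission fails with probability 0.1
  [send!] s=0 -> 0.9 : (s'=1) + 0.1 : (s'=2);
  // Player 2 (the environment) decides when the failed channel recovers
  [reset?] s=2 -> (s'=0);
endmodule

rewards "served"
  [send] true : 1;
endrewards
```

In a second subsystem the shared action `send` could instead be tagged `[send?]`, assigning it to Player 2 there; the game composition operator then synchronises the two subsystems on `send`.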

Fig. 1. A PRISM-games 2.0 model of a multi-component multi-objective game.

Property Specifications. PRISM-games focuses on strategy synthesis for stochastic multi-player games, i.e., finding player strategies that satisfy some winning condition, irrespective of the (finite) strategies of any other players in the game. PRISM-games 2.0 adds multi-objective queries (MQs): Boolean combinations of reward-based objectives. Rewards are specified by a reward structure that assigns real-valued rewards to transitions of a game (see Fig. 1, right-hand side). We can reason about total reward (indefinitely accumulated reward), mean payoff (long-run average reward), or the long-run ratio of two rewards. We also support ratios of expected mean-payoffs; expected ratios are synthesised soundly, though not necessarily completely, via almost-sure satisfaction of ratio rewards. An objective sets a target \({\texttt {v}}\) for a reward value to be exceeded (\(\ge \)) or upper-bounded (\(\le \)). Objectives for the expected mean-payoff of \({\texttt {r}}\), expected total reward of \({\texttt {r}}\), and ratio of expected rewards of \({\texttt {r}}\) and \({\texttt {c}}\) are expressed, respectively, as \(\mathtt {R}\{ \text {``}{\texttt {r}}\text {''}\}_{\ge {\texttt {v}}} [ \mathtt {S} ]\), \(\mathtt {R}\{ \text {``}{\texttt {r}}\text {''}\}_{\ge {\texttt {v}}} [ \mathtt {C} ]\), and \(\mathtt {R}\{\text {``}{\texttt {r}}\text {''}/\text {``}{\texttt {c}}\text {''}\}_{\ge {\texttt {v}}} [ \mathtt {S}]\), where \(\mathtt {S}\) and \(\mathtt {C}\) denote long-run and cumulative rewards, respectively. Almost-sure satisfaction objectives for mean-payoff and ratio rewards are written \(\mathtt {P}_{\ge 1}[ \mathtt {R}(\mathtt {path})\{\text {``}{\texttt {r}}\text {''}\}_{\ge {\texttt {v}}} [ \mathtt {S} ] ]\) and \(\mathtt {P}_{\ge 1} [ \mathtt {R}(\mathtt {path})\{\text {``}{\texttt {r}}\text {''}/\text {``}{\texttt {c}}\text {''}\}_{\ge {\texttt {v}}} [ \mathtt {S} ] ]\).
Objectives in an MQ must be of the same type and are combined with the standard Boolean connectives (\(\wedge \), \(\vee \), \(\rightarrow \), \(\lnot \)); almost-sure satisfaction objectives are only allowed in conjunctions. In the style of rPATL, we use \(\langle \!\langle \textsf {coalition} \rangle \!\rangle \) to denote synthesis of strategies for the player(s) in \(\textsf {coalition}\). The following are examples of MQs synthesising strategies for Player 1 in a game:

  • \(\langle \!\langle 1 \rangle \!\rangle ( \mathtt {R}\{\text {``}{\texttt {packets\_in}}\text {''}/\text {``}{\texttt {time}}\text {''}\}_{\le {\texttt {v}}_1} [ \mathtt {S} ] \; {\rightarrow } \; \mathtt {R}\{\text {``}{\texttt {served}}\text {''}/\text {``}{\texttt {time}}\text {''}\}_{\ge {\texttt {v}}_2} [ \mathtt {S} ] )\) – “assuming the expected rate of incoming network packets is at most \({\texttt {v}}_1\), the expected rate of serving submitted requests is guaranteed to be at least \({\texttt {v}}_2\).”

  • \(\langle \!\langle 1 \rangle \!\rangle (\mathtt {R}\{\text {``}{\texttt {passengers}}\text {''}\}_{\ge {\texttt {v}}_1} [ \mathtt {C} ] \; \wedge \; \mathtt {R}\{\text {``}{\texttt {fuel}}\text {''}\}_{\le {\texttt {v}}_2} [ \mathtt {C} ])\) – “the expected number of passengers transported is at least \({\texttt {v}}_1\), while simultaneously ensuring that the expected fuel consumption is at most \({\texttt {v}}_2\).”

Fig. 2. Pareto sets for the games in Fig. 1, with property specifications beneath the respective sets. On the right is the compositional Pareto set \(P'\). The global target is \(({\texttt {v}}_2,{\texttt {v}}_3) = (\frac{3}{4},\frac{9}{8})\), and the local targets can be seen to be consistent with \({\texttt {v}}_1 = \frac{1}{4}\).

3 Multi-objective Strategy Synthesis

PRISM-games 2.0 implements the multi-objective strategy synthesis methods formulated in [2, 8, 9], at the heart of which is a fixpoint computation of the sets of achievable targets for multiple reward objectives. For expected total rewards, games must be stopping, i.e., terminal states with zero reward must be reached almost surely under all strategies [8]. For expected long-run objectives, games must be controllable multichain, i.e., the sets of states that can occur in any maximal end component are almost surely reachable [15].

MQs with objectives of all types are converted into a unified fixpoint computation. In particular, Boolean combinations of expectation objectives are converted to conjunctions by selecting appropriate weights for the individual objectives [8]. Then, at each state of the game we iteratively compute polytopic sets of achievable vectors, with each dimension corresponding to one objective. Performance can be improved by computing successive polytopes using in-place (i.e. Gauss-Seidel) updates, as well as by rounding the corners of the polytopes at every iteration (which comes at the cost of precision) [9]. We then construct succinct strategies with stochastic memory updates, which win by maintaining the target below the expected value of the memory elements; these are the extreme points of the polytopes at the respective states.
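The role of the weights can be illustrated on a drastically simplified case: a one-player stopping game (an MDP) with two total-reward objectives, where each fixed weight vector reduces the problem to scalar value iteration. The toy model below is our own sketch, not one of the paper's case studies, and it deliberately avoids the tool's polytope machinery:

```python
# Sketch: for a fixed weight vector w, the weighted combination of two
# total-reward objectives becomes a scalar optimisation, solvable by
# standard value iteration with in-place (Gauss-Seidel) updates.

def weighted_value_iteration(states, actions, trans, rewards, w, eps=1e-9):
    """Optimal expected weighted total reward from each state."""
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if not actions[s]:          # terminal state: value stays 0
                continue
            best = max(
                sum(wi * ri for wi, ri in zip(w, rewards[(s, a)]))
                + sum(p * v[t] for t, p in trans[(s, a)])
                for a in actions[s]
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best                 # in-place update
        if delta < eps:
            return v

# Toy stopping model: from state 0, action "a" earns reward vector (2, 0)
# and action "b" earns (0, 1); both move to the terminal state 1.
states = [0, 1]
actions = {0: ["a", "b"], 1: []}
rewards = {(0, "a"): (2.0, 0.0), (0, "b"): (0.0, 1.0)}
trans = {(0, "a"): [(1, 1.0)], (0, "b"): [(1, 1.0)]}

# Each weight vector picks out one supporting point of the achievable set;
# sweeping over weights traces out the trade-off between the objectives.
for w in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    weighted_value_iteration(states, actions, trans, rewards, w)
```

Sweeping the weight vector in this way recovers the conjunction-based analysis of a Boolean combination one weight at a time; PRISM-games instead propagates whole polytopes of achievable vectors through the fixpoint.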

The implementation uses the Parma Polyhedra Library [1] for symbolic manipulation of convex sets. Stochastic games are stored in an explicit-state fashion and analysed using an extension of PRISM’s Java-based “explicit” engine.

Pareto Sets. An MQ is achievable for all targets in the achievable set; its frontier is the Pareto set, containing the targets that cannot be improved in any direction without degrading another. The achievable set for Boolean combinations is the union of the convex achievable sets obtained for the respective weights. The Pareto sets can be visualised by the user selecting two-dimensional slices.
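For intuition, the frontier of a finite set of achievable points can be extracted as follows; note that PRISM-games operates on convex polytopes rather than finite point sets, so this is only a schematic analogue:

```python
# Schematic analogue of a Pareto set: keep the points of a finite
# achievable set that no other point dominates componentwise
# (maximisation in each dimension).

def pareto_frontier(points):
    """Return the points not dominated by any other point."""
    def dominates(p, q):
        # p dominates q if p is at least as good everywhere and differs
        return all(pi >= qi for pi, qi in zip(p, q)) and p != q
    return [p for p in points if not any(dominates(q, p) for q in points)]

achievable = [(2.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.5)]
frontier = pareto_frontier(achievable)  # (0.5, 0.5) is dominated by (1.0, 1.0)
```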

Table 1. Performance. The forward slash (/) indicates values for separate components.

4 Compositional Strategy Synthesis

We leverage assume-guarantee verification rules for probabilistic automata (i.e., games with only a single player) for assume-guarantee strategy synthesis in two-player games [3]. Given a system \(\mathcal {G}\) composed of subsystems \(\mathcal {G}_1\), \(\mathcal {G}_2\), \(\ldots \), a designer supplies respective local property specifications \(\varphi _1\), \(\varphi _2\), \(\ldots \) via the construct comp(\(\varphi _1\), \(\varphi _2\), ...). By synthesising local strategies \(\pi _i\) for \(\mathcal {G}_i\) satisfying \(\varphi _i\), a global strategy \(\pi \) can be constructed for \(\mathcal {G}\). Using assume-guarantee rules, one can then derive a global property \(\varphi \) for \(\mathcal {G}\) that is satisfied by \(\pi \). The rules require fairness conditions, and we write \(\mathcal {G}^\pi \models ^{\textsf {u}} \varphi \) if the Player 1 strategy \(\pi \) satisfies \(\varphi \) against all unconditionally fair Player 2 strategies. For example, the rule:

$$\begin{aligned} \frac{ \mathcal {G}_1^{\pi _1} \models ^{\textsf {u}} \varphi ^A \quad \mathcal {G}_2^{\pi _2} \models ^{\textsf {u}} \varphi ^A \rightarrow \varphi ^G }{ (\mathcal {G}_1 \parallel \mathcal {G}_2)^{\pi _1 \parallel \pi _2} \models ^{\textsf {u}} \varphi ^G } {\textsc {(Asym)}} \end{aligned}$$

states that Player 1 wins with strategy \(\pi _1 \parallel \pi _2\) for \(\varphi ^G\) in the top-level system if \(\pi _2\) in \(\mathcal {G}_2\) achieves \(\varphi ^G\) under the contract \(\varphi ^A \rightarrow \varphi ^G\), and \(\pi _1\) in \(\mathcal {G}_1\) satisfies \(\varphi ^A\). Reward structures in shared objectives may only involve synchronised actions.

Compositional Pareto Sets. We compositionally compute a Pareto set for the property \(\varphi \) of the top-level system, which is an under-approximation of the Pareto set computed directly on the monolithic system. For a target in the compositional Pareto set, the targets for the local property specifications \(\varphi _i\) can be derived, so that the local strategies can be synthesised (see Fig. 2).

5 Case Studies and Tool Availability

We illustrate the new functionality in PRISM-games 2.0 with four case studies, as follows. “UAV”: we compute Pareto sets for a UAV performing reconnaissance of roads, reacting to inputs from a human operator, under a conjunction of expected total rewards [12]. “AD(V)”: we synthesise a strategy to steer an autonomous car through a village V, reacting to its environment, e.g., pedestrians or traffic jams, under a conjunction of expected total rewards [9]. “Power”: we maximise uptime of two components in an aircraft electrical power network, reacting to generator failures and switch delays d; each component has a conjunction of almost-sure satisfaction objectives for ratio rewards. We use assume-guarantee strategy synthesis for two model variants, with (resp. without) modelling an interface, denoted Power\(^+\)(d) (resp. Power(d)) [2]. “Temp”: we control the temperature in three adjacent rooms, reacting to the outside temperature and whether windows are open, and use Boolean combinations of expected ratios. We use assume-guarantee strategy synthesis for two model variants, denoted Temp(w) and Temp(v) [15]. Table 1 summarises the tool’s performance on these case studies on a 2.8 GHz PC with 32 GB RAM. We observe that scalability mostly depends on the number of objectives, the state space size and the accuracy, but our compositional approach greatly increases the viable state space sizes.

PRISM-games 2.0 is open source, released under GPL, available from [16].