Security-Aware Synthesis Using Delayed-Action Games
Abstract
Stochastic multiplayer games (SMGs) have gained attention in the field of strategy synthesis for multi-agent reactive systems. However, standard SMGs are limited to modeling systems where all agents have full knowledge of the state of the game. In this paper, we introduce the delayed-action game (DAG) formalism, which simulates hidden-information games (HIGs) as SMGs by capturing hidden information through delaying a player's actions. The elimination of private variables enables the use of off-the-shelf SMG model checkers to implement HIGs. Furthermore, we demonstrate how a DAG can be decomposed into subgames that can be independently explored, utilizing parallel computation to reduce the model-checking time while alleviating the state-space explosion problem that SMGs are notorious for. In addition, we propose a DAG-based framework for strategy synthesis and analysis. Finally, we demonstrate the applicability of the DAG-based synthesis framework on a case study of a human-on-the-loop unmanned aerial vehicle system under stealthy attacks, where the proposed framework is used to formally model, analyze, and synthesize security-aware strategies for the system.
1 Introduction
Stochastic multiplayer games (SMGs) are used to model reactive systems where nondeterministic decisions are made by multiple players [4, 13, 23]. SMGs extend probabilistic automata by assigning a player to each choice to be made in the game. This extension enables modeling of complex systems where the behavior of players is unknown at design time. The strategy synthesis problem aims to find a winning strategy, i.e., a strategy that guarantees that a set of objectives (or winning conditions) is satisfied [6, 21]. Synthesis algorithms include, for instance, value iteration and strategy iteration techniques, where multiple reward-based objectives are satisfied [2, 9, 17]. To tackle the state-space explosion problem, [29] presents an assume-guarantee synthesis framework that relies on synthesizing strategies at the component level first, before composing them into a global winning strategy. Mean-payoff and ratio reward objectives are further investigated in [3] to synthesize \(\varepsilon \)-optimal strategies. Formal tools that support strategy synthesis via SMGs include PRISM-games [7, 19] and Uppaal Stratego [10].
SMGs are classified based on the number of players that can make choices at each state. In concurrent games, more than one player is allowed to make choices at a given state concurrently. Conversely, turn-based games assign at most one player to each state. Another classification considers the information available to different players across the game [27]. Complete-information games (also known as perfect-information games [5]) grant all players complete access to the information within the game. In symmetric games, some information is equally hidden from all players. On the contrary, asymmetric games allow some players to have access to more information than others [27].
This work is motivated by security-aware systems in which stealthy adversarial actions are potentially hidden from the system, while the latter can probabilistically and intermittently gain full knowledge about the current state. While hidden-information games (HIGs) can be used to model such systems by using private variables to capture hidden information [5], standard model checkers can only synthesize strategies for (full-information) SMGs, thus demanding alternative representations. The equivalence between turn-based semiperfect-information games and concurrent perfect-information games was shown in [5]. Since a player's strategy mainly relies on full knowledge of the game state [9], using SMGs for synthesis produces strategies that may violate synthesis specifications in cases where required information is hidden from the player. Partially-observable stochastic games (POSGs) allow agents to have different belief states by incorporating uncertainty about both the current state and adversarial plans [15]. Techniques such as active sensing for online replanning [14] and grid-based abstractions of belief spaces [24] were proposed to mitigate the synthesis complexity arising from partial observability. The notion of delaying actions has been studied as a means of gaining information about a game to improve future strategies [18, 30], but not as a means of hiding information.
To this end, we introduce delayed-action games (DAGs), a new class of games that simulate HIGs by hiding information from one player through delaying the actions of the others. The omission of private variables enables the use of off-the-shelf tools to implement and analyze DAG-based models. We show how DAGs (under some mild and practical assumptions) can be decomposed into subgames that can be independently explored, reducing the time required for synthesis by employing parallel computation. Moreover, we propose a DAG-based framework for strategy synthesis and analysis of security-aware systems. Finally, we demonstrate the framework's applicability through a case study of security-aware planning for an unmanned aerial vehicle (UAV) system prone to stealthy cyber attacks, where we develop a DAG-based system model and further synthesize strategies with strong probabilistic security guarantees.
The paper is organized as follows. Section 2 presents SMGs, HIGs, and the problem formulation. In Sect. 3, we introduce DAGs and show that they can simulate HIGs. Section 4 proposes a DAG-based synthesis framework, which we use for security-aware planning for UAVs in Sect. 5, before concluding the paper in Sect. 6.
2 Stochastic Games
In this section, we present turn-based stochastic games, which assume that all players have full information about the game state. We then introduce hidden-information games and their private-variable semantics.
Notation. We use \(\mathbb {N}_0\) to denote the set of nonnegative integers. \(\mathcal {P}(A)\) denotes the powerset of A (i.e., \(2^A\)). A variable v has a set of valuations \( Ev \left( v\right) \), where \(\eta \left( v\right) \in Ev \left( v\right) \) denotes one such valuation. We use \(\varSigma ^*\) to denote the set of all finite words over alphabet \(\varSigma \), including the empty word \(\epsilon \). A mapping \(\varSigma ^* \times Ev \left( v\right) \rightarrow Ev \left( v\right) \) indicates the effect of a finite word on \(\eta \left( v\right) \). Finally, for general indexing, we use \(s_i\) or \(s^{(i)}\), for \(i\in \mathbb {N}_0\), while \(\text{ PL }_\gamma \) denotes Player \(\gamma \).
Turn-Based Stochastic Games (SMGs). SMGs can be used to model reactive systems that undergo both stochastic and nondeterministic transitions from one state to another. In a turn-based game, actions can be taken at any state by at most one player. Formally, an SMG can be defined as follows [1, 28, 29].
Definition 1
(SMG). A turn-based stochastic multiplayer game is a tuple \(\mathcal {G}= \left( S, A, s_0, \delta \right) \), where:
S is a finite set of states, partitioned into \(S_\mathrm{I}\), \(S_\mathrm{II}\) and \(S_\bigcirc \);

\(A \!=\!A_\mathrm{I} \cup A_\mathrm{II} \cup \{ \tau \} \) is a finite set of actions where \(\tau \) is an empty action;

\(s_0 \in S_\mathrm{II}\) is the initial state; and

\(\delta : S \times A \times S \rightarrow [0,1]\) is a transition function, such that \(\delta (s,a,s') \in \{1,0\}\), \(\forall s \in S_\mathrm{I} \cup S_\mathrm{II}, a\in A \) and \(s' \in S\), and \(\delta (s,\tau ,s') \in \left[ 0,1\right] ,~\forall s \in S_\bigcirc \) and \(s' \in S_\mathrm{I} \cup S_\mathrm{II}\), where \(\sum _{s' \in S_\mathrm{I} \cup S_\mathrm{II}}{\delta (s,\tau ,s')} = 1\) holds.
For all \(s \!\in S_\mathrm{I} \cup S_\mathrm{II}\) and \(a \in A_\mathrm{I} \cup A_\mathrm{II} \), we write \(s \xrightarrow {a} s'\) if \(\delta (s,a,s') \!=\!1\). Similarly, for all \(s \!\in \! S_{\bigcirc }\) we write \(s \xrightarrow {p} s^\prime \) if \(s^\prime \) is randomly sampled with probability \(p \!=\!\delta (s,\tau ,s')\).
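The two transition modes above can be made concrete with a small data structure. The following Python sketch (class name and encoding are our own assumptions, not from the paper) resolves player states deterministically and samples stochastic states:

```python
import random

# Minimal turn-based SMG sketch (illustrative only). Each state is owned by
# player "I", "II", or "O" (the stochastic player), per Definition 1.
class SMG:
    def __init__(self, delta, owner):
        self.delta = delta   # (state, action) -> {next_state: probability}
        self.owner = owner   # state -> "I" | "II" | "O"

    def step(self, s, a="tau"):
        dist = self.delta[(s, a)]
        if self.owner[s] in ("I", "II"):
            # Player transitions are deterministic: delta(s, a, s') in {0, 1}.
            (s2, p), = dist.items()
            assert p == 1
            return s2
        # Stochastic states take the empty action tau and sample s'
        # with probability delta(s, tau, s'), which must sum to 1.
        states, probs = zip(*dist.items())
        assert abs(sum(probs) - 1.0) < 1e-9
        return random.choices(states, weights=probs)[0]
```

A game is then built by listing the nonzero entries of \(\delta\) and the owner of each state; `step` plays one turn from a given state.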
Hidden-Information Games. SMGs assume that all players have full knowledge of the current state, and hence provide perfect-information models [5]. In many applications, however, this assumption may not hold. A prominent example is security-aware models, where stealthy adversarial actions can be hidden from the system; e.g., the system may not even be aware that it is under attack. In contrast, hidden-information games (HIGs) refer to games where one player does not have complete access to (or knowledge of) the current state. The notion of hidden information can be formalized with the use of private variables (PVs) [5]. Specifically, a game state can be encoded using variables \(v_\mathcal {T}\) and \(v_\mathcal {B}\), representing the true state, known only to PL_{I}, and PL_{II}'s belief, respectively.
Definition 2
(HIG). A hidden-information game is a tuple \(\mathcal {G}_{\mathsf {H}}= \left( S, A, s_0, \beta , \delta \right) \), where:
set of states \(S \subseteq Ev \left( v_\mathcal {T}\right) \times Ev \left( v_\mathcal {B}\right) \times \mathcal {P} \left( Ev \left( v_\mathcal {T}\right) \right) \times \varGamma \) is partitioned into \( S_\mathrm{I}\), \( S_\mathrm{II}\) and \( S_\bigcirc \);

\(A \!=\!A_\mathrm{I} \cup A_\mathrm{II} \cup \{ \tau , \theta \} \) is a finite set of actions, where \(\tau \) denotes an empty action, and \(\theta \) is the action capturing PL_{II}'s attempt to reveal the true value \(v_\mathcal {T}\);

\(s_0 \in S_\mathrm{II}\) is the initial state;

\(\beta :A_\mathrm{II} \!\rightarrow \! \mathcal {P}\! \left( A_\mathrm{I} \right) \) is a function that defines the set of available PL_{I} actions, based on PL_{II}'s action; and

\( \delta :S \times A \times S \rightarrow [0,1] \) is a transition function such that \( \delta ( s_\mathrm{I},a, s_\bigcirc )=\) \( \delta ( s_\bigcirc ,a, s_\mathrm{I}) = 0\), and \(\delta (s_\mathrm{II},\theta , s_\bigcirc )\), \( \delta ( s_\mathrm{II},a, s_\mathrm{I})\), \( \delta ( s_\mathrm{I},a, s_\mathrm{II}) \in \left\{ 0, 1 \right\} \) for all \( s_\mathrm{I} \in S_\mathrm{I}\), \( s_\mathrm{II} \in S_\mathrm{II}\), \( s_\bigcirc \in S_\bigcirc \) and \(a \in A\), where \(\sum _{s^\prime \in S_\mathrm{II}}{\delta (s_\bigcirc ,\tau ,s')} \!=\!1\).
In the above definition, \(\delta \) only allows transitions from \(s_\mathrm{I}\) to \(s_\mathrm{II}\), and from \(s_\mathrm{II}\) to \(s_\mathrm{I}\) or \(s_\bigcirc \) (with transitions from \(s_\mathrm{II}\) to \(s_\bigcirc \) conditioned on action \(\theta \)), as well as probabilistic transitions from \(s_\bigcirc \) to \(s_\mathrm{II}\). A game state can be written as \(s = \left( t, u, \varOmega , \gamma \right) \), but to simplify notation we use \(s_{\gamma } \left( t, u, \varOmega \right) \) instead, where \(t \in Ev \left( v_\mathcal {T}\right) \) is the true value of the game, \(u \in Ev \left( v_\mathcal {B}\right) \) is PL_{II}'s current belief, \(\varOmega \in \mathcal {P}({ Ev \left( v_\mathcal {T}\right) }) \setminus \{ \emptyset \} \) is PL_{II}'s belief space, and \(\gamma \in \varGamma \) is the index of the current player. When the truth is hidden from PL_{II}, the belief space \(\varOmega \) is the information set [27], capturing PL_{II}'s knowledge about the possible true values.
Example 1
(Belief vs. True Value). Our motivating example is a system that consists of a UAV and a human operator. For localization, the UAV mainly relies on a GPS sensor that can be compromised to effectively steer the UAV away from its original path. While aggressive attacks can be detected, some may remain stealthy by introducing only bounded errors at each step [16, 20, 22, 26]. For example, Fig. 1 shows a UAV (PL_{II}) occupying zone A and flying north (N). An adversary (PL_{I}) can launch a stealthy attack targeting its GPS, introducing a bounded error (NE, NW) to remain stealthy. The set of stealthy actions available to the attacker depends on the preceding UAV action, which is captured by the function \(\beta \), where \(\beta (\mathsf {N}) \!=\!\lbrace \mathsf {NE}, \mathsf {N}, \mathsf {NW}\rbrace \). Being unaware of the attack, the UAV believes that it is entering zone C, while the true new location is D due to the attack (\(\mathsf {NE}\)). Initially, \(\eta \left( v_{\mathcal {T}}\right) \!=\!\eta \left( v_{\mathcal {B}}\right) \!=\!z_A\), and \(\varOmega \!=\!\{ z_A \}\) as the UAV is certain it is in zone \(z_A\). In \(s_2\), \(\eta \left( v_{\mathcal {B}}\right) \!=\!z_C\), yet \(\eta \left( v_{\mathcal {T}}\right) \!=\!z_D\). Although \(v_{\mathcal {T}}\) is hidden, PL_{II} is aware that \(\eta \left( v_{\mathcal {T}}\right) \) is in \(\varOmega \!=\!\{ z_B, z_C, z_D \}\).
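The belief-space bookkeeping in Example 1 can be made concrete with a short sketch. The grid encoding of zones, the `apply` helper, and the specific `beta` entries are our own assumptions for illustration, not the paper's model:

```python
# beta maps the UAV's intended move to the attacker's stealthy alternatives.
beta = {"N": ["NE", "N", "NW"], "E": ["NE", "E", "SE"]}

# Zones encoded as grid coordinates; moves as coordinate offsets.
MOVES = {"N": (0, 1), "NE": (1, 1), "NW": (-1, 1), "E": (1, 0), "SE": (1, -1)}

def apply(zone, action):
    (x, y), (dx, dy) = zone, MOVES[action]
    return (x + dx, y + dy)

def update_belief(belief, omega, uav_action):
    """The UAV believes it moved as commanded, while the information set Omega
    grows to every zone a stealthy attack in beta(uav_action) could produce."""
    new_belief = apply(belief, uav_action)
    new_omega = {apply(t, a) for t in omega for a in beta[uav_action]}
    return new_belief, new_omega

belief, omega = update_belief((0, 0), {(0, 0)}, "N")
# belief is the commanded destination; omega holds the three stealthy outcomes
```

After one northbound move from a certain initial zone, the belief is the zone directly north, while \(\varOmega\) contains the three zones reachable under \(\beta(\mathsf{N})\).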
Example 2
(HIG Semantics). Continuing Example 1, let us assume that the set of actions \(A_\mathrm{I} = A_\mathrm{II} = \lbrace \mathsf {N},\mathsf {S},\mathsf {E},\mathsf {W},\mathsf {NE},\mathsf {NW},\mathsf {SE},\mathsf {SW} \rbrace \), and that \(\theta \!=\!\mathsf {GT}\) is a geolocation task that attempts to reveal the true value of the game. Now, consider the scenario illustrated in Fig. 3. At the initial state \(s_0\), the UAV attempts to move north (\(\mathsf {N}\)), progressing the game to the state \(s_1\), where the adversary takes her turn by selecting an action from the set \(\beta (\mathsf {N}) = \lbrace \mathsf {NE}, \mathsf {N}, \mathsf {NW}\rbrace \). The players take turns until the UAV performs a geolocation task \(\mathsf {GT}\), moving from the state \(s_4\) to \(s_5\). With probability \(p = \delta (s_5,\tau ,s_6)\), the UAV detects its true location and updates its belief accordingly (i.e., to \(s_6\)). Otherwise, the belief remains the same (i.e., equal to \(s_4\)).
3 Delayed-Action Games
In this section, we provide an alternative representation of HIGs that eliminates the use of private variables: we introduce delayed-action games (DAGs), which exploit the notion of delayed actions. Furthermore, we show that for any HIG, a DAG that simulates it can be constructed.
Delayed Actions. Informally, a DAG reconstructs an HIG such that actions of PL_{I} (the player with access to perfect information) follow the actions of PL_{II}, i.e., PL_{I} actions are delayed. This rearrangement of the players’ actions provides a means to hide information from PL_{II} without the use of private variables, since in this case, at PL_{II} states, PL_{I} actions have not occurred yet. In this way, PL_{II} can act as though she has complete information at the moment she makes her decision, as the future state has not yet happened and so cannot be known. In essence, the formalism can be seen as a partial ordering of the players’ actions, exploiting the (partial) superposition property that a wide class of physical systems exhibit. To demonstrate this notion, let us consider DAG modeling on our running example.
Example 3
(Delaying Actions). Figure 5 depicts the (HIG-based) scenario from Fig. 3 in the corresponding DAG, where the UAV actions are performed first (in \(\hat{s}_0,\hat{s}_1,\hat{s}_2\)), followed by the adversary's delayed actions (in \(\hat{s}_3,\hat{s}_4\)). Note that, in the DAG model, at the time the UAV executed its actions (\(\hat{s}_0,\hat{s}_1,\hat{s}_2\)), the adversary's actions had not (yet) occurred. Moreover, \(\hat{s}_0\) and \(\hat{s}_6\) (Fig. 5) share the same belief and true values as \(s_0\) and \(s_6\) (Fig. 3), respectively, though the transient states do not exactly match. This observation will be used to show the relationship between the two games.
The advantage of this approach is twofold. First, the elimination of private variables enables the simulation of an HIG using a full-information game. Thus, formulating the strategy synthesis problem using off-the-shelf SMG-based tools becomes feasible. In particular, a strategy synthesized for PL_{II} depends on knowledge of PL_{I}'s possible actions, rather than on the specific (hidden) actions taken. We formalize a DAG as follows.
Definition 3
(DAG). A delayed-action game of an HIG is a tuple \(\mathcal {G}_{\mathsf {D}}= ( \hat{S}, A, \hat{s}_0, \hat{\delta } )\), where:
\(\hat{S} \subseteq Ev \left( v_\mathcal {T}\right) \times Ev \left( v_\mathcal {B}\right) \times A_\mathrm{II}^* \times \mathbb {N}_0\times \varGamma \) is the set of states, partitioned into \(\hat{S}_\mathrm{I}, \hat{S}_\mathrm{II}\) and \(\hat{S}_\bigcirc \);

\(\hat{s}_0 \in \hat{S}_\mathrm{II}\) is the initial state; and

\(\hat{\delta } :\hat{S} \times A \times \hat{S} \rightarrow [0,1] \) is a transition function such that \(\hat{\delta } (\hat{s}_\mathrm{II},a,\hat{s}_\bigcirc ) \!=\!\) \(\hat{\delta } (\hat{s}_\mathrm{I},a,\hat{s}_\mathrm{II}) \!=\!\) \(\hat{\delta } (\hat{s}_\bigcirc ,a,\hat{s}_\mathrm{I}) \!=\!0\), and \(\hat{\delta } (\hat{s}_\mathrm{II},a,\hat{s}_\mathrm{II}) \in \left\{ 0, 1 \right\} \), \(\hat{\delta } (\hat{s}_\mathrm{II},\theta ,\hat{s}_\mathrm{I}) \in \left\{ 0, 1 \right\} \), \(\hat{\delta } (\hat{s}_\mathrm{I},a,\hat{s}_\mathrm{I}) \in \left\{ 0, 1 \right\} \), \(\hat{\delta } (\hat{s}_\mathrm{I},a,\hat{s}_\bigcirc ) \in \left\{ 0, 1 \right\} \), for all \(\hat{s}_\mathrm{I} \in \hat{S}_\mathrm{I}\), \(\hat{s}_\mathrm{II} \in \hat{S}_\mathrm{II}\), \(\hat{s}_\bigcirc \in \hat{S}_\bigcirc \) and \(a \in A\), where \(\sum _{\hat{s}^\prime \in \hat{S}_\mathrm{II}}{\hat{\delta } (\hat{s}_\bigcirc ,a,\hat{s}')} \!=\!1\).
Note that, in contrast to transition function \(\delta \) in HIG \(\mathcal {G}_{\mathsf {H}}\), \(\hat{\delta }\) in DAG \(\mathcal {G}_{\mathsf {D}}\) only allows transitions \(\hat{s}_\mathrm{II}\) to \(\hat{s}_\mathrm{II}\) or \(\hat{s}_\mathrm{I}\), as well as \(\hat{s}_\mathrm{I}\) to \(\hat{s}_\mathrm{I}\) or \(\hat{s}_\bigcirc \), and probabilistic transitions \(\hat{s}_\bigcirc \) to \(\hat{s}_\mathrm{II}\); also note that \(\hat{s}_\mathrm{II}\) to \(\hat{s}_\mathrm{I}\) is conditioned by the action \(\theta \).
In both \(\mathcal {G}_{\mathsf {H}}\) and \(\mathcal {G}_{\mathsf {D}}\), we label states where all players have full knowledge of the current state as proper. We also say that two states are similar if they agree on the belief, and equivalent if they agree on both the belief and ground truth.
Definition 4
(Proper, Similar and Equivalent States). Let \(s_{\gamma } \left( t, u, \varOmega \right) \in S\) be an HIG state, and \(\hat{s}_{\hat{\gamma }} ( \hat{t}, \hat{u}, w, h ) \in \hat{S}\) a DAG state. Then:
\(s_\gamma \) is proper iff \(\varOmega = \{t\}\), denoted by \(s_\gamma \in \mathrm {Prop}(\mathcal {G}_{\mathsf {H}})\).

\(\hat{s}_{\hat{\gamma }}\) is proper iff \(w = \epsilon \), denoted by \(\hat{s}_{\hat{\gamma }} \in \mathrm {Prop}(\mathcal {G}_{\mathsf {D}})\).

\(s_\gamma \) and \(\hat{s}_{\hat{\gamma }}\) are similar iff \(\hat{u} = u\), \(\hat{t} \in \varOmega \), and \(\gamma =\hat{\gamma }\), denoted by \(s _\gamma \sim \hat{s}_{\hat{\gamma }}\).

\(s_\gamma \), \(\hat{s}_{\hat{\gamma }}\) are equivalent iff \(t = \hat{t}\), \(u = \hat{u}\), \(w=\epsilon \), and \(\gamma =\hat{\gamma }\), denoted by \(s_\gamma \simeq \hat{s}_{\hat{\gamma }}\).
From the above definition, we have that \(s \simeq \hat{s} \implies s \in \mathrm {Prop}(\mathcal {G}_{\mathsf {H}})\) and \(\hat{s} \in \mathrm {Prop}(\mathcal {G}_{\mathsf {D}})\). We now define execution fragments, i.e., possible progressions from one state to another.
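Definition 4 translates directly into predicates. A minimal sketch, assuming HIG states are tuples \((t, u, \varOmega, \gamma)\) and DAG states are tuples \((\hat t, \hat u, w, h, \hat\gamma)\) with the word \(w\) as a string (an encoding of our own choosing):

```python
# Properness, similarity and equivalence checks from Definition 4.
def proper_hig(s):
    t, u, omega, gamma = s
    return omega == {t}          # belief space collapsed to the true value

def proper_dag(sh):
    t, u, w, h, gamma = sh
    return w == ""               # empty word epsilon: no pending actions

def similar(s, sh):
    # agree on belief, true value consistent with Omega, same player index
    return s[1] == sh[1] and sh[0] in s[2] and s[3] == sh[4]

def equivalent(s, sh):
    # agree on belief and ground truth, with no pending delayed actions
    return s[0] == sh[0] and s[1] == sh[1] and sh[2] == "" and s[3] == sh[4]
```

Running the predicates on a pair of matching states confirms the implication stated above: equivalence entails properness on both sides.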
Definition 5
(Execution Fragment). An execution fragment (of either an SMG, DAG or HIG) is a finite sequence of states, actions and probabilities
\(\varrho = s_0 a_1 p_1 s_1 a_2 p_2 s_2 \ldots a_n p_n s_n \) such that, for all \(i \leqslant n\), either \(s_{i-1} \xrightarrow {a_i} s_i\) with \(p_i = 1\), or \(s_{i-1} \xrightarrow {p_i} s_i\) with \(a_i = \tau \).
We use \( first (\varrho )\) and \( last (\varrho )\) to refer to the first and last states of \(\varrho \), respectively. If both states are proper, we say that \(\varrho \) is proper as well, denoted by \(\varrho \in \mathrm {Prop}(\mathcal {G}_{\mathsf {H}})\). Moreover, \(\varrho \) is deterministic if no probabilities appear in the sequence.
Definition 6
(Move). A move \(m_\gamma \) of an execution \(\varrho \) from state \(s\in \varrho \), denoted by \( move _\gamma (s,\varrho )\), is a sequence of actions \(a_1 a_2 \ldots a_i \in A_\gamma ^*\) that player \(\gamma \) performs in \(\varrho \) starting from s.
By omitting the player index we refer to the moves of all players. To simplify notation, we use \( move (\varrho )\) as a shorthand for \( move ( first (\varrho ),\varrho )\). We write \((m)( first (\varrho )) = last (\varrho )\) to denote that executing move m from \( first (\varrho )\) leads to \( last (\varrho )\). This allows us to define the delay operator as follows.
Definition 7
(Delay Operator). For an HIG \(\mathcal {G}_{\mathsf {H}}\), let \(m = move (\varrho ) = a_1 b_1 \ldots a_n b_n \theta \) be a move for some deterministic \(\varrho \in \mathrm {TS}(\mathcal {G}_{\mathsf {H}}) \), where \(a_1 \ldots a_n \in A_\mathrm{II}^*\) and \(b_1 \ldots b_n \in A_\mathrm{I}^*\). The delay operator, denoted by \(\overline{m}\), is defined by the rule \(\overline{m} = a_1 \ldots a_n \theta b_1 \ldots b_n\).
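The delay operator is a simple rewriting of the move. A sketch, assuming moves are lists of action labels and PL_II's action set is given explicitly (identifiers are ours):

```python
def delay(move, A2):
    """Delay operator (Definition 7): rewrite a1 b1 ... an bn theta as
    a1 ... an theta b1 ... bn, where the a's are PL_II actions (members of
    A2), the b's are PL_I actions, and 'theta' is the revealing action."""
    a = [x for x in move if x in A2]                      # PL_II actions, in order
    b = [x for x in move if x not in A2 and x != "theta"] # PL_I actions, in order
    tail = ["theta"] if "theta" in move else []
    return a + tail + b
```

Applied to the interleaved move of Example 2, the UAV's actions are grouped first, followed by \(\theta\) and then the adversary's delayed actions.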
Simulation Relation. Given an HIG \(\mathcal {G}_{\mathsf {H}}\), we first define the corresponding DAG \(\mathcal {G}_{\mathsf {D}}\).
Definition 8
(Correspondence). Given an HIG \(\mathcal {G}_{\mathsf {H}}\), a corresponding DAG \(\mathcal {G}_{\mathsf {D}}= \mathfrak {D}[\mathcal {G}_{\mathsf {H}}]\) is a DAG that follows the semantic rules displayed in Fig. 7.
For the rest of this section, we consider \(\mathcal {G}_{\mathsf {D}}= \mathfrak {D} [\mathcal {G}_{\mathsf {H}}]\), and use \(\varrho \in \mathrm {TS}(\mathcal {G}_{\mathsf {H}})\) and \(\hat{\varrho } \in \mathrm {TS}(\mathcal {G}_{\mathsf {D}})\) to denote two execution fragments of the HIG and DAG, respectively. We say that \(\varrho \) and \(\hat{\varrho }\) are similar, denoted by \(\varrho \sim \hat{\varrho }\), iff \( first (\varrho ) \simeq first (\hat{\varrho })\), \( last (\varrho ) \sim last (\hat{\varrho })\), and \(\overline{ move (\varrho )} = move (\hat{\varrho })\).
Definition 9
(Game Proper Simulation). A game \(\mathcal {G}_{\mathsf {D}}\) properly simulates \(\mathcal {G}_{\mathsf {H}}\), denoted by \(\mathcal {G}_{\mathsf {D}}\leadsto \mathcal {G}_{\mathsf {H}}\), iff \(\forall \varrho \in \mathrm {Prop}(\mathcal {G}_{\mathsf {H}})\), \(\exists \hat{\varrho } \in \mathrm {Prop}(\mathcal {G}_{\mathsf {D}})\) such that \(\varrho \sim \hat{\varrho }\).
Before proving the existence of the simulation relation, we first show that if a move is executed on two equivalent states, then the terminal states are similar.
Lemma 1
(Terminal States Similarity). For any \(s_0 \!\simeq \! \hat{s}_0\) and any deterministic \(\varrho \!\in \! \mathrm {TS}(\mathcal {G}_{\mathsf {H}})\) with \( first (\varrho ) \!=\!s_0\) and \( last (\varrho ) \in S_\mathrm{II}\), it holds that \( last (\varrho ) \!\sim \! \left( \overline{ move (\varrho )}\right) \! \left( \hat{s}_0\right) \).
Proof

Base \((i \!=\!0)\): \(\varrho _0 \!=\!s_0 \) \(\implies s^{(0)} \simeq \hat{s}^{(0)} \) where \(u_0=\hat{u}_0\) and \(t_0 \!=\!\hat{t}_0\).

Induction \((i \!>\! 0)\): Assume that the claim holds for \( move (\varrho _{i-1}) \!=\!a_1 b_1 \ldots a_{i-1} b_{i-1} \theta \), i.e., \(u_{i-1} \!=\!\hat{u}_{i-1}\) and \(\hat{t}_{i-1} \!\in \! \varOmega _{i-1}\). For \(\varrho _i\), executing \(a_i b_i \theta \) in \(\mathcal {G}_{\mathsf {H}}\) and the delayed move \(a_i \theta b_i\) in \(\mathcal {G}_{\mathsf {D}}\) update the belief and the true value consistently in both games. Hence, \( u_i \!=\!\hat{u}_i\), \(\hat{t}_i \in \varOmega _i\) and \(\hat{\gamma }_i \!=\!\gamma _i \!=\!\bigcirc \). Thus, \(s^{(i)} \sim \hat{s}^{(i)}\) holds. The same can be shown for \( move (\varrho ) = a_1 b_1 \ldots a_i b_i \) where no \(\theta \) occurs. \(\square \)
Theorem 1
For any \(s^{(0)} \simeq \hat{s}^{(0)}\) and proper \(\varrho \in \mathrm {Prop}(\mathcal {G}_{\mathsf {H}})\) with \( first (\varrho ) = s^{(0)}\), executing \(\overline{ move (\varrho )}\) from \(\hat{s}^{(0)}\) reaches a state equivalent to \( last (\varrho )\) with the same probability, i.e., \( \mathrm {Pr} \left[ last (\varrho ) \!=\!s_\mathrm{II}^{(n)} \right] = \mathrm {Pr} \left[ \left( \overline{ move (\varrho )}\right) \left( \hat{s}^{(0)}\right) \!=\!\hat{s}^{(n)} \right] \) and \(s_\mathrm{II}^{(n)} \simeq \hat{s}_\mathrm{II}^{(n)}\).
Proof

Base (\(n \!=\!0\)): for \(\varrho \) to be deterministic and proper, \(\varrho \!=\!\varrho _0 \!=\!s^{(0)} \) holds.

Case (\(n \!=\!1\)): \(p_1 = p(t_0^\prime ,u_0^\prime ) \). From Lemma 1, \(\hat{u}_1 \!=\!u_1 \) and \(\hat{t}_1 \!=\!t_1 \). Hence, \( \mathrm {Pr} \left[ last (\varrho ) \!=\!s_\mathrm{II}^{(1)} \right] = \mathrm {Pr} \left[ \left( \overline{ move (\varrho )}\right) \left( \hat{s}_{0}\right) \!=\!\hat{s}_\mathrm{II}^{(1)} \right] \!=\!p(t_0^\prime ,u_0^\prime ) \) and \(s_\mathrm{II}^{(1)} \simeq \hat{s}_\mathrm{II}^{(1)}\).

Induction (\(n \!>\! 1\)): It is straightforward to infer that \( p_n \!=\!p\left( t_{n-1}^\prime , u_{n-1}^\prime \right) \), hence \( \mathrm {Pr} \! \left[ last (\varrho ) \!=\!s_\mathrm{II}^{(n)} \right] = \mathrm {Pr} \! \left[ \left( \overline{ move (\varrho )}\right) \! \left( \hat{s}^{(0)}\right) \!=\!\hat{s}^{(n)} \right] = P\), and \(s_\mathrm{II}^{(n)} \simeq \hat{s}_\mathrm{II}^{(n)}\). \(\square \)
Theorem 2
(DAGHIG Simulation). For any HIG \(\mathcal {G}_{\mathsf {H}}\) there exists a DAG \(\mathcal {G}_{\mathsf {D}}= \mathfrak {D}[\mathcal {G}_{\mathsf {H}}]\) such that \(\mathcal {G}_{\mathsf {D}}\leadsto \mathcal {G}_{\mathsf {H}}\) (as defined in Definition 9).
4 Properties of DAGs and DAG-Based Synthesis
Here we discuss DAG features, including how a DAG can be decomposed into subgames by restricting the simulation to finite executions, and the preservation of safety properties, before proposing a DAG-based synthesis framework.
Transitions. In DAGs, nondeterministic actions of different players carry different semantics. Specifically, PL_{I}'s nondeterminism captures what is known about the adversarial behavior, rather than exact actions, where PL_{I}'s actions are constrained by the earlier PL_{II} action. Conversely, PL_{II}'s nondeterminism abstracts the player's decisions. This distinction reflects how DAGs can be used for strategy synthesis under hidden information. To illustrate this, suppose that a strategy \(\pi _\mathrm{II}\) is to be obtained based on a worst-case scenario. In that case, the game is explored for all possible adversarial behaviors. Yet, if a strategy \(\pi _\mathrm{I}\) of PL_{I} is known, a counter-strategy \(\pi _\mathrm{II}\) can be found by constructing \(\mathcal {G}_{\mathsf {D}}^{\pi _\mathrm{I}}\).
Probabilistic behaviors in DAGs are captured by \(\mathrm {PL}_{\bigcirc }{}\), which is characterized by the transition function \(\hat{\delta } :\hat{S}_\bigcirc \times \hat{S}_\mathrm{II} \rightarrow [0,1]\). The specific definition of \(\hat{\delta }\) depends on the modeled system. For instance, if the transition function (i.e., the probability) is state-independent, i.e., \( \hat{\delta }(\hat{s}_\bigcirc , \hat{s}_\mathrm{II}) = c , c \in [0,1]\), the obtained model becomes trivial. Yet, with a state-dependent transition function, i.e., \(\hat{\delta }(\hat{s}_\bigcirc ,\hat{s}_\mathrm{II}) = p(\hat{t},\hat{u})\), the probability that PL_{II} successfully reveals the true value depends on both the belief and the true value, and the transition function can then be realized since \(\hat{s}_\bigcirc \) holds both \(\hat{t}\) and \(\hat{u}\).
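As one illustration of a state-dependent revelation probability \(p(\hat t, \hat u)\), the sketch below makes detection harder as the belief drifts from the true zone; the functional form and decay rate are purely our own modeling choices, not prescribed by the paper:

```python
# A state-dependent revelation probability p(t, u): an illustrative choice
# where the chance of revealing the true zone decays linearly with the
# Manhattan distance between the true value t and the belief u.
def p_reveal(t, u):
    drift = abs(t[0] - u[0]) + abs(t[1] - u[1])
    return max(0.0, 1.0 - 0.2 * drift)
```

When the belief matches the truth, revelation is certain; beyond five zones of drift, the probability saturates at zero.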
Decomposition. Consider an execution \(\hat{\varrho }^* \!=\!\hat{s}_0 a_1 \hat{s}_1 a_2 \hat{s}_2 \ldots \) that describes a scenario where PL_{II} performs infinitely many actions with no attempt to reveal the true value. To simulate \(\hat{\varrho }^*\), the word w would need to grow infinitely. Since we are interested in finite executions, we impose stopping criteria on the DAG, such that the game is trapped whenever \(|w| \!=\!h_{\max }\) holds, where \(h_{\max } \in \mathbb {N}\) is an upper horizon. We formalize the stopping criteria as a deterministic finite automaton (DFA) that, when composed with the DAG, traps the game whenever the stopping criteria hold. Note that imposing an upper horizon by itself is not a sufficient criterion for a DAG to be considered a stopping game [8]. Conversely, consider a proper (and hence finite) execution \(\hat{\varrho } \!=\!\hat{s}_0 a_1 \ldots \hat{s}^\prime \), where \(\hat{s}_0, \hat{s}^\prime \in \mathrm {Prop}(\mathcal {G}_{\mathsf {D}})\). From Definition 9, it follows that a DAG initial state is strictly proper, i.e., \(\hat{s}_0 \in \mathrm {Prop}(\mathcal {G}_{\mathsf {D}})\). Hence, when \(\hat{s}^\prime \) is reached, the game can be seen as if it is repeated with a new initial state \(\hat{s}^\prime \). Consequently, a DAG (complemented with stopping criteria) can be decomposed into a countable (possibly infinite) set of subgames that have the same structure yet different initial states.
Definition 10
(DAG Subgames). The subgames of a DAG \(\mathcal {G}_{\mathsf {D}}\) are defined by the set \(\big \{ \mathcal {G}_{\mathsf {D}}^{(i)} = ( \hat{S}^{(i)}, A, \hat{s}_0^{(i)}, \hat{\delta } ) \big \}_{i \in \mathbb {N}_0}\), where \( \hat{S} = \bigcup _{i}^{}{\hat{S}^{(i)}} \); \( \hat{S}_{\gamma } = \bigcup _{i}^{}{\hat{S}_{\gamma }^{(i)}} \; \forall \gamma \in \varGamma \); and \( \hat{s}_0^{(i)} = \hat{s}_\mathrm{II}^{(i)} \) such that \( \hat{s}_\mathrm{II}^{(i)} \in \mathrm {Prop}(\mathcal {G}_{\mathsf {D}}^{(i)}) \) and \( \hat{s}_\mathrm{II}^{(i)} \ne \hat{s}_\mathrm{II}^{(j)} \; \forall i,j \in \mathbb {N}_0\), \(i \ne j\).
Intuitively, each subgame either reaches a proper state (representing the initial state of another subgame) or terminates by an upper horizon. This decomposition allows for the independent (and parallel) analysis of individual subgames, drastically reducing both the time required for synthesis and the explored state space, and hence improving scalability. An example of this decompositional approach is provided in Sect. 5.
Preservation of Safety Properties. In DAGs, the action \(\theta \) denotes a transition from PL_{II} to PL_{I} states, and thus the execution of any delayed actions. While this action can simply describe a revealing attempt, it can also serve as a what-if analysis of how the true value may evolve at stage i of a subgame. We refer to an execution of the second type as a hypothetical branch, where \(\mathrm {Hyp}(\hat{\varrho },h)\) denotes the set of hypothetical branches from \(\hat{\varrho }\) at stage \(h \in \{ 1, \ldots , n \} \). Let \(L_{\mathrm {safe}} (s)\) be a labeling function indicating whether a state is safe. The safety formula \(\varPhi _{\mathrm {safe}}\) is satisfied by an execution \(\varrho \) in an HIG iff all \( s(t,u,\varOmega ) \in \varrho \) are safe. Hence, for an HIG execution \(\varrho \) and a similar DAG execution \(\hat{\varrho }\), if:
 (a)
\(L_{\mathrm {safe}} (s)\) depends only on the belief u, then \(\varrho \models \varPhi _{\mathrm {safe}}\) iff all \(\hat{s}_\mathrm{II} \in \hat{\varrho }\) are safe;
 (b)
\(L_{\mathrm {safe}} (s)\) depends only on the true value t, then \(\varrho \models \varPhi _{\mathrm {safe}}\) iff all \(\hat{s}_\mathrm{I} \in \mathrm {Hyp}(\hat{\varrho },n)\) are safe; and
 (c)
\(L_{\mathrm {safe}} (s)\) depends on both the true value t and belief u, then \(\varrho \models \varPhi _{\mathrm {safe}}\) iff \( last (\hat{\varrho }_h)\) is safe for all \( \hat{\varrho }_h \!\in \! \mathrm {Hyp}(\hat{\varrho },h), h \!\in \! \{ 1, ... , n \},\) where n is the number of PL_{II} actions.
Taking into account such relations, both safety (e.g., never encounter a hazard) and distance-based requirements (e.g., never exceed a subgame horizon) can be specified when using DAGs for synthesis, to ensure their satisfaction in the original model. This can be generalized to other reward-based synthesis objectives, which will be part of our future efforts that we discuss in Sect. 6.
Algorithm 2 describes the procedure for strategy synthesis based on the DAG \(\mathcal {G}_{\mathsf {D}}\) and an rPATL [6] synthesis query \(\phi _\mathrm {syn}\) that captures, for example, a safety requirement. Starting with the initial location, the procedure checks whether \(\phi _\mathrm {syn}\) is satisfied if action \(\theta \) is performed at stage h, and updates the set of feasible strategies \(\varPi _{i}\) for subgame \(\hat{\mathcal {G}}_i\) until \(h_{\max }\) is reached or \(\phi _\mathrm {syn}\) is not satisfied. Next, the set \(\varPi _{i}\) is used to update the list of reachable end locations \(\ell \) with new initial locations of reachable subgames that should be explored. Finally, the composition of both \(\mathcal {G}_{\mathsf {H}}\) and \(\varPi _\mathrm{II}^*\) resolves PL_{II}'s nondeterminism, where the resulting model \(\mathcal {G}_{\mathsf {H}}^{\varPi _\mathrm{II}^*} \) is a complete-information Markov decision process (MDP) that can be easily used for further analysis.
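The exploration loop just described can be sketched as a worklist over subgame initial locations. Here `check` stands in for a model-checker invocation and `initial_locations_of` for extracting reachable end locations; both are our own placeholders, not actual tool APIs:

```python
from collections import deque

def synthesize(initial_loc, h_max, check, initial_locations_of):
    """Worklist sketch of the subgame-based synthesis loop.
    check(loc, h) returns a strategy if phi_syn holds when theta is taken
    at stage h of the subgame starting at loc, else None."""
    strategies = {}
    seen = {initial_loc}
    frontier = deque([initial_loc])
    while frontier:
        loc = frontier.popleft()
        feasible = []
        for h in range(1, h_max + 1):
            pi = check(loc, h)
            if pi is None:
                break               # phi_syn no longer satisfiable at this stage
            feasible.append(pi)
        strategies[loc] = feasible
        # Enqueue initial locations of subgames reachable under these strategies.
        for nxt in initial_locations_of(loc, feasible):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return strategies
```

Each subgame is explored independently, so the body of the `while` loop is the natural unit for parallelization mentioned in the decomposition discussion.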
5 Case Study

Target reachability. To overcome cases where targets are unreachable due to hazard zones, the label \( reach \) is assigned to the set of states with acceptable checkpoint locations (including the target) to render the objective incrementally feasible. The objective for all encountered subgames is then formalized as \( \mathrm {Pr}_{\max } \left[ \mathsf {F}\; reach \right] \geqslant p_{\min } \) for some bound \(p_{\min }\).

Hazard Avoidance. Similar to target reachability, the label \( hazard \) is assigned to states corresponding to hazard zones. The objective \(\mathrm {Pr}_{\max } \left[ \mathsf {G}\; \lnot hazard \right] \geqslant p_{\min }\) is then specified for all encountered subgames.
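In PRISM-games, such objectives are typically expressed as rPATL queries. The sketch below is illustrative only; the coalition and label names are our assumptions rather than identifiers from the paper's actual model:

```
// Hypothetical rPATL queries (coalition and label names are ours)
<<uav>> Pmax=? [ F "reach" ]        // target reachability value
<<uav>> P>=0.99 [ G !"hazard" ]     // hazard avoidance with p_min = 0.99
```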
We implemented the model in PRISM-games [7, 19] and performed the experiments on an Intel Core i7 4.0 GHz CPU, with 10 GB RAM dedicated to the tool. Figure 10(b) shows the supergame obtained by following the procedure in Algorithm 2. A vertex \(\hat{\mathcal {G}}_{{\texttt {xy}}}\) represents a subgame (composed with its strategy) that starts at location \(({\texttt {x}},{\texttt {y}})\), while its outgoing edges point to the subgames reachable from it. Note that each edge represents a probabilistic transition. Subgames with more than one outgoing transition introduce nondeterminism, which is resolved by the adversary's actions; hence, the directed graph depicts an MDP.
Results for strategy synthesis using queries \(\phi _\mathrm {syn}\) and \(\phi _\mathrm {ana}\) (model-construction and query times in seconds; "–" marks queries not run on that model).

Subgame \(\hat{\mathcal {G}}_{51}\) (\(8 \times 8\) map):

\(t_{adv}\) | k | States | Transitions | Choices | Model | \(\phi _\mathrm {syn}\) | \(\phi _\mathrm {ana}\)
1 | 4 | 11,608 | 17,397 | 15,950 | 2.810 | 0.072 | –
1 | 5 | 57,129 | 87,865 | 83,267 | 14.729 | 0.602 | –
1 | 6 | 236,714 | 366,749 | 359,234 | 62.582 | 1.293 | –
1 | 7 | 876,550 | 1,365,478 | 1,355,932 | 231.741 | 6.021 | –
2 | 4 | 6,678 | 9,230 | 8,394 | 2.381 | 0.042 | –
2 | 5 | 33,904 | 48,545 | 45,354 | 10.251 | 0.367 | –
2 | 6 | 141,622 | 204,551 | 198,640 | 37.192 | 1.839 | –
2 | 7 | 524,942 | 763,144 | 754,984 | 145.407 | 8.850 | –

Supergame \(\mathcal {G}_{\mathsf {D}}\): 6,212 states, 8,306 transitions, 6,660 choices; model 2.216, \(\phi _\mathrm {syn}\) –, \(\phi _\mathrm {ana}\) 2.490.
6 Discussion and Conclusion
In this paper, we introduced DAGs and showed how they can simulate HIGs by delaying players' actions. We also derived a DAG-based framework for strategy synthesis and analysis using off-the-shelf SMG model checkers. Under some practical assumptions, we showed that DAGs can be decomposed into independent subgames, utilizing parallel computation to reduce the time needed for model analysis, as well as the size of the state space. We further demonstrated the applicability of the proposed framework on a case study focused on the synthesis and analysis of active attack-detection strategies for UAVs prone to cyber attacks.
DAGs come at the cost of an increase in the total state-space size, as \(\mathcal {M}_\mathrm {mrd}\) and \(\mathcal {M}_\mathrm {mwr}\) are introduced. This is not a significant limitation thanks to the compositional approach to strategy synthesis using subgames. However, synthesis is still limited to model sizes that off-the-shelf tools can handle.
The concept of delaying actions implicitly assumes that the adversary knows the UAV's actions a priori. This is not a concern in our case study, since an abstract (i.e., nondeterministic) adversary model is analogous to synthesizing against the worst-case attack scenario. Nevertheless, strategies synthesized using DAGs (and SMGs in general) are inherently conservative; depending on the system considered, this can easily lead to no feasible solution.
The proposed synthesis framework ensures the preservation of safety properties. General reward-based strategy synthesis, however, should be approached with care. For example, rewards that depend on the belief can appear in any state, and exploring hypothetical branches is not required; rewards that depend on a state's true value, in contrast, should appear only in proper states, and all hypothetical branches are to be explored. A detailed investigation of how various properties are preserved by DAGs, along with multi-objective synthesis, is a direction for future work.
Footnotes
 1.
The term turn-based indicates that at any state only one player can play an action. It does not necessarily imply that players take fair turns.
 2.
A geolocation task is an attempt to localize the UAV by examining its camera feed.
 3.
For deterministic transitions, \(p=1\), hence omitted from \(\varrho \) for readability.
 4.
An execution fragment lives in the transition system (TS), i.e., \(\varrho \in \mathrm {Prop}(\mathrm {TS}(\mathcal {G}))\). We omit \(\mathrm {TS}\) for readability.
 5.
Specific implementation details are described in Sect. 5.
 6.
Failing to find a strategy at stage i implies the same for all horizons of size \(j>i\).
References
1. Baier, C., Brázdil, T., Größer, M., Kučera, A.: Stochastic game logic. In: Fourth International Conference on the Quantitative Evaluation of Systems, QEST 2007, pp. 227–236. IEEE (2007). https://doi.org/10.1109/QEST.2007.38
2. Basset, N., Kwiatkowska, M., Topcu, U., Wiltsche, C.: Strategy synthesis for stochastic games with multiple long-run objectives. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 256–271. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_22
3. Basset, N., Kwiatkowska, M., Wiltsche, C.: Compositional strategy synthesis for stochastic games with multiple objectives. Information and Computation (2017). https://doi.org/10.1016/j.ic.2017.09.010
4. Brázdil, T., Chatterjee, K., Křetínský, J., Toman, V.: Strategy representation by decision trees in reactive synthesis. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10805, pp. 385–407. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89960-2_21
5. Chatterjee, K., Henzinger, T.A.: Semiperfect-information games. In: Sarukkai, S., Sen, S. (eds.) FSTTCS 2005. LNCS, vol. 3821, pp. 1–18. Springer, Heidelberg (2005). https://doi.org/10.1007/11590156_1
6. Chen, T., Forejt, V., Kwiatkowska, M., Parker, D., Simaitis, A.: Automatic verification of competitive stochastic systems. Form. Methods Syst. Des. 43(1), 61–92 (2013). https://doi.org/10.1007/s10703-013-0183-7
7. Chen, T., Forejt, V., Kwiatkowska, M., Parker, D., Simaitis, A.: PRISM-games: a model checker for stochastic multi-player games. In: Piterman, N., Smolka, S.A. (eds.) TACAS 2013. LNCS, vol. 7795, pp. 185–191. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36742-7_13
8. Chen, T., Forejt, V., Kwiatkowska, M., Simaitis, A., Wiltsche, C.: On stochastic games with multiple objectives. In: Chatterjee, K., Sgall, J. (eds.) MFCS 2013. LNCS, vol. 8087, pp. 266–277. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40313-2_25
9. Chen, T., Kwiatkowska, M., Simaitis, A., Wiltsche, C.: Synthesis for multi-objective stochastic games: an application to autonomous urban driving. In: Joshi, K., Siegle, M., Stoelinga, M., D'Argenio, P.R. (eds.) QEST 2013. LNCS, vol. 8054, pp. 322–337. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40196-1_28
10. David, A., Jensen, P.G., Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Uppaal Stratego. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 206–211. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_16
11. Elfar, M., Zhu, H., Cummings, M.L., Pajic, M.: Security-aware synthesis of human-UAV protocols. In: Proceedings of 2019 IEEE International Conference on Robotics and Automation (ICRA). IEEE (2019)
12. Feng, L., Wiltsche, C., Humphrey, L., Topcu, U.: Synthesis of human-in-the-loop control protocols for autonomous systems. IEEE Trans. Autom. Sci. Eng. 13(2), 450–462 (2016). https://doi.org/10.1109/TASE.2016.2530623
13. Fremont, D.J., Seshia, S.A.: Reactive control improvisation. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10981, pp. 307–326. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96145-3_17
14. Fu, J., Topcu, U.: Integrating active sensing into reactive synthesis with temporal logic constraints under partial observations. In: 2015 American Control Conference (ACC), pp. 2408–2413. IEEE (2015). https://doi.org/10.1109/ACC.2015.7171093
15. Hansen, E.A., Bernstein, D.S., Zilberstein, S.: Dynamic programming for partially observable stochastic games. In: AAAI 2004, pp. 709–715 (2004)
16. Jovanov, I., Pajic, M.: Relaxing integrity requirements for attack-resilient cyber-physical systems. IEEE Trans. Autom. Control (2019). https://doi.org/10.1109/TAC.2019.2898510
17. Kelmendi, E., Krämer, J., Křetínský, J., Weininger, M.: Value iteration for simple stochastic games: stopping criterion and learning algorithm. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10981, pp. 623–642. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96145-3_36
18. Klein, F., Zimmermann, M.: How much lookahead is needed to win infinite games? In: Halldórsson, M.M., Iwama, K., Kobayashi, N., Speckmann, B. (eds.) ICALP 2015. LNCS, vol. 9135, pp. 452–463. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-47666-6_36
19. Kwiatkowska, M., Parker, D., Wiltsche, C.: PRISM-games: verification and strategy synthesis for stochastic multi-player games with multiple objectives. Int. J. Softw. Tools Technol. Transf. 20(2), 195–210 (2018)
20. Lesi, V., Jovanov, I., Pajic, M.: Security-aware scheduling of embedded control tasks. ACM Trans. Embed. Comput. Syst. (TECS) 16(5s), 188:1–188:21 (2017). https://doi.org/10.1145/3126518
21. Li, W., Sadigh, D., Sastry, S.S., Seshia, S.A.: Synthesis for human-in-the-loop control systems. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 470–484. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_40
22. Mo, Y., Sinopoli, B.: On the performance degradation of cyber-physical systems under stealthy integrity attacks. IEEE Trans. Autom. Control 61(9), 2618–2624 (2016). https://doi.org/10.1109/TAC.2015.2498708
23. Neider, D., Topcu, U.: An automaton learning approach to solving safety games over infinite graphs. In: Chechik, M., Raskin, J.F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 204–221. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_12
24. Norman, G., Parker, D., Zou, X.: Verification and control of partially observable probabilistic real-time systems. In: Sankaranarayanan, S., Vicario, E. (eds.) FORMATS 2015. LNCS, vol. 9268, pp. 240–255. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22975-1_16
25. Pajic, M., Lee, I., Pappas, G.J.: Attack-resilient state estimation for noisy dynamical systems. IEEE Trans. Control Netw. Syst. 4(1), 82–92 (2017). https://doi.org/10.1109/TCNS.2016.2607420
26. Pajic, M., Weimer, J., Bezzo, N., Sokolsky, O., Pappas, G.J., Lee, I.: Design and implementation of attack-resilient cyber-physical systems: with a focus on attack-resilient state estimators. IEEE Control Syst. 37(2), 66–81 (2017). https://doi.org/10.1109/MCS.2016.2643239
27. Rasmusen, E., Blackwell, B.: Games and Information, vol. 15. MIT Press, Cambridge (1994)
28. Svoreňová, M., Kwiatkowska, M.: Quantitative verification and strategy synthesis for stochastic games. Eur. J. Control 30, 15–30 (2016). https://doi.org/10.1016/j.ejcon.2016.04.009
29. Wiltsche, C.: Assume-guarantee strategy synthesis for stochastic games. Ph.D. dissertation, Department of Computer Science, University of Oxford (2015)
30. Zimmermann, M.: Delay games with WMSO+U winning conditions. RAIRO Theor. Inform. Appl. 50(2), 145–165 (2016). https://doi.org/10.1051/ita/2016018
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.