Multiarmed Bandits for Boolean Connectives in Hybrid System Falsification
Abstract
Hybrid system falsification is an actively studied topic, as a scalable quality assurance methodology for realworld cyberphysical systems. In falsification, one employs stochastic hillclimbing optimization to quickly find a counterexample input to a blackbox system model. Quantitative robust semantics is the technical key that enables use of such optimization. In this paper, we tackle the socalled scale problem regarding Boolean connectives that is widely recognized in the community: quantities of different scales (such as speed [km/h] vs. rpm, or worse, rph) can mask each other’s contribution to robustness. Our solution consists of integration of the multiarmed bandit algorithms in hill climbingguided falsification frameworks, with a technical novelty of a new reward notion that we call hillclimbing gain. Our experiments show our approach’s robustness under the change of scales, and that it outperforms a stateoftheart falsification tool.
1 Introduction
Hybrid System Falsification. Quality assurance of cyberphysical systems (CPS) is attracting growing attention from both academia and industry, not only because it is challenging and scientifically interesting, but also due to the safetycritical nature of many CPS. The combination of physical systems (with continuous dynamics) and digital controllers (that are inherently discrete) is referred to as hybrid systems, capturing an important aspect of CPS. To verify hybrid systems is intrinsically hard, because the continuous dynamics therein leads to infinite search spaces.
More researchers and practitioners are therefore turning to optimizationbased falsification as a quality assurance measure for CPS. The problem is formalized as follows.

Given: a model \({\mathcal {M}}\) (that takes an input signal \({\mathbf {u}}\) and yields an output signal \({\mathcal {M}}({\mathbf {u}})\)), and a specification \(\varphi \) (a Open image in new window temporal formula)

Find: a falsifying input, that is, an input signal \({\mathbf {u}}\) such that the corresponding output \({\mathcal {M}}({\mathbf {u}})\) violates \(\varphi \)
Boolean satisfaction \({\mathbf {w}}\models \varphi \), and quantitative robustness values \([\![{\mathbf {w}},\varphi ]\!]\), of three signals of \( speed \) for the STL formula \(\varphi \equiv \Box _{[0,30]}( speed < 120)\)
An illustration of robust semantics is in Table 1. We use signal temporal logic (STL) [12], a temporal logic that is commonly used in hybrid system specification. The specification says the speed must always be below 120 during the time interval [0, 30]. In the search of an input signal \({\mathbf {u}}\) (e.g. of throttle and brake) whose corresponding output \({\mathcal {M}}({\mathbf {u}})\) violates the specification, the quantitative robustness \([\![{\mathcal {M}}({\mathbf {u}}),\varphi ]\!]\) gives much more information than the Boolean satisfaction \({\mathcal {M}}({\mathbf {u}})\models \varphi \). Indeed, in Table 1, while Boolean satisfaction fails to discriminate the first two signals, the quantitative robustness indicates a tendency that the second signal is closer to violation of the specification.
In the falsification literature, stochastic algorithms are used for hillclimbing optimization. Examples include simulated annealing (SA), globalized NelderMead (GNM [30]) and covariance matrix adaptation evolution strategy (CMAES [6]). Note that the system model \({\mathcal {M}}\) can be blackbox: we have only to observe the correspondence between input \({\mathbf {u}}\) and output \({\mathcal {M}}({\mathbf {u}})\). Observing an error \({\mathcal {M}}({\mathbf {u}}')\) for some input \({\mathbf {u}}'\) is sufficient evidence for a system designer to know that the system needs improvement. Besides these practical advantages, optimizationbased falsification is an interesting scientific topic: it combines two different worlds of formal reasoning and stochastic optimization.
Optimizationbased falsification started in [17] and has been developed vigorously [1, 3, 4, 5, 9, 11, 12, 13, 15, 27, 28, 34, 36, 38]. See [26] for a survey. There are mature tools such as Breach [11] and STaliro [5]; they work with industrystandard Simulink models.
Challenge: The Scale Problem in Boolean Superposition. In the field of hybrid falsification—and more generally in searchbased testing—the following problem is widely recognized. We shall call the problem the scale problem (in Boolean superposition).
Another related problem is that the efficiency of a falsification algorithm would depend on the choice of units of measure. Imagine replacing rpm with rph in (1), which makes the constant 4000 into 240000, and make the situation even worse.
These problems—that we call the scale problem—occur in many falsification examples, specifically when a specification involves Boolean connectives. We do need Boolean connectives in specifications: for example, many realworld specifications in industry are of the form \(\Box _{I}(\varphi _{1}\rightarrow \varphi _{2})\), requiring that an event \(\varphi _{1}\) triggers a countermeasure \(\varphi _{2}\) all the time.
Contribution: Integrating MultiArmed Bandits into OptimizationBased Falsification. As a solution to the scale problem in Boolean superposition that we just described, we introduce a new approach that does not superpose robustness values. Instead, we integrate multiarmed bandits (MAB) in the existing framework of falsification guided by hillclimbing optimization.
The MAB problem is a prototypical reinforcement learning problem: a gambler sits in front of a row of slot machines; their performance (i.e. average reward) is not known; the gambler plays a machine in each round and he continues with many rounds; and the goal is to optimize cumulative rewards. The gambler needs to play different machines and figure out their performance, at the cost of the loss of opportunities in the form of playing suboptimal machines.
In this paper, we focus on specifications of the form \(\Box _{I}(\varphi _{1}\wedge \varphi _{2})\) and \(\Box _{I}(\varphi _{1}\vee \varphi _{2})\); we call them (conjunctive/disjunctive) safety properties. We identify an instance of the MAB problem in the choice of the formula (out of \(\varphi _{1},\varphi _{2}\)) to try to falsify by hill climbing. See Fig. 1. We combine MAB algorithms (such as \(\varepsilon \)greedy and UCB1, see Sect. 3.2) with hillclimbing optimization, for the purpose of coping with the scale problem in Boolean superposition. This combination is made possible by introducing a novel reward notion for MAB, called hillclimbing gain, that is tailored for this purpose.
We have implemented our MABbased falsification framework in MATLAB, building on Breach [11].^{1} Our experiments with benchmarks from [7, 24, 25] demonstrate that our MABbased approach is a viable one against the scale problem. In particular, our approach is observed to be (almost totally) robust under the change of scaling (i.e. changing units of measure, such as from rpm to rph that we discussed after the formula (1)). Moreover, for the benchmarks taken from the previous works—they do not suffer much from the scale problem—our algorithm performs better than the stateoftheart falsification tool Breach [11].
Related Work. Besides those we mentioned, we shall discuss some related works.
Formal verification approaches to correctness of hybrid systems employ a wide range of techniques, including model checking, theorem proving, rigorous numerics, nonstandard analysis, and so on [8, 14, 18, 20, 22, 23, 29, 32]. These are currently not very successful in dealing with complex realworld systems, due to issues like scalability and blackbox components.
Our use of MAB in falsification exemplifies the role of the explorationexploitation tradeoff, the core problem in reinforcement learning. The tradeoff has been already discussed for the verification of quantitative properties (e.g., [33]) and also in some works on falsification. A recent example is [36], where they use Monte Carlo tree search to force systematic exploration of the space of input signals. Besides MCTS, Gaussian process learning (GP learning) has also attracted attention in machine learning as a clean way of balancing exploitation and exploration. The GPUCB algorithm is a widely used strategy there. Its use in hybrid system falsification is pursued e.g. in [3, 34].
More generally, coverageguided falsification [1, 9, 13, 28] aims at coping with the explorationexploitation tradeoff. One can set the current work in this context—the difference is that we force systematic exploration on the specification side, not in the input space.
There have been efforts to enhance expressiveness of MTL and STL, so that engineers can express richer intentions—such as time robustness and frequency—in specifications [2, 31]. This research direction is orthogonal to ours; we plan to investigate the use of such logics in our current framework.
A similar masking problem around Boolean connectives is discussed in [10, 19]. Compared to those approaches, our technique does not need the explicit declaration of input vacuity and output robustness, but it relies on the “hillclimbing gain” reward to learn the significance of each signal.
Finally, the interest in the use of deep neural networks is rising in the field of falsification (as well as in many other fields). See e.g. [4, 27].
2 Preliminaries: Hill ClimbingGuided Falsification
We review a welladopted methodology for hybrid system falsification, namely the one guided by hillclimbing optimization. It makes essential use of quantitative robust semantics of temporal formulas, which we review too.
2.1 Robust Semantics for STL
Our definitions here are taken from [12, 17].
Definition 1
((timebounded) signal). Let \(T\in {\mathbb {R}}_{+}\) be a positive real. An Mdimensional signal with a time horizon T is a function \({\mathbf {w}}:[0,T]\rightarrow {\mathbb {R}}^{M}\).
Let \({\mathbf {w}}:[0,T]\rightarrow {\mathbb {R}}^{M}\) and \({\mathbf {w}}':[0,T']\rightarrow {\mathbb {R}}^{M}\) be Mdimensional signals. Their concatenation \({\mathbf {w}}\cdot {\mathbf {w}}':[0,T+T']\rightarrow {\mathbb {R}}^{M}\) is the Mdimensional signal defined by \( ({\mathbf {w}}\cdot {\mathbf {w}}')(t)= {\mathbf {w}}(t) \) if \(t\in [0,T]\), and \( ({\mathbf {w}}\cdot {\mathbf {w}}')(t)={\mathbf {w}}'(tT)\) if \(t\in (T,T+T']\).
Let \(0<T_{1}<T_{2}\le T\). The restriction \({\mathbf {w}}_{[T_{1},T_{2}]}:[0,T_{2}T_{1}]\rightarrow {\mathbb {R}}^{M}\) of \({\mathbf {w}}:[0,T]\rightarrow {\mathbb {R}}^{M}\) to the interval \([T_{1},T_{2}]\) is defined by \(({\mathbf {w}}_{[T_{1},T_{2}]})(t)={\mathbf {w}}(T_{1}+t)\).
One main advantage of optimizationbased falsification is that a system model can be a black box—observing the correspondence between input and output suffices. We therefore define a system model simply as a function.
Definition 2
(system model \({\mathcal {M}}\)). A system model, with Mdimensional input and Ndim. output, is a function \({\mathcal {M}}\) that takes an input signal \({\mathbf {u}}:[0,T]\rightarrow {\mathbb {R}}^{M}\) and returns a signal \({\mathcal {M}}({\mathbf {u}}):[0,T]\rightarrow {\mathbb {R}}^{N}\). Here the common time horizon \(T\in {\mathbb {R}}_{+}\) is arbitrary. Furthermore, we impose the following causality condition on \({\mathcal {M}}\): for any timebounded signals \({\mathbf {u}}:[0,T]\rightarrow {\mathbb {R}}^{M}\) and \({\mathbf {u}}':[0,T']\rightarrow {\mathbb {R}}^{M}\), we require that \( {\mathcal {M}}({\mathbf {u}}\cdot {\mathbf {u}}') \big _{[0,T]} = {\mathcal {M}}({\mathbf {u}}) \).
Definition 3
(STL syntax). We fix a set \(\mathbf {Var}\) of variables. In \(\text {STL}\), atomic propositions and formulas are defined as follows, respectively: Open image in new window , and Open image in new window . Here f is an Nary function \(f:{\mathbb {R}}^{N}\rightarrow {\mathbb {R}}\), \(x_1, \dots , x_N \in \mathbf {Var}\), and I is a closed nonsingular interval in \({\mathbb {R}}_{\ge 0}\), i.e. \(I=[a,b]\) or \([a, \infty )\) where \(a,b \in {\mathbb {R}}\) and \(a<b\).
We omit subscripts I for temporal operators if \(I = [0, \infty )\). Other common connectives such as \(\rightarrow ,\top \), \(\Box _{I}\) (always) and \(\Diamond _{I}\) (eventually), are introduced as abbreviations: \(\Diamond _{I}\varphi \equiv \top \mathbin {{\mathcal {U}}_{I}}\varphi \) and \(\Box _{I}\varphi \equiv \lnot \Diamond _{I}\lnot \varphi \). An atomic formula \(f(\varvec{x})\le c\), where \(c\in {\mathbb {R}}\), is accommodated using \(\lnot \) and the function Open image in new window .
Definition 4
(robust semantics [12]). Let \({\mathbf {w}}:[0,T]\rightarrow {\mathbb {R}}^{N}\) be an Ndimensional signal, and \(t\in [0,T)\). The tshift of \({\mathbf {w}}\), denoted by \({\mathbf {w}}^t\), is the timebounded signal \({\mathbf {w}}^t:[0,Tt]\rightarrow {\mathbb {R}}^{N}\) defined by Open image in new window .
For atomic formulas, \({ [\![{\mathbf {w}}, f(\varvec{x})>c ]\!]}\) stands for the vertical margin \(f(\varvec{x})c\) for the signal \({\mathbf {w}}\) at time 0. A negative robustness value indicates how far the formula is from being true. It follows from the definition that the robustness for the eventually modality is given by \( { [\![{\mathbf {w}}, \Diamond _{[a,b]} (x > 0) ]\!]} = {{\bigsqcup _{t \in [a,b]\cap [0,T]}}} {\mathbf {w}}(t)(x) \).
The above robustness notion taken from [12] is therefore spatial. Other robustness notions take temporal aspects into account, too, such as “how long before the deadline the required event occurs”. See e.g. [2, 12]. Our choice of spatial robustness in this paper is for the sake of simplicity, and is thus not essential.
The original semantics of \(\text {STL}\) is Boolean, given as usual by a binary relation \(\models \) between signals and formulas. The robust semantics refines the Boolean one in the following sense: \( [\![{\mathbf {w}},\varphi ]\!] > 0\) implies \({\mathbf {w}}\models \varphi \), and \( [\![{\mathbf {w}},\varphi ]\!] < 0\) implies \({\mathbf {w}}\nvDash \varphi \), see [17, Prop. 16]. Optimizationbased falsification via robust semantics hinges on this refinement.
2.2 Hill ClimbingGuided Falsification
As we discussed in the introduction, the falsification problem attracts growing industrial and academic attention. Its solution methodology by hillclimbing optimization is an established field, too: see [1, 3, 5, 9, 11, 12, 13, 15, 26, 28, 34, 38] and the tools Breach [11] and STaLiRo [5]. We formulate the problem and the methodology, for later use in describing our multiarmed banditbased algorithm.
Definition 5
(falsifying input). Let \({\mathcal {M}}\) be a system model, and \(\varphi \) be an STL formula. A signal \({\mathbf {u}}:[0,T]\rightarrow {\mathbb {R}}^{\mathbf {Var}}\) is a falsifying input if \({ [\![{\mathcal {M}}({\mathbf {u}}), \varphi ]\!]}<0\); the latter implies \({\mathcal {M}}({\mathbf {u}})\nvDash \varphi \).
The use of quantitative robust semantics \({ [\![{\mathcal {M}}({\mathbf {u}}), \varphi ]\!]}\in {\mathbb {R}}\cup \{\infty ,\infty \}\) in the above problem enables the use of hillclimbing optimization.
Definition 6
(hill climbingguided falsification). Assume the setting in Definition 5. For finding a falsifying input, the methodology of hill climbingguided falsification is presented in Algorithm 1.
Here the function \(\textsc {Hill}\hbox {}\textsc {Climb}\) makes a guess of an input signal \({\mathbf {u}}_{k}\), aiming at minimizing the robustness \({ [\![{\mathcal {M}}({\mathbf {u}}_{k}), \varphi ]\!]}\). It does so, learning from the previous observations \( \bigl (\, {\mathbf {u}}_{l}, \, { [\![{\mathcal {M}}({\mathbf {u}}_{l}), \varphi ]\!]} \,\bigr )_{l\in [1,k1]} \) of input signals \({\mathbf {u}}_{1},\cdots , {\mathbf {u}}_{k1}\) and their corresponding robustness values (cf. Table 1).
3 Our Multiarmed BanditBased Falsification Algorithm
In this section, we present our contribution, namely a falsification algorithm that addresses the scale problem in Boolean superposition (see Sect. 1). The main novelties in the algorithm are as follows.
 1.
(Use of MAB algorithms) For binary Boolean connectives, unlike most works in the field, we do not superpose the robustness values of the constituent formulas \(\varphi _{1}\) and \(\varphi _{2}\) using a fixed operator (such as \(\sqcap \) and \(\sqcup \) in (2)). Instead, we view the situation as an instance of the multiarmed bandit problem (MAB): we use an algorithm for MAB to choose one formula \(\varphi _{i}\) to focus on (here \(i\in \{1,2\}\)); and then we apply hill climbingguided falsification to the chosen formula \(\varphi _{i}\).
 2.
(Hillclimbing gain as rewards in MAB) For our integration of MAB and hillclimbing optimization, the technical challenge is find a suitable notion of reward for MAB. We introduce a novel notion that we call hillclimbing gain: it formulates the (downward) robustness gain that we would obtain by applying hillclimbing optimization, suitably normalized using the scale of previous robustness values.
Later, in Sect. 4, we demonstrate that combining those two features gives rise to falsification algorithms that successfully cope with the scale problem in Boolean superposition.
Our algorithms focus on a fragment of STL as target specifications. They are called (disjunctive and conjunctive) safety properties. In Sect. 3.1 we describe this fragment of STL, and introduce necessary adaptation of the semantics. After reviewing the MAB problem in Sect. 3.2, we present our algorithms in Sects. 3.3, 3.4.
3.1 Conjunctive and Disjunctive Safety Properties
Definition 7
(conjunctive/disjunctive safety property). An STL formula of the form \(\Box _{I}(\varphi _{1}\wedge \varphi _{2})\) is called a conjunctive safety property; an STL formula of the form \(\Box _{I}(\varphi _{1}\vee \varphi _{2})\) is called a disjunctive safety property.
It is known that, in industry practice, a majority of specifications is of the form \(\Box _{I}(\varphi _{1}\rightarrow \varphi _{2})\), where \(\varphi _{1}\) describes a trigger and \(\varphi _{2}\) describes a countermeasure that should follow. This property is equivalent to \(\Box _{I}(\lnot \varphi _{1}\vee \varphi _{2})\), and is therefore a disjunctive safety property.
In Sects. 3.3, 3.4, we present two falsification algorithms, for conjunctive and disjunctive safety properties respectively. For the reason we just discussed, we expect the disjunctive algorithm should be more important in realworld application scenarios. In fact, the disjunctive algorithm turns out to be more complicated, and it is best introduced as an extension of the conjunctive algorithm.
We define the restriction of robust semantics to a (sub)set of time instants. Note that we do not require \({\mathcal {S}}\subseteq [0,T]\) to be a single interval.
Definition 8
Obviously, \( [\![{\mathbf {w}}, \psi ]\!]_{\mathcal {S}} <0\) implies that there exists \(t\in {\mathcal {S}}\) such that \( [\![{\mathbf {w}}^{t}, \psi ]\!]_{\mathcal {S}} <0\). We derive the following easy lemma; it is used later in our algorithm.
Lemma 9
In the setting of Definition 8, consider a disjunctive safety property \(\varphi \equiv \Box _{I}(\varphi _{1}\vee \varphi _{2})\), and let Open image in new window . Then \([\![{\mathbf {w}},\varphi _{2} ]\!]_{\mathcal {S}}<0\) implies \([\![{\mathbf {w}},\Box _{I}(\varphi _{1}\vee \varphi _{2}) ]\!]<0\). \(\square \)
3.2 The MultiArmed Bandit (MAB) Problem

a gambler sits in front of a row \(A_{1},\cdots ,A_{n}\) of slot machines;

each slot machine \(A_{i}\) gives, when its arm is played (i.e. in each attempt), a reward according to a prescribed (but unknown) probability distribution \(\mu _{i}\);

and the goal is to maximize the cumulative reward after a number of attempts, playing a suitable arm in each attempt.
The best strategy of course is to keep playing the best arm \(A_{\max }\), i.e. the one whose average reward \(\mathsf {avg}(\mu _{\max })\) is the greatest. This best strategy is infeasible, however, since the distributions \(\mu _{1},\cdots ,\mu _{n}\) are initially unknown. Therefore the gambler must learn about \(\mu _{1},\cdots ,\mu _{n}\) through attempts.
The MAB problem exemplifies the “learning by trying” paradigm of reinforcement learning, and is thus heavily studied. The greatest challenge is to balance between exploration and exploitation. A greedy (i.e. exploitationonly) strategy will play the arm whose empirical average reward is the maximum. However, since the rewards are random, this way the gambler can miss another arm whose real performance is even better but which is yet to be found so. Therefore one needs to mix exploration, too, occasionally trying empirically nonoptimal arms, in order to identity their true performance.
A rigorous formulation of the MAB problem is presented for the record.
Definition 10
(the multiarmed bandit problem). The multiarmed bandit (MAB) problem is formulated as follows.
Input: arms \((A_1, \dots , A_n)\), the associated probability distributions \(\mu _{1},\cdots ,\mu _{n}\) over \({\mathbb {R}}\), and a time horizon \(H\in {\mathbb {N}}\cup \{\infty \}\).
Goal: synthesize a sequence \(A_{i_{1}}A_{i_{2}}\cdots A_{i_{H}}\), so that the cumulative reward \( \sum _{k=1}^{H}\mathsf {rew}_{k} \) is maximized. Here the reward \(\mathsf {rew}_{k}\) of the kth attempt is sampled from the distribution \(\mu _{i_{k}}\) associated with the arm \(A_{i_{k}}\) played at the kth attempt.
We introduce some notations for later use. Let \((A_{i_{1}}\cdots A_{i_{k}}, \mathsf {rew}_{1}\cdots \mathsf {rew}_{k})\) be a history, i.e. the sequence of arms played so far (here \(i_{1},\cdots , i_{k}\in [1,n]\)), and the sequence of rewards obtained by those attempts (\(\mathsf {rew}_{l}\) is sampled from \(\mu _{i_{l}}\)).
For an arm \(A_{j}\), its visit count \(N(j,A_{i_{1}}A_{i_{2}}\cdots A_{i_{k}}, \mathsf {rew}_{1}\mathsf {rew}_{2}\cdots \mathsf {rew}_{k})\) is given by the number of occurrences of \(A_{j}\) in \(A_{i_{1}}A_{i_{2}}\cdots A_{i_{k}}\). Its empirical average reward \(R(j,A_{i_{1}}A_{i_{2}}\cdots A_{i_{k}}, \mathsf {rew}_{1}\mathsf {rew}_{2}\cdots \mathsf {rew}_{k})\) is given by \( \sum _{l\in \{l\in [1,k]\mid i_{l}=j\}} \mathsf {rew}_{l} \), i.e. the average return of the arm \(A_{j}\) in the history. When the history is obvious from the context, we simply write N(j, k) and R(j, k).
MAB Algorithms. There have been a number of algorithms proposed for the MAB problem; each of them gives a strategy (also called a policy) that tells which arm to play, based on the previous attempts and their rewards. The focus here is how to resolve the explorationexploitation tradeoff. Here we review two wellknown algorithms.
The \(\varepsilon \)Greedy Algorithm. This is a simple algorithm that spares a small fraction \(\varepsilon \) of chances for empirically nonoptimal arms. The spared probability \(\varepsilon \) is uniformly distributed. See Algorithm 2.
The UCB1 Algorithm. The UCB1 (upper confidence bound) algorithm is more complex; it comes with a theoretical upper bound for regrets, i.e. the gap between the expected cumulative reward and the optimal (but infeasible) cumulative reward (i.e. the result of keep playing the optimal arm \(A_{\max }\)). It is known that the UCB1 algorithm’s regret is at most \(O(\sqrt{nH\log {H}})\) after H attempts, improving the naive random strategy (which has the expected regret O(H)).
3.3 Our MABGuided Algorithm I: Conjunctive Safety Properties
Our first algorithm targets at conjunctive safety properties. It is based on our identification of MAB in a Boolean conjunction in falsification—this is as we discussed just above Definition 10. The technical novelty lies in the way we combine MAB algorithms and hillclimbing optimization; specifically, we introduce the notion of hillclimbing gain as a reward notion in MAB (Definition 11). This first algorithm paves the way to the one for disjunctive safety properties, too (Sect. 3.4).
The algorithm is in Algorithm 4. Some remarks are in order.
Algorithm 4 aims to falsify a conjunctive safety property \(\varphi \equiv \Box _{I}(\varphi _{1}\wedge \varphi _{2})\). Its overall structure is to interleave two sequences of falsification attempts, both of which are hill climbingguided. These two sequences of attempts aim to falsify \(\Box _{I}\varphi _{1}\) and \(\Box _{I}\varphi _{2}\), respectively. Note that \( [\![{\mathcal {M}}({\mathbf {u}}),\varphi ]\!]\le [\![{\mathcal {M}}({\mathbf {u}}),\Box _{I}\varphi _{1} ]\!]\), therefore falsification of \(\Box _{I}\varphi _{1}\) implies falsification of \(\varphi \); the same holds for \(\Box _{I}\varphi _{2}\), too.
In Line 5 we run an MAB algorithm to decide which of \(\Box _{I}\varphi _{1}\) and \(\Box _{I}\varphi _{2}\) to target at in the kth attempt. The function \(\textsc {MAB}\) takes the following as its arguments: (1) the list of arms, given by the formulas \(\varphi _{1},\varphi _{2}\); (2) their rewards \({\mathcal {R}}(\varphi _{1}), {\mathcal {R}}(\varphi _{2})\); (3) the history \(\varphi _{i_{1}}\cdots \varphi _{i_{k1}}\) of previously played arms (\(i_{l}\in \{1,2\}\)); and 4) the history \(\mathsf {rew}_{1}\cdots \mathsf {rew}_{k1}\) of previously observed rewards. This way, the type of the \(\textsc {MAB}\) function in Line 5 matches the format in Definition 10, and thus the function can be instantiated with any MAB algorithm such as Algorithms 2–3.
The only missing piece is the definition of the rewards \({\mathcal {R}}(\varphi _{1}), {\mathcal {R}}(\varphi _{2})\). We introduce the following notion, tailored for combining MAB and hill climbing.
Definition 11
Since we try to minimize the robustness values \(\mathsf {rb}_{l}\) through falsification attempts, we can expect that \(\mathsf {rb}_{l}\) for a fixed arm \(\varphi _{i}\) decreases over time. (In the case of the hillclimbing algorithm CMAES that we use, this is in fact guaranteed). Therefore the value \( \mathsf {max\text {}rb}({i},k1)\) in the definition of \({\mathcal {R}}(\varphi _{i})\) is the first observed robustness value. The numerator \( \mathsf {max\text {}rb}({i},k1)  \mathsf {last\text {}rb}({i},k1)\) then represents how much robustness we have reduced so far by hill climbing—hence the name “hillclimbing gain.” The denominator \(\mathsf {max\text {}rb}({i},k1)\) is there for normalization.
In Algorithm 4, the value \(\mathsf {rb}_{k}\) is given by the robustness \( [\![{\mathcal {M}}({\mathbf {u}}_{k}), \Box _{I}\varphi _{i_{k}} ]\!]\). Therefore the MAB choice in Line 5 essentially picks \(i_{k}\) for which hill climbing yields greater effect (but also taking exploration into account—see Sect. 3.2).
In Line 6 we conduct hillclimbing optimization—see Sect. 2.2. The function \(\textsc {Hill}\hbox {}\textsc {Climb}\) learns from the previous attempts \({\mathbf {u}}_{l_{1}},\cdots , {\mathbf {u}}_{l_{m}}\) regarding the same formula \(\varphi _{i_{k}}\), and their resulting robustness values \(\mathsf {rb}_{l_{1}},\cdots ,\mathsf {rb}_{l_{m}}\). Then it suggests the next input signal \({\mathbf {u}}_{k}\) that is likely to minimize the (unknown) function that underlies the correspondences \(\bigl [\,{\mathbf {u}}_{l_{j}}\mapsto \mathsf {rb}_{l_{j}}\,\bigr ]_{j\in [1,m]}\).
Lines 6–8 read as follows: the hillclimbing algorithm suggests a single input \({\mathbf {u}}_{k}\), which is then selected or rejected (Line 8) based on the robustness value it yields (Line 7). We note that this is a simplified picture: in our implementation that uses CMAES (it is an evolutionary algorithm), we maintain a population of some ten particles, and each of them is moved multiple times (our choice is three times) before the best one is chosen as \({\mathbf {u}}_{k}\).
3.4 Our MABGuided Algorithm II: Disjunctive Safety Properties
The other main algorithm of ours aims to falsify a disjunctive safety property \(\varphi \equiv \Box _{I}(\varphi _{1}\vee \varphi _{2})\). We believe this problem setting is even more important than the conjunctive case, since it encompasses conditional safety properties (i.e. of the form \(\Box _{I}(\varphi _{1}\rightarrow \varphi _{2})\)). See Sect. 3.1 for discussions.
In the disjunctive setting, the challenge is that falsification of \(\Box _{I}\varphi _{i}\) (with \(i \in \{1,2\}\)) does not necessarily imply falsification of \(\Box _{I}(\varphi _{1}\vee \varphi _{2})\). This is unlike the conjunctive setting. Therefore we need some adaptation of Algorithm 4, so that the two interleaved sequences of falsification attempts for \(\varphi _{1}\) and \(\varphi _{2}\) are not totally independent of each other. Our solution consists of restricting time instants to those where \(\varphi _{2}\) is false, in a falsification attempt for \(\varphi _{1}\) (and vice versa), in the way described in Definition 8.
Algorithm 5 shows our MABguided algorithm for falsifying a disjunctive safety property \( \Box _{I}(\varphi _{1}\vee \varphi _{2})\). The only visible difference is that Line 7 in Algorithm 4 is replaced with Line 7’. The new Line 7’ measures the quality of the suggested input signal \({\mathbf {u}}_{k}\) in the way restricted to the region \({\mathcal {S}}_{k}\) in which the other formula is already falsified. Lemma 9 guarantees that, if \(\mathsf {rb}_{k}<0\), then indeed the input signal \({\mathbf {u}}_{k}\) falsifies the original specification \( \Box _{I}(\varphi _{1}\vee \varphi _{2})\).
The assumption that makes Algorithm 5 sensible is that, although it can be hard to find a time instant at which both \(\varphi _{1}\) and \(\varphi _{2}\) are false (this is required in falsifying \( \Box _{I}(\varphi _{1}\vee \varphi _{2})\)), falsifying \(\varphi _{1}\) (or \(\varphi _{2}\)) individually is not hard. Without this assumption, the region \({\mathcal {S}}_{k}\) in Line 7’ would be empty most of the time. Our experiments in Sect. 4 demonstrate that this assumption is valid in many problem instances, and that Algorithm 5 is effective.
4 Experimental Evaluation
We name MABUCB and MAB\({\epsilon }\)greedy the two versions of MAB algorithm using strategies \(\varepsilon \)Greedy (see Algorithm 2) and UCB1 (see Algorithm 3). We compared the proposed approach (both versions MABUCB and MAB\({\epsilon }\)greedy) with a stateoftheart falsification framework, namely Breach [11]. Breach encapsulates several hillclimbing optimization algorithms, including CMAES (covariance matrix adaptation evolution strategy) [6], SA (simulated annealing), GNM (global NelderMead) [30], etc. According to our experience, CMAES outperforms other hillclimbing solvers in Breach, so the experiments for both Breach and our approach rely on the CMAES solver.
Experiments have been executed using Breach 1.2.13 on an Amazon EC2 c4.large instance, 2.9 GHz Intel Xeon E52666, 2 virtual CPU cores, 4 GB RAM.
Benchmark sets Bbench and Sbench
We built the benchmark set Bbench, as shown in Table 2a that reports the name of the model and its specifications (ID and formula). In total, we found 11 specifications. In order to increase the benchmark set and obtain specifications of different complexity, we artificially modified a constant (turned into a parameter named \(\tau \) if it is contained in a time interval, named \(\rho \) otherwise) of the specification: for each specification S, we generated m different versions, named as \(S_i\) with \(i \in \{1, \ldots , m\}\); the complexity of the specification (in terms of difficulty to falsify it) increases with increasing i.^{2} In total, we produced 60 specifications. Column parameter in the table shows which concrete values we used for the parameters \(\rho \) and \(\tau \). Note that all the specifications but one are disjunctive safety properties (i.e., \(\Box _{I}(\varphi _{1}\vee \varphi _{2})\)), as they are the most difficult case and they are the main target of our approach; we just add AT5 as example of conjunctive safety property (i.e., \(\Box _{I}(\varphi _{1}\wedge \varphi _{2})\)).
Our approach has been proposed with the aim of tackling the scale problem. Therefore, to better show how our approach mitigates this problem, we generated a second benchmark set Sbench as follows. We selected 15 specifications from Bbench (with concrete values for the parameters) and, for each specification S, we changed the corresponding Simulink model by multiplying one of its outputs by a factor \(10^k\), with \(k \in \{2,0,1,2,3\}\) (note that we also include the original one using scale factor \(10^0\)); the specification has been modified accordingly, by multiplying with the scale factor the constants that are compared with the scaled output. We name a specification S scaled with factor \(10^k\) as \(S^k\). Table 2b reports the IDs of the original specifications, the output that has been scaled, and the used scaled factors; in total, the benchmark set Sbench contains 60 specifications.
Aggregated results for benchmark sets Bbench and Sbench (SR: # successes out 30 trials. Time in secs. \(\varDelta \): percentage difference w.r.t. Breach). Outperformance cases are highlighted, indicated by positive \(\varDelta \) of SR, and negative \(\varDelta \) of time.
For benchmark set Bbench, it reports aggregated results for each group of specifications obtained from S (i.e., all the different versions \(S_i\) obtained by changing the value of the parameter); for benchmark set Sbench, instead, results are aggregated for each scaled specification \(S^k\) (considering the versions \(S^k_i\) obtained by changing the parameter value). We report minimum, maximum and average number of successes SR, and time in seconds. For MAB\({\epsilon }\)greedy and MABUCB, both for SR and time, we also report the average percentage difference^{4} (\(\varDelta \)) w.r.t. to the corresponding value of Breach.
Comparison. In the following, we compare two approaches \(A_1, A_2\) \(\in \) {Breach, MAB\({\epsilon }\)greedy, MABUCB} by comparing the number of their successes SR and average execution time using the nonparametric Wilcoxon signedrank test with 5% level of significance^{5} [35]; the null hypothesis is that there is no difference in applying \(A_1\) \(A_2\) in terms of the compared measure (SR or time).
4.1 Evaluation
We evaluate the proposed approach with some research questions.
RQ1 Which is the best MAB algorithm for our purpose?
In Sect. 3.2, we described that the proposed approach can be executed using two different strategies for choosing the arm in the MAB problem, namely MAB\({\epsilon }\)greedy and MABUCB. We here assess which one is better in terms of SR and time. From the results in Table 3, it seems that MABUCB provides slightly better performance in terms of SR; this has been confirmed by the Wilcoxon test applied over all the experiments (i.e., on the nonaggregated data reported in Appendix A in the extended version [37]): the null hypothesis that using anyone of the two strategies has no impact on SR is rejected with pvalue equal to 0.005089, and the alternative hypothesis that SR is better is accepted with pvalue = 0.9975; in a similar way, the null hypothesis that there is no difference in terms of time is rejected with pvalue equal to 3.495e−06, and the alternative hypothesis that is MABUCB is faster is accepted with pvalue = 1. Therefore, in the following RQs, we compare Breach with only the MABUCB version of our approach.
RQ2 Does the proposed approach effectively solve the scale problem?
Experimental results – Sbench (SR: # successes out of 30 trials. Time in secs)
RQ3 How does the proposed process behave with not scaled benchmarks?
In RQ2, we checked whether the proposed approach is able to tackle the scale problem for which it has been designed. Here, instead, we are interested in investigating how it behaves on specifications that have not been artificially scaled (i.e., those in Bbench). From Table 3 (upper part), we observe that MABUCB is always better than Breach both in terms of SR and time, which is shown by the highlighted cases. This is confirmed by Wilcoxon test over SR and time: null hypotheses are rejected with pvalues equal to, respectively, 6.02e−08 and 1.41e−08, and the alternative hypotheses that MABUCB is better are both accepted with pvalue = 1. This means that the proposed approach can also handle specifications that do not suffer from the scale problem, and so it can be used with any kind of specification.
RQ4 Is the proposed approach more effective than an approach based on rescaling?
A naïve solution to the scale problem could be to rescale the signals used in specification at the same scale. Thanks to the results of RQ2, we can compare to this possible baseline approach, using the scaled benchmark set Sbench. For example, AT5 suffers from the scale problem as \( speed \) is one order of magnitude less than \( rpm \). However, from Table 3, we observe that the scaling that would be done by the baseline approach (i.e., running Breach over \(\hbox {AT5}^1\)) is not effective, as SR is 0.4/30, that is much lower than the original SR 14.1/30 of the unscaled approach using Breach. Our approach, instead, raises SR to 28.4/30 and to 27.6/30 using the two proposed versions. By monitoring Breach execution, we notice that the naïve approach fails because it tries to falsify \( rpm < 4780\), which, however, is not falsifiable; our approach, instead, understands that it must try to falsify \( speed < \rho \). More details are given in the extended version [37].
5 Conclusion and Future Work
In this paper, we propose a solution to the scale problem that affects falsification of specifications containing Boolean connectives. The approach combines multiarmed bandit algorithms with hill climbingguided falsification. Experiments show that the approach is robust under the change of scales, and it outperforms a stateoftheart falsification tool. The approach currently handles binary specifications. As future work, we plan to generalize it to complex specifications having more than two Boolean connectives.
Footnotes
 1.
Code obtained at https://github.com/decyphir/breach.
 2.
Note that we performed this classification based on the falsification results of Breach.
 3.
The code, models, and specifications are available online at https://github.com/ERATOMMSD/FalStarMAB.
 4.
\(\varDelta \) = \({((mb)*100)}\big /{(0.5*(m+b))}\) where m is the result of MAB and b the one of Breach.
 5.
We checked that the distributions are not normal with the nonparametric ShapiroWilk test.
References
 1.Adimoolam, A., Dang, T., Donzé, A., Kapinski, J., Jin, X.: Classification and coveragebased falsification for embedded control systems. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 483–503. Springer, Cham (2017). https://doi.org/10.1007/9783319633879_24CrossRefGoogle Scholar
 2.Akazaki, T., Hasuo, I.: Time robustness in MTL and expressivity in hybrid system falsification. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9207, pp. 356–374. Springer, Cham (2015). https://doi.org/10.1007/9783319216683_21CrossRefGoogle Scholar
 3.Akazaki, T., Kumazawa, Y., Hasuo, I.: Causalityaided falsification. In: Proceedings First Workshop on Formal Verification of Autonomous Vehicles, FVAV@iFM 2017. EPTCS, Turin, Italy, 19th September 2017, vol. 257, pp. 3–18 (2017)CrossRefGoogle Scholar
 4.Akazaki, T., Liu, S., Yamagata, Y., Duan, Y., Hao, J.: Falsification of cyberphysical systems using deep reinforcement learning. In: Havelund, K., Peleska, J., Roscoe, B., de Vink, E. (eds.) FM 2018. LNCS, vol. 10951, pp. 456–465. Springer, Cham (2018). https://doi.org/10.1007/9783319955827_27CrossRefGoogle Scholar
 5.Annpureddy, Y., Liu, C., Fainekos, G., Sankaranarayanan, S.: STaLiRo: a tool for temporal logic falsification for hybrid systems. In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS 2011. LNCS, vol. 6605, pp. 254–257. Springer, Heidelberg (2011). https://doi.org/10.1007/9783642198359_21CrossRefzbMATHGoogle Scholar
 6.Auger, A., Hansen, N.: A restart CMA evolution strategy with increasing population size. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2005, pp. 1769–1776. IEEE (2005)Google Scholar
 7.Beale, M.H., Hagan, M.T., Demuth, H.B.: Neural Network Toolbox\(^{{\rm TM}}\) User’s Guide. The Mathworks Inc., Natick (1992)Google Scholar
 8.Chen, X., Ábrahám, E., Sankaranarayanan, S.: Flow*: an analyzer for nonlinear hybrid systems. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 258–263. Springer, Heidelberg (2013). https://doi.org/10.1007/9783642397998_18CrossRefGoogle Scholar
 9.Deshmukh, J., Jin, X., Kapinski, J., Maler, O.: Stochastic local search for falsification of hybrid systems. In: Finkbeiner, B., Pu, G., Zhang, L. (eds.) ATVA 2015. LNCS, vol. 9364, pp. 500–517. Springer, Cham (2015). https://doi.org/10.1007/9783319249537_35CrossRefzbMATHGoogle Scholar
 10.Dokhanchi, A., Yaghoubi, S., Hoxha, B., Fainekos, G.E.: Vacuity aware falsification for MTL requestresponse specifications. In: 13th IEEE Conference on Automation Science and Engineering, CASE 2017, Xi’an, China, 20–23 August 2017, pp. 1332–1337. IEEE (2017)Google Scholar
 11.Donzé, A.: Breach, a toolbox for verification and parameter synthesis of hybrid systems. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 167–170. Springer, Heidelberg (2010). https://doi.org/10.1007/9783642142956_17CrossRefGoogle Scholar
 12.Donzé, A., Maler, O.: Robust satisfaction of temporal logic over realvalued signals. In: Chatterjee, K., Henzinger, T.A. (eds.) FORMATS 2010. LNCS, vol. 6246, pp. 92–106. Springer, Heidelberg (2010). https://doi.org/10.1007/9783642152979_9CrossRefzbMATHGoogle Scholar
 13.Dreossi, T., Dang, T., Donzé, A., Kapinski, J., Jin, X., Deshmukh, J.V.: Efficient guiding strategies for testing of temporal properties of hybrid systems. In: Havelund, K., Holzmann, G., Joshi, R. (eds.) NFM 2015. LNCS, vol. 9058, pp. 127–142. Springer, Cham (2015). https://doi.org/10.1007/9783319175249_10CrossRefGoogle Scholar
 14.Dreossi, T., Dang, T., Piazza, C.: Parallelotope bundles for polynomial reachability. In: Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control, HSCC 2016, pp. 297–306. ACM, New York (2016)Google Scholar
 15.Dreossi, T., Donzé, A., Seshia, S.A.: Compositional falsification of cyberphysical systems with machine learning components. In: Barrett, C., Davies, M., Kahsai, T. (eds.) NFM 2017. LNCS, vol. 10227, pp. 357–372. Springer, Cham (2017). https://doi.org/10.1007/9783319572888_26CrossRefGoogle Scholar
 16.Ernst, G., et al.: ARCHCOMP 2019 category report: Falsification. In: Frehse, G., Althoff, M. (eds.) 6th International Workshop on Applied Verification of Continuous and Hybrid Systems, ARCH19. EPiC Series in Computing, vol. 61 pp. 129–140. EasyChair (2019)Google Scholar
 17.Fainekos, G.E., Pappas, G.J.: Robustness of temporal logic specifications for continuoustime signals. Theor. Comput. Sci. 410(42), 4262–4291 (2009)MathSciNetCrossRefGoogle Scholar
 18.Fan, C., Qi, B., Mitra, S., Viswanathan, M., Duggirala, P.S.: Automatic reachability analysis for nonlinear hybrid models with C2E2. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9779, pp. 531–538. Springer, Cham (2016). https://doi.org/10.1007/9783319415284_29CrossRefGoogle Scholar
 19.Ferrère, T., Nickovic, D., Donzé, A., Ito, H., Kapinski, J.: Interfaceaware signal temporal logic. In: Ozay, N., Prabhakar, P. (eds.) Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, HSCC 2019, Montreal, QC, Canada, 16–18 April 2019, pp. 57–66. ACM (2019)Google Scholar
 20.Frehse, G., et al.: SpaceEx: scalable verification of hybrid systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 379–395. Springer, Heidelberg (2011). https://doi.org/10.1007/9783642221101_30CrossRefGoogle Scholar
 21.Fu, Z., Su, Z.: XSat: a fast floatingpoint satisfiability solver. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016, Part II. LNCS, vol. 9780, pp. 187–209. Springer, Cham (2016). https://doi.org/10.1007/9783319415406_11CrossRefGoogle Scholar
 22.Gao, S., Avigad, J., Clarke, E.M.: \({\delta }\)complete decision procedures for satisfiability over the reals. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR 2012. LNCS (LNAI), vol. 7364, pp. 286–300. Springer, Heidelberg (2012). https://doi.org/10.1007/9783642313653_23CrossRefGoogle Scholar
 23.Hasuo, I., Suenaga, K.: Exercises in nonstandard static analysis of hybrid systems. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 462–478. Springer, Heidelberg (2012). https://doi.org/10.1007/9783642314247_34CrossRefGoogle Scholar
 24.Hoxha, B., Abbas, H., Fainekos, G.E.: Benchmarks for temporal logic requirements for automotive systems. In: Frehse, G., Althoff, M. (eds.) 1st and 2nd International Workshop on Applied veRification for Continuous and Hybrid Systems, ARCH@CPSWeek 2014, Berlin, Germany, 14 April 2014/ARCH@CPSWeek 2015, Seattle, USA, 13 April 2015. EPiC Series in Computing, vol. 34, pp. 25–30. EasyChair (2014)Google Scholar
 25.Jin, X., Deshmukh, J.V., Kapinski, J., Ueda, K., Butts, K.: Powertrain control verification benchmark. In: Proceedings of the 17th International Conference on Hybrid Systems: Computation and Control, HSCC 2014, pp. 253–262. ACM, New York (2014)Google Scholar
 26.Kapinski, J., Deshmukh, J.V., Jin, X., Ito, H., Butts, K.: Simulationbased approaches for verification of embedded control systems: An overview of traditional and advanced modeling, testing, and verification techniques. IEEE Control. Syst. 36(6), 45–64 (2016)MathSciNetCrossRefGoogle Scholar
 27.Kato, K., Ishikawa, F., Honiden, S.: Falsification of cyberphysical systems with reinforcement learning. In: 3rd Workshop on Monitoring and Testing of CyberPhysical Systems, MT@CPSWeek 2018, Porto, Portugal, 10 April 2018, pp. 5–6. IEEE (2018)Google Scholar
 28.Kuřátko, J., Ratschan, S.: Combined global and local search for the falsification of hybrid systems. In: Legay, A., Bozga, M. (eds.) FORMATS 2014. LNCS, vol. 8711, pp. 146–160. Springer, Cham (2014). https://doi.org/10.1007/9783319105123_11CrossRefzbMATHGoogle Scholar
 29.Liebrenz, T., Herber, P., Glesner, S.: Deductive verification of hybrid control systems modeled in simulink with KeYmaera X. In: Sun, J., Sun, M. (eds.) ICFEM 2018. LNCS, vol. 11232, pp. 89–105. Springer, Cham (2018). https://doi.org/10.1007/9783030024505_6CrossRefGoogle Scholar
 30.Luersen, M.A., Le Riche, R.: Globalized NelderMead method for engineering optimization. Comput. Struct. 82(23), 2251–2260 (2004)CrossRefGoogle Scholar
 31.Nguyen, L.V., Kapinski, J., Jin, X., Deshmukh, J.V., Butts, K., Johnson, T.T.: Abnormal data classification using timefrequency temporal logic. In: Proceedings of the 20th International Conference on Hybrid Systems: Computation and Control, HSCC 2017, pp. 237–242. ACM, New York (2017)Google Scholar
 32.Platzer, A.: Logical Foundations of CyberPhysical Systems. Springer, Cham (2018). https://doi.org/10.1007/9783319635880CrossRefzbMATHGoogle Scholar
 33.Seshia, S.A., Rakhlin, A.: Quantitative analysis of systems using gametheoretic learning. ACM Trans. Embed. Comput. Syst. 11(S2), 55:1–55:27 (2012)CrossRefGoogle Scholar
 34.Silvetti, S., Policriti, A., Bortolussi, L.: An active learning approach to the falsification of black box cyberphysical systems. In: Polikarpova, N., Schneider, S. (eds.) IFM 2017. LNCS, vol. 10510, pp. 3–17. Springer, Cham (2017). https://doi.org/10.1007/9783319668451_1CrossRefGoogle Scholar
 35.Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in Software Engineering. Springer, Heidelberg (2012). https://doi.org/10.1007/9783642290442CrossRefzbMATHGoogle Scholar
 36.Zhang, Z., Ernst, G., Sedwards, S., Arcaini, P., Hasuo, I.: Twolayered falsification of hybrid systems guided by monte carlo tree search. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 37(11), 2894–2905 (2018)CrossRefGoogle Scholar
 37.Zhang, Z., Hasuo, I., Arcaini, P.: MultiArmed Bandits for Boolean Connectives in Hybrid System Falsification (Extended Version). CoRR, arXiv:1905.07549 (2019)
 38.Zutshi, A., Deshmukh, J.V., Sankaranarayanan, S., Kapinski, J.: Multiple shooting, CEGARbased falsification for hybrid systems. In: 2014 International Conference on Embedded Software, EMSOFT 2014, New Delhi, India, 12–17 October 2014, pp. 5:1–5:10. ACM (2014)Google Scholar
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.