
1 Introduction

The behavioural equivalence bisimilarity, due to Milner [41] and Park [44], is one of the cornerstones of concurrency theory. It captures which states of a labelled transition system, a simple yet widely used model of concurrent systems, behave the same. Hennessy and Milner [29] provided a logical characterization of bisimilarity by introducing a logic, known as Hennessy-Milner logic, and proving that states are bisimilar if and only if they satisfy the same formulas of the logic. If the labelled transition system has finitely many states then for two states that are not bisimilar there exists a formula, often referred to as a distinguishing formula, such that one state satisfies the formula whereas the other state does not. This formula explains why the two states are not bisimilar. Cleaveland [12] presented a polynomial time algorithm that computes a distinguishing formula for states that are not bisimilar. Consider the following labelled transition system.

figure c

The states s and t are not bisimilar. This can be explained by a formula that expresses that a state can transition to a state that can subsequently transition to a purple (square) state as well as a green (hexagon) state. State s satisfies this formula but state t does not.

To model randomness in systems, labelled Markov chains are often used. Larsen and Skou [39] introduced probabilistic bisimilarity to capture which states of a labelled Markov chain behave the same. They also introduced a logic that characterizes probabilistic bisimilarity. Desharnais, Edalat, and Panangaden [19] simplified that logic and presented a polynomial time algorithm that produces a formula that distinguishes two states which are not probabilistic bisimilar. Consider the following labelled Markov chain.

figure d

The states s and t are not probabilistic bisimilar. State t can transition with probability more than \(\frac{1}{2}\) to a green state that can transition to a purple state, whereas state s cannot. This property can be expressed in the logic, giving rise to a formula that distinguishes the states s and t.

Giacalone, Jou, and Smolka [27] observed that probabilistic bisimilarity is not robust. Miniscule changes to the probabilities may alter which states are probabilistic bisimilar. Instead of an equivalence relation, they suggested exploiting a pseudometric to capture the behavioural similarity of states. That is, each pair of states is assigned a distance, a real number in the interval [0, 1], which measures how similar the states behave. The smaller the distance, the more alike the states behave. Distance zero captures that the states are behaviourally equivalent.

Desharnais, Gupta, Jagadeesan, and Panangaden [20] presented such a pseudometric. They showed that distance zero captures probabilistic bisimilarity. Therefore, those distances are known as probabilistic bisimilarity distances. These distances can be computed in polynomial time, as has been shown by Chen et al. [11]. Tang [48] developed and implemented algorithms that can compute the probabilistic bisimilarity distances for labelled Markov chains with thousands of states within seconds. The states s and t in the above labelled Markov chain have distance 0.125. How does one explain that 0.125 captures the similarity of their behaviour? That is the main question that we address in this paper.

To define their probabilistic bisimilarity distances, Desharnais et al. introduce a logic. The labelled Markov chains that they consider differ slightly from the ones we study in this paper: they label transitions whereas we label states (by colours/shapes), and where we require that the probabilities of the outgoing transitions of a state add up to one, they allow them to sum to less than one as well. State-labelled Markov chains have become the norm in probabilistic model checking. Probabilistic model checkers such as PRISM [38] and Storm [14] consider state-labelled Markov chains. Since each transition-labelled Markov chain can be encoded as a state-labelled one [46], this difference does not substantially impact any of the results. If the probabilities do not sum to one, one can add an additional state and transition to that state with the remaining probability. Also this difference does not significantly change the results. Adjusted to our setting, slightly simplified, and using a different syntax, the logic can be captured by the following grammar:

figure e

where a is a label of a state and q is a rational in the interval [0, 1]. This logic characterizes the probabilistic bisimilarity distances (see, for example, [6, 20]). Roughly speaking, the distance of two states is determined by a formula of the logic that distinguishes them the most. Such a formula explains their probabilistic bisimilarity distance. Consider, for example, the states s and t in the above labelled Markov chain. As we already mentioned, their distance is 0.125. This distance can be explained by the formula . This formula captures the probability of reaching a green state in one transition and subsequently reaching a purple state after the second transition. For state s that probability is 0.5 and it is 0.625 for state t. Note that the \(\mathord {\bigcirc }\) operator is similar to the next operator of linear temporal logic. Roughly, the interpretation of the formula \(\mathord {\bigcirc }\varphi \) in state s is the probability that \(\varphi \) holds in the successors of s.

As is common, we provide the above logic with a real-valued interpretation. For a formula of the logic, its interpretation maps each state of the labelled Markov chain to a real value in the interval [0, 1]. For example, for the formula  , its interpretation in state s is denoted by and has the value 0.5. The value of is 0.625. Their difference, which is 0.125, is the distance of the states s and t. The distinguishing formula for the states s and t is fairly simple. As we will discuss next, we need all the operators of the logic to explain the probabilistic bisimilarity distances and a single formula may not suffice.

1.1 Main Results

As we will show, the above logic is a minimal logic that characterizes the probabilistic bisimilarity distances. That is, if we remove any operator from the logic then the resulting logic does not characterize the probabilistic bisimilarity distances anymore. Furthermore, we will demonstrate that there exist finite labelled Markov chains for which the distances of some states cannot be explained by a single formula. However, as we will prove, we can explain the probabilistic bisimilarity distances by means of a sequence of formulas. Given two states, say u and v, we will construct a sequence of formulas \(\varphi _{uv}^0\), \(\varphi _{uv}^1\), \(\varphi _{uv}^2\), ... such that the sequence \(\llbracket {\varphi _{uv}^0}\rrbracket \!(u) - \!\llbracket {\varphi _{uv}^0}\rrbracket \!(v)\), \(\llbracket {\varphi _{uv}^1}\rrbracket \!(u) - \!\llbracket {\varphi _{uv}^1}\rrbracket \!(v)\), \(\llbracket {\varphi _{uv}^2}\rrbracket \!(u) - \!\llbracket {\varphi _{uv}^2}\rrbracket \!(v)\), ... converges to the probabilistic bisimilarity distance of u and v. We will also present an algorithm that computes those formulas and we will show that each formula can be computed in polynomial time.

1.2 Related Work

In addition to the references to the literature mentioned above, next we will discuss some other related work. Many of the behavioural equivalences have been characterized logically. For example, Feng and Zhang [25] provide a logical characterization of probabilistic bisimilarity for probabilistic automata. Bernardo and Miculan [4] present an algorithm that builds a distinguishing formula for states of a probabilistic automaton that are not probabilistic bisimilar. König, Mika-Michalski, and Schröder [37] propose a general method to construct a distinguishing formula for a variety of systems, including probabilistic automata.

Behavioural pseudometrics have been introduced for a large variety of systems that model randomness. For example, Ferns, Panangaden, and Precup [26] study probabilistic bisimilarity distances for Markov decision processes, Deng, Chothia, Palamidessi, and Pang [15] introduce them for probabilistic automata, and de Alfaro, Majumdar, Raman, and Stoelinga [1] present them for games.

Also many behavioural pseudometrics have been characterized logically. For example, Desharnais, Laviolette, and Tracol [23] present a logical characterization of \(\varepsilon \)-bisimilarity, a notion closely related to distances, for probabilistic automata. Du, Deng, and Gebler [24] logically characterize probabilistic bisimilarity distances for probabilistic automata. Pantelic and Lawford [43] provide a logical characterization of a behavioural pseudometric for probabilistic discrete event structures. Komorida et al. [35], König and Mika-Michalski [36], Wild and Schröder [51], as well as Wißmann, Milius, and Schröder [52], present general frameworks to obtain logical characterizations of behavioural pseudometrics.

Whereas many logics for systems with randomness have a real-valued interpretation, Castiglione, Gebler, and Tini [9, 10] introduce a logic for probabilistic automata with a boolean-valued interpretation. Their logic contains an operator with which we can express properties such as “a state can transition with probability a half to a purple state and with probability a half to a green state.” It is this operator that allows them to define a mimicking formula of a state. As the name suggests, this formula mimics the behaviour of the state. Furthermore, they endow the formulas with a pseudometric and show that the probabilistic bisimilarity distance of two states is the distance of their mimicking formulas. Hence, the distance of two states can be explained by means of the mimicking formulas of those states.

2 Labelled Markov Chains and Probabilistic Bisimilarity Distances

In this section, we introduce several key notions that play a central role in the remainder of the paper. We define the model of interest, namely a labelled Markov chain. Furthermore, we introduce probabilistic bisimilarity, an equivalence relation that captures which states of a labelled Markov chain behave the same, and probabilistic bisimilarity distances, which measure the similarity of behaviour of those states.

First, we recall some notions from probability theory. Given a finite set X, a function \(\mu : X \rightarrow [0, 1]\) is a probability distribution on X if \(\sum _{x \in X} \mu (x) = 1\). We denote the set of probability distributions on X by \(\mathcal {D}_{\mathbb {R}}(X)\). For \(\mu \in \mathcal {D}_{\mathbb {R}}(X)\) and \(A \subseteq X\), we often write \(\mu (A)\) for \(\sum _{x \in A} \mu (x)\). Similarly, for \(\omega \in \mathcal {D}_{\mathbb {R}}(X \times X)\), \(a \in X\), and \(A \subseteq X\), we usually write \(\omega (a, A)\) for \(\sum _{x \in A} \omega (a, x)\). For \(\mu \in \mathcal {D}_{\mathbb {R}}(X)\), we define the support of \(\mu \) by \(\textrm{support}(\mu ) = \{\, x \in X \mid \mu (x) > 0 \,\}\). A probability distribution \(\mu \in \mathcal {D}_{\mathbb {R}}(X)\) is rational if \(\mu (x) \in \mathbb {Q}\) for all \(x \in X\). We denote the set of rational probability distributions on X by \(\mathcal {D}_{\mathbb {Q}}(X)\). Obviously, \(\mathcal {D}_{\mathbb {Q}}(X) \subseteq \mathcal {D}_{\mathbb {R}}(X)\).
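These notions translate directly into code with exact rational arithmetic. The sketch below is ours, not the paper's notation: the distribution `mu` and the helpers `is_distribution`, `support`, and `mass` are illustrative names.

```python
from fractions import Fraction

def is_distribution(mu):
    # mu : X -> [0, 1] is a probability distribution if its values sum to one
    return all(0 <= p <= 1 for p in mu.values()) and sum(mu.values()) == 1

def support(mu):
    # support(mu) = { x in X | mu(x) > 0 }
    return {x for x, p in mu.items() if p > 0}

def mass(mu, A):
    # mu(A) = sum of mu(x) over x in A
    return sum(mu[x] for x in A)

mu = {'x': Fraction(1, 2), 'y': Fraction(1, 2), 'z': Fraction(0)}
```

Using `Fraction` keeps all probabilities exact, which matters later when distances are computed rather than approximated.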

Definition 1

A labelled Markov chain is a tuple \(\langle S, L, \tau , \ell \rangle \) consisting of

  • a finite set S of states,

  • a finite set L of labels,

  • a transition probability function \(\tau : S \rightarrow \mathcal {D}_{\mathbb {Q}}(S)\), and

  • a labelling function \(\ell : S \rightarrow L\).
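Definition 1 can be represented directly as data. The sketch below uses Python dicts with exact rationals; the concrete four-state chain is a hypothetical example of ours, not one of the paper's figures.

```python
from fractions import Fraction

# <S, L, tau, ell>: states, labels, transition probability function, and
# labelling function; the concrete chain below is a hypothetical example
S = ['s', 't', 'a', 'b']
L = ['r', 'g', 'p']
ell = {'s': 'r', 't': 'r', 'a': 'g', 'b': 'p'}
tau = {'s': {'a': Fraction(1, 2), 'b': Fraction(1, 2)},
       't': {'a': Fraction(3, 4), 'b': Fraction(1, 4)},
       'a': {'a': Fraction(1)},   # absorbing
       'b': {'b': Fraction(1)}}

def is_lmc(S, L, tau, ell):
    # each tau(s) must be a rational probability distribution on S,
    # and each state must carry a label from L
    rows_ok = all(set(tau[s]) <= set(S)
                  and all(0 <= p <= 1 for p in tau[s].values())
                  and sum(tau[s].values()) == 1
                  for s in S)
    return rows_ok and all(ell[s] in L for s in S)
```

The check `sum(tau[s].values()) == 1` enforces the requirement, stated in the introduction, that the outgoing probabilities of every state add up to one.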

We restrict the transition probabilities to rationals as we will compute with them in Sections 6 and 7. For the remainder, we fix a labelled Markov chain \(\langle S, L, \tau , \ell \rangle \). We define probabilistic bisimilarity by means of the set \(\varOmega _{\mathbb {R}}(\mu , \nu )\), which is known as the transportation polytope [33] of the probability distributions \(\mu \) and \(\nu \).

Definition 2

For all \(\mu \), \(\nu \in \mathcal {D}_{\mathbb {R}}(S)\), the set \(\varOmega _{\mathbb {R}}(\mu , \nu )\) is defined by

$$ \varOmega _{\mathbb {R}}(\mu , \nu ) = \{\, \omega \in \mathcal {D}_{\mathbb {R}}(S \times S) \mid \forall s \in S : \omega (s, S) = \mu (s) \wedge \omega (S, s) = \nu (s) \,\}. $$

Definition 3

A relation \(R \subseteq S \times S\) is a probabilistic bisimulation if for all \((s, t) \in R\), \(\ell (s) = \ell (t)\) and there exists \(\omega \in \varOmega _{\mathbb {R}}(\tau (s), \tau (t))\) with \(\textrm{support}(\omega ) \subseteq R\). States s and t are probabilistic bisimilar, denoted \(s \sim t\), if \((s, t) \in R\) for some probabilistic bisimulation R.
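Definition 3 quantifies over couplings, but for labelled Markov chains probabilistic bisimilarity can equivalently be computed by partition refinement in the style of Larsen and Skou: compute the coarsest partition in which related states carry the same label and assign the same probability to every block. The sketch below is our illustration of that standard algorithm; the chain is hypothetical, with `s2` a copy of `s`.

```python
from fractions import Fraction

def prob_bisim_partition(states, ell, tau):
    # coarsest partition in which related states have the same label and
    # assign equal probability mass to every block
    by_label = {}
    for s in states:
        by_label.setdefault(ell[s], []).append(s)
    partition = list(by_label.values())
    while True:
        index = {s: i for i, block in enumerate(partition) for s in block}
        def signature(s):
            # probability mass tau(s) assigns to each current block
            sig = [Fraction(0)] * len(partition)
            for u, p in tau[s].items():
                sig[index[u]] += p
            return tuple(sig)
        refined = {}
        for i, block in enumerate(partition):
            for s in block:
                refined.setdefault((i, signature(s)), []).append(s)
        refined = list(refined.values())
        if len(refined) == len(partition):   # no block was split
            return partition
        partition = refined

# hypothetical chain: s2 behaves exactly like s, t does not
ell = {'s': 'r', 's2': 'r', 't': 'r', 'a': 'g', 'b': 'p'}
tau = {'s': {'a': Fraction(1, 2), 'b': Fraction(1, 2)},
       's2': {'a': Fraction(1, 2), 'b': Fraction(1, 2)},
       't': {'a': Fraction(3, 4), 'b': Fraction(1, 4)},
       'a': {'a': Fraction(1)}, 'b': {'b': Fraction(1)}}
partition = prob_bisim_partition(list(ell), ell, tau)
block_of = {s: i for i, block in enumerate(partition) for s in block}
```

On this chain, s and s2 end up in the same block (they are probabilistic bisimilar), while s and t are separated.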

To define the probabilistic bisimilarity distances, it is convenient to partition the set of state pairs into the following three sets.

Definition 4

The sets \(S^2_0\), \(S^2_1\) and \(S^2_?\) are defined by

$$\begin{aligned} S^2_0&= \{\, (s, t) \in S \times S \mid s \sim t \,\}\\ S^2_1&= \{\, (s, t) \in S \times S \mid \ell (s) \not = \ell (t) \,\}\\ S^2_?&= (S \times S) \setminus (S^2_0 \cup S^2_1) \end{aligned}$$

The set \(S^2_0\) contains those state pairs that have distance zero (cf. Theorem 6). The set \(S^2_1\) contains those state pairs that have a different label and, therefore, have distance one (cf. Definition 5). The set \(S^2_?\) contains the remaining state pairs. Note that some of these state pairs may have distance one, but cannot have distance zero. The probabilistic bisimilarity distances are defined in terms of the following function.

Definition 5

The function \(\varDelta : (S \times S \rightarrow [0, 1]) \rightarrow (S \times S \rightarrow [0, 1])\) is defined by

$$ \varDelta (d)(s, t) = \left\{ \begin{array}{ll} 0 &{} \textit{if }(s, t) \in S^2_0\\ 1 &{} \textit{if }(s, t) \in S^2_1\\ \displaystyle \inf _{\omega \in \varOmega _{\mathbb {R}}(\tau (s), \tau (t))} \sum _{u, v \in S} \omega (u, v) \; d(u, v) &{} \textit{if }(s, t) \in S^2_? \end{array} \right. $$

Let \(d \in S \times S \rightarrow [0, 1]\) and \(\omega \in \mathcal {D}_{\mathbb {R}}(S \times S)\). Instead of \(\sum _{u, v \in S} \omega (u, v) \; d(u, v)\) we write \(\omega \cdot d\) in the remainder to avoid clutter. Similarly, for \(f \in S \rightarrow [0, 1]\) and \(\mu \in \mathcal {D}_{\mathbb {R}}(S)\) we write \(f \cdot \mu \) instead of \(\sum _{s \in S} f(s) \, \mu (s)\).

For d, \(e \in S \times S \rightarrow [0, 1]\), we define \(d \sqsubseteq e\) if for all s, \(t \in S\), \(d(s, t) \le e(s, t)\). According to, for example, [22, Lemma 3.2], \(\langle S \times S \rightarrow [0, 1], \mathord {\sqsubseteq } \rangle \) is a complete lattice. Since the function \(\varDelta \) is a monotone function from a complete lattice to itself, we can conclude from the Knaster-Tarski fixed point theorem (see, for example, [13, Theorem 2.35]) that \(\varDelta \) has a least fixed point. We denote this least fixed point by \(\delta \). This least fixed point maps each pair of states to a real number in the interval [0, 1]: the probabilistic bisimilarity distance of the states. Distance zero captures probabilistic bisimilarity.

Theorem 6

([21, Theorem 4.10]). For all s, \(t \in S\), \(\delta (s, t) = 0\) if and only if \(s \sim t\).

The probabilistic bisimilarity distance function \(\delta \) is the limit of the distance functions \(\delta _n\) which only consider the first n transitions when comparing the similarity of the behaviour of states. This result can be seen as an instance of the Kleene fixed point theorem [34].

Definition 7

For each \(n \ge 0\), the function \(\delta _n : S \times S \rightarrow [0, 1]\) is defined by

$$ \delta _n(s, t) = \left\{ \begin{array}{ll} 0 &{} \textit{if }n=0\\ \varDelta (\delta _{n-1})(s, t) &{} \textit{otherwise.} \end{array} \right. $$

Proposition 8

\(\displaystyle \lim _{n \rightarrow \infty } \delta _n = \delta \).
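Proposition 8 suggests approximating \(\delta \) by iterating \(\varDelta \). The sketch below is our illustration, not the paper's algorithm: it uses exact rationals, solves each transportation problem by successive shortest augmenting paths, and omits the \(S^2_0\) clause of Definition 5 (this leaves the least fixed point, and hence the limit, unchanged, although finite iterates may then differ from \(\delta _n\) on probabilistic bisimilar pairs). The chain is hypothetical.

```python
from fractions import Fraction

def transport_cost(mu, nu, cost):
    # Minimum of sum omega(u, v) * cost[u][v] over couplings omega of mu and
    # nu, computed exactly by successive shortest augmenting paths.
    U = [u for u in mu if mu[u] > 0]
    V = [v for v in nu if nu[v] > 0]
    supply, demand = {u: mu[u] for u in U}, {v: nu[v] for v in V}
    flow = {(u, v): Fraction(0) for u in U for v in V}
    while any(supply.values()):
        # Bellman-Ford on the residual network from all nodes with supply left
        dist = {('U', u): (Fraction(0) if supply[u] > 0 else None) for u in U}
        dist.update({('V', v): None for v in V})
        pred = {}
        for _ in range(len(U) + len(V)):
            changed = False
            for u in U:
                if dist[('U', u)] is None:
                    continue
                for v in V:  # forward edges u -> v, uncapacitated
                    c = dist[('U', u)] + cost[u][v]
                    if dist[('V', v)] is None or c < dist[('V', v)]:
                        dist[('V', v)], pred[('V', v)], changed = c, u, True
            for v in V:
                if dist[('V', v)] is None:
                    continue
                for u in U:  # backward edges v -> u, capacity flow[u, v]
                    if flow[(u, v)] > 0:
                        c = dist[('V', v)] - cost[u][v]
                        if dist[('U', u)] is None or c < dist[('U', u)]:
                            dist[('U', u)], pred[('U', u)], changed = c, v, True
            if not changed:
                break
        sink = min((v for v in V if demand[v] > 0),
                   key=lambda v: dist[('V', v)])
        # trace the augmenting path back and find its bottleneck
        edges, node = [], ('V', sink)
        while node in pred:
            if node[0] == 'V':
                edges.append(('fwd', pred[node], node[1]))
                node = ('U', pred[node])
            else:
                edges.append(('bwd', node[1], pred[node]))
                node = ('V', pred[node])
        theta = min([supply[node[1]], demand[sink]] +
                    [flow[(u, v)] for kind, u, v in edges if kind == 'bwd'])
        for kind, u, v in edges:
            flow[(u, v)] += theta if kind == 'fwd' else -theta
        supply[node[1]] -= theta
        demand[sink] -= theta
    return sum(flow[(u, v)] * cost[u][v] for u in U for v in V)

def iterate_delta(states, ell, tau, n):
    # n applications of (a simplified) Delta, starting from the zero function
    d = {(x, y): Fraction(0) for x in states for y in states}
    for _ in range(n):
        d = {(x, y): Fraction(1) if ell[x] != ell[y] else
                     transport_cost(tau[x], tau[y],
                                    {u: {v: d[(u, v)] for v in tau[y]}
                                     for u in tau[x]})
             for x in states for y in states}
    return d

# hypothetical chain: a and b are absorbing with distinct labels
ell = {'s': 'r', 't': 'r', 'a': 'g', 'b': 'p'}
tau = {'s': {'a': Fraction(1, 2), 'b': Fraction(1, 2)},
       't': {'a': Fraction(3, 4), 'b': Fraction(1, 4)},
       'a': {'a': Fraction(1)}, 'b': {'b': Fraction(1)}}
d = iterate_delta(list(ell), ell, tau, 5)
```

On this chain the iterates stabilize at \(d(s, t) = \frac{1}{4}\): the cheapest coupling of \(\tau (s)\) and \(\tau (t)\) must move mass \(\frac{1}{4}\) between the two differently labelled absorbing states.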

3 A Logical Characterization

Below, we present a logical characterization of the probabilistic bisimilarity distances. We start with a logic very similar to the one introduced by Desharnais et al. [20].

Definition 9

The logic \(\mathcal {L}_{\lnot }\) is defined by

figure k

where \(a \in L\) and \(q \in \mathbb {Q} \cap [0, 1]\).

The above logic is slightly different from the one presented in [20] as we consider Markov chains with labelled states, whereas Desharnais et al. studied Markov chains with labelled transitions. In particular, a and \(\mathord {\bigcirc }\varphi \) were combined as \(\langle a \rangle \varphi \). Since we restrict our attention to finite state systems, we can restrict ourselves to finite disjunctions. In our setting, the constants true and false can be expressed as \(\bigvee _{a \in L} a\) (recall that we assume that the set L is finite as well) and \(\lnot \textrm{true}\), respectively. The logic of Desharnais et al. also contains the operator \(\lceil \varphi \rceil ^q\), which is redundant, as observed in [21, page 336]. The logic considered by Desharnais [18] lacks negation, but does include \(\lceil \varphi \rceil ^q\) and conjunction. The real-valued interpretation of the logic of Desharnais et al., which considers labelled transitions, is adjusted to our setting of labelled states as follows.

Definition 10

The function \(\llbracket {\cdot }\rrbracket \! : \mathcal {L}_{\lnot } \rightarrow S \rightarrow [0, 1]\) is defined by

$$\begin{aligned} \begin{array}{rcl} \!\llbracket {a}\rrbracket \!(s) &{} = &{} \left\{ \begin{array}{ll} 1 &{} \textit{if }\ell (s) = a\\ 0 &{} \textit{otherwise} \end{array} \right. \\ \!\llbracket {\mathord {\bigcirc }\varphi }\rrbracket \!(s) &{} = &{} \!\llbracket {\varphi }\rrbracket \! \cdot \tau (s)\\ \!\llbracket {\lnot \varphi }\rrbracket \!(s) &{} = &{} 1 - \!\llbracket {\varphi }\rrbracket \!(s)\\ \!\llbracket {\varphi \ominus q}\rrbracket \!(s) &{} = &{} \max (\!\llbracket {\varphi }\rrbracket \!(s) - q, 0)\\ \!\llbracket {\varphi \vee \psi }\rrbracket \!(s) &{} = &{} \max (\!\llbracket {\varphi }\rrbracket \!(s), \!\llbracket {\psi }\rrbracket \!(s)) \end{array} \end{aligned}$$

Note that \(\llbracket {\textrm{false}}\rrbracket \!\) and \(\llbracket {\textrm{true}}\rrbracket \!\) are the constant zero and constant one functions, respectively. The probabilistic bisimilarity distances can be characterized in terms of the logic.
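The clauses of Definition 10 translate directly into a recursive evaluator. In the sketch below, formulas are tagged tuples of our own devising (`('minus', phi, q)` stands for \(\varphi \ominus q\)) and the four-state chain is a hypothetical example.

```python
from fractions import Fraction

def ev(phi, s, ell, tau):
    # real-valued interpretation [[phi]](s) following Definition 10
    tag = phi[0]
    if tag == 'atom':    # [[a]](s) = 1 if ell(s) = a, else 0
        return Fraction(1 if ell[s] == phi[1] else 0)
    if tag == 'next':    # [[O phi]](s) = [[phi]] . tau(s)
        return sum(ev(phi[1], u, ell, tau) * p for u, p in tau[s].items())
    if tag == 'not':     # [[not phi]](s) = 1 - [[phi]](s)
        return 1 - ev(phi[1], s, ell, tau)
    if tag == 'minus':   # [[phi - q]](s) = max([[phi]](s) - q, 0)
        return max(ev(phi[1], s, ell, tau) - phi[2], Fraction(0))
    if tag == 'or':      # [[phi or psi]](s) = max of the two interpretations
        return max(ev(phi[1], s, ell, tau), ev(phi[2], s, ell, tau))
    raise ValueError(f'unknown operator {tag}')

ell = {'s': 'r', 't': 'r', 'a': 'g', 'b': 'p'}
tau = {'s': {'a': Fraction(1, 2), 'b': Fraction(1, 2)},
       't': {'a': Fraction(3, 4), 'b': Fraction(1, 4)},
       'a': {'a': Fraction(1)}, 'b': {'b': Fraction(1)}}
phi = ('next', ('atom', 'g'))   # "the next state is green"
vs, vt = ev(phi, 's', ell, tau), ev(phi, 't', ell, tau)
```

Here \(\llbracket {\varphi }\rrbracket \!(s) = \frac{1}{2}\) and \(\llbracket {\varphi }\rrbracket \!(t) = \frac{3}{4}\), so on this hypothetical chain the formula distinguishes the two states by \(\frac{1}{4}\).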

Theorem 11

([5, Theorem 40 and 44]). For all s, \(t \in S\),

$$ \delta (s, t) = \sup _{\varphi \in \mathcal {L}_{\lnot }} \!\llbracket {\varphi }\rrbracket \!(s) - \!\llbracket {\varphi }\rrbracket \!(t). $$

In the remainder of this paper, we consider the following logic. This logic also characterizes probabilistic bisimilarity distances. As we will show later, this logic can explain the probabilistic bisimilarity distances more concisely than the logic presented above.

Definition 12

The logic \(\mathcal {L}\) is defined by

figure l

where \(a \in L\) and \(q \in \mathbb {Q} \cap [0, 1]\).

Note that negation has been removed and conjunction has been added. Also the operator \(\oplus q\), which is dual to \(\ominus q\), has been added. This logic is very similar to the one considered by Desharnais [18].

Definition 13

The function \(\llbracket {\cdot }\rrbracket \! : \mathcal {L} \rightarrow S \rightarrow [0, 1]\) of Definition 10 is modified by

$$ \begin{array}{rcl} \!\llbracket {\varphi \oplus q}\rrbracket \!(s) &{} = &{} \min \{\!\llbracket {\varphi }\rrbracket \!(s) + q, 1\}\\ \!\llbracket {\varphi \wedge \psi }\rrbracket \!(s) &{} = &{} \min \{\!\llbracket {\varphi }\rrbracket \!(s), \!\llbracket {\psi }\rrbracket \!(s)\} \end{array} $$

As already mentioned above, this logic also characterizes the probabilistic bisimilarity distances.

Theorem 14

For all s, \(t \in S\), \(\delta (s, t) = \sup _{\varphi \in \mathcal {L}} \!\llbracket {\varphi }\rrbracket \!(s) - \!\llbracket {\varphi }\rrbracket \!(t)\).

Proof sketch

Each formula of \(\mathcal {L}\) can be rewritten to an equivalent formula of \(\mathcal {L}_{\lnot }\). For example, if \(\varphi \) is rewritten to \(\psi \) then \(\varphi \oplus q\) is rewritten to \(\lnot (\lnot \psi \ominus q)\). Each formula of \(\mathcal {L}\) has a dual: if \(\llbracket {\varphi }\rrbracket \! = 1 - \!\llbracket {\psi }\rrbracket \!\) then \(\varphi \) is a dual of \(\psi \). For example, if \(\varphi \) is a dual of \(\psi \) then \(\varphi \ominus q\) is a dual of \(\psi \oplus q\). Each formula of \(\mathcal {L}_{\lnot }\) can be rewritten to an equivalent formula of \(\mathcal {L}\). For example, if \(\varphi \) is rewritten to \(\psi \) then \(\lnot \varphi \) is rewritten to a dual of \(\psi \). The result now follows from Theorem 11.    \(\square \)

4 All Operators are Necessary

The logic \(\mathcal {L}\) is a minimal logic that characterizes the probabilistic bisimilarity distances. That is, if we remove any operator from the logic then the resulting logic does not characterize the probabilistic bisimilarity distances anymore. Due to lack of space, we only consider the logic \(\mathcal {L}_{\setminus \ominus }\), which does not have the \(\ominus q\) operator.

Definition 15

The logic \(\mathcal {L}_{\setminus \ominus }\) is defined by

figure m

where \(a \in L\) and \(q \in \mathbb {Q} \cap [0, 1]\).

Theorem 16

There exists a labelled Markov chain \(\langle S, L, \tau , \ell \rangle \) and s, \(t \in S\) such that

figure n

Proof sketch

Consider the following labelled Markov chain.

figure o

It can be shown that \(\delta (s, t) = \frac{7}{8}\). Furthermore, we can prove that for all \(\varphi \in \mathcal {L}_{\setminus \ominus }\) and \(q \in \mathbb {Q} \cap [0, 1]\), if then by structural induction on \(\varphi \). Using this result and Theorem 14, we can also show that for all \(\varphi \in \mathcal {L}_{\setminus \ominus }\), \(\llbracket {\varphi }\rrbracket \!(s) - \!\llbracket {\varphi }\rrbracket \!(t) \le \frac{27}{32}\) by structural induction on \(\varphi \).    \(\square \)

5 Explainability

In general, the probabilistic bisimilarity distance of two states cannot be explained by a single formula, as we will show next. That is, a distinguishing formula does not exist for every pair of states of every labelled Markov chain. But, as we will prove below, for every pair of states there exists a sequence of formulas that explains their distance.

Theorem 17

There exists a labelled Markov chain \(\langle S, L, \tau , \ell \rangle \) and s, \(t \in S\) such that for all \(\varphi \in \mathcal {L}\), .

Proof sketch

Consider the following labelled Markov chain.

figure s

It can be shown that \(\delta (s, t) = 1\). We can also prove that for all \(\varphi \in \mathcal {L}\), by structural induction on \(\varphi \).    \(\square \)

As we will show next, for every pair of states (s, t) there exists a sequence of formulas \((\xi _n)_n\) such that \(\delta (s, t) = \lim _{n \rightarrow \infty } \!\llbracket {\xi _n}\rrbracket \!(s) - \!\llbracket {\xi _n}\rrbracket \!(t)\). This sequence \((\xi _n)_n\) explains the distance \(\delta (s, t)\).

Proposition 18

For all s, \(t \in S\) there exists \((\xi _n)_n\) such that

$$ \delta (s, t) = \lim _{n \rightarrow \infty } \!\llbracket {\xi _n}\rrbracket \!(s) - \!\llbracket {\xi _n}\rrbracket \!(t). $$

Proof sketch

This can be concluded from Theorem 14 and the following. Let X be a nonempty subset of \(\mathbb {R}\) that is bounded above. Then there exists a sequence \((x_n)_n\) in X that converges to \(\sup X\) [8, page 4].    \(\square \)

The proof of the above proposition is not constructive. Below, we will construct a sequence of formulas \((\varphi _{st}^n)_n\) that explains the distance of the states s and t. In particular, \(\varphi _{st}^n\) is constructed so that

figure u

and, hence, \(\llbracket {\varphi _{st}^n}\rrbracket \!(s) - \!\llbracket {\varphi _{st}^n}\rrbracket \!(t) = \delta _n(s, t)\). That is, the formula \(\varphi _{st}^n\) explains the distance \(\delta _n(s, t)\).

If \(n = 0\) then \(\delta _n(s, t) = 0\). We choose the formula false since

figure v

Let \(n \ge 1\). For \((s, t) \in S^2_0\), also \(\delta _n(s, t) = 0\). Again we choose the formula false to explain the distance. For \((s, t) \in S^2_1\), we have that \(\delta _n(s, t) = 1\). In this case the formula \(\ell (s)\) explains \(\delta _n(s, t)\) since \(\ell (s) \not = \ell (t)\) and, therefore,

figure x

To construct a formula that explains distance \(\delta _n(s, t)\) for \((s, t) \in S^2_?\), we rely on the following result about distances and nonexpansive functions. A function \(f \in S \rightarrow [0, 1]\) is nonexpansive if for all s, \(t \in S\), \(| f(s) - f(t) | \le \delta _n(s, t)\). The set of nonexpansive functions is denoted by . This set forms a convex polytope and is known as the Lipschitz polytope. We denote its vertices by .

Proposition 19

For all \((s, t) \in S^2_?\) and \(n \ge 0\), there exists such that \(\delta _{n+1}(s, t) = f_{st}^{n} \cdot (\tau (s) - \tau (t))\).

Proof sketch

Let \((s, t) \in S^2_?\) and \(n \ge 0\). Then

$$ \delta _{n+1}(s, t) = \inf _{\omega \in \varOmega _{\mathbb {R}}(\tau (s), \tau (t))} \omega \cdot \delta _{n}. $$

We can view \(\delta _{n+1}(s, t)\) as the minimal cost of a transportation problem, where \(\tau (s)(u)\) represents the amount transported from the origin u, \(\tau (t)(v)\) captures the amount received at the destination v, \(\delta _n(u, v)\) represents the transportation cost from u to v, and each \(\omega \) captures a transportation plan, that is, \(\omega (u, v)\) is the amount transported from u to v (see, for example, [40, page 15]).

From the Kantorovich-Rubinstein duality theorem [31] we can conclude that

figure ab

In this dual to the above transportation problem, each f represents a price function (see, for example, [40, page 81]). Since a linear function on a convex polytope attains its maximum at a vertex (see, for example, [49, Theorem 2 of Chapter 1]), we can conclude that

figure ac

Since we can prove that , there exists such that \(\delta _{n+1}(s, t) = f^{n}_{st} \cdot (\tau (s) - \tau (t))\).    \(\square \)

The function \(f^n_{st}\) plays a key role in the formula explaining \(\delta _n(s, t)\). However, \(f^n_{st}\) is not necessarily unique. Consider the following labelled Markov chain.

figure af

For this example, the sequence \((\delta _n)_n\) converges in three steps, that is, \(\delta = \delta _3\). We have that \(\delta _2(u,v) = \frac{1}{2}\) and \(\delta _3(s,t) = \frac{1}{2}\). So we need the function \(f^2_{st}\) to satisfy \(\delta _3(s,t) = f^2_{st}(u) - f^2_{st}(v)\) and \(|f^2_{st}(u) - f^2_{st}(v)| \le \frac{1}{2}\). For each \(0 \le q \le \frac{1}{2}\), \(f^2_{st}(u) = \frac{1}{2} + q\) and \(f^2_{st}(v) = q\) satisfies these properties. As we will see, any \(f^n_{st}\) that satisfies these properties can be used to construct \(\varphi ^n_{st}\). How to compute these functions \(f_{st}^n\) is the topic of the next section.

As we will show in Theorem 22, we can construct a formula \(\psi ^n_{st}\) that captures the function \(f^n_{st}\), that is, \(\llbracket {\psi ^n_{st}}\rrbracket \! = f^n_{st}\). More about this soon. By means of \(\psi ^{n-1}_{st}\) we can explain the distance \(\delta _n(s, t)\) by the formula \((\mathord {\bigcirc }\psi ^{n-1}_{st}) \ominus (f^{n-1}_{st} \cdot \tau (t))\) since we have that

$$\begin{aligned} \!\llbracket {(\mathord {\bigcirc }\psi ^{n-1}_{st}) \ominus (f^{n-1}_{st} \cdot \tau (t))}\rrbracket \!(s)&= \max \{ (\!\llbracket {\psi ^{n-1}_{st}}\rrbracket \! \cdot \tau (s)) - (f^{n-1}_{st} \cdot \tau (t)), 0 \}\\&= \max \{ (f^{n-1}_{st} \cdot \tau (s)) - (f^{n-1}_{st} \cdot \tau (t)), 0 \}\\&= \max \{ f^{n-1}_{st} \cdot (\tau (s) - \tau (t)), 0 \}\\&= \max \{ \delta _n(s, t), 0 \}\\&= \delta _n(s, t) \end{aligned}$$

and, similarly, we can deduce that

figure ag

Let us return to the formula \(\psi ^n_{st}\) that captures the function \(f^n_{st}\). To construct \(\psi ^n_{st}\) we use the following result.

Lemma 20

([2, Lemma A7.2]). Let \(f \in S \rightarrow [0, 1]\). If for all u, \(v \in S\), there exists \(g_{uv} \in S \rightarrow [0, 1]\) such that \(g_{uv}(u) = f(u)\) and \(g_{uv}(v) = f(v)\), then

$$ f = \min _{u \in S} \max _{v \in S} g_{uv} = \max _{u \in S} \min _{v \in S} g_{uv}. $$
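Lemma 20 is easy to check numerically: as long as each \(g_{uv}\) agrees with f at u and at v, its values elsewhere are irrelevant, because choosing \(u = w\) (respectively \(v = w\)) pins the min-max (respectively max-min) at w to f(w). A sketch with arbitrary, randomly chosen off-constraint values; all names are hypothetical.

```python
import random
from fractions import Fraction

random.seed(1)
S = ['u1', 'u2', 'u3']
f = {'u1': Fraction(1, 4), 'u2': Fraction(3, 4), 'u3': Fraction(1, 2)}

# g[u][v] : S -> [0, 1] is arbitrary except that it agrees with f at u and v
g = {u: {v: {w: Fraction(random.randint(0, 4), 4) for w in S} for v in S}
     for u in S}
for u in S:
    for v in S:
        g[u][v][u] = f[u]
        g[u][v][v] = f[v]

# the two sides of Lemma 20, computed pointwise
minmax = {w: min(max(g[u][v][w] for v in S) for u in S) for w in S}
maxmin = {w: max(min(g[u][v][w] for v in S) for u in S) for w in S}
```

Both `minmax` and `maxmin` coincide with f, as the lemma predicts, for any choice of the unconstrained values.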

To apply the above lemma, we need to construct for all u, \(v \in S\) a formula \(\psi ^n_{stuv}\) such that

figure ah

The details are provided in Definition 21 and Theorem 22. From Lemma 20 we can then conclude that

figure ai

The above can be summarized as follows.

Definition 21

For all s, \(t \in S\),

$$ \varphi _{st}^0 = \textrm{false} $$

and

$$ \varphi _{st}^1 = \left\{ \begin{array}{ll} \textrm{false} &{} \textit{if }(s, t) \in S^2_0 \cup S^2_?\\ \ell (s) &{} \textit{if }(s, t) \in S^2_1 \end{array} \right. $$

For all s, \(t \in S\) and \(n \ge 2\),

figure aj

For all \((s, t) \in S^2_?\) and \(n \ge 1\),

$$ \psi ^n_{st} = \bigwedge _{u \in S} \bigvee _{v \in S} \psi ^n_{stuv} $$

For all \((s, t) \in S^2_?\), u, \(v \in S\), and \(n \ge 1\),

figure ak

Note that, for \((s, t) \in S^2_?\) and \(n \ge 2\), the formula \(\varphi _{st}^n\) contains \(|S|^2\) subformulas of the form \(\varphi _{uv}^{n-1}\). As a consequence, the size of \(\varphi _{st}^n\) grows exponentially in n. As we will see in Section 7, we can compute \(\varphi _{st}^n\) in polynomial time by sharing subformulas.

The above definition shows some similarities with the sequence of formulas introduced in [43, Definition 8]. Their setting is different: the transitions are labelled (as in [20]), the transition function is deterministic, and the labelling of the transitions is probabilistic. Their logic is simpler than the one introduced in [20] since the systems they consider are simpler. The sequence of formulas that they introduce is syntactically simpler than the one we define above. Their formulas are only used to prove a logical characterization, although those formulas can also be used for explainability.

Consider the states s and t of the following labelled Markov chain.

figure al

By definition, \(\varphi ^0_{st} = \textrm{false}\) and \(\varphi ^1_{st} = \textrm{false}\). For \(\varphi ^2_{st}\) we get

figure am

This formula can be simplified to \(\textrm{false}\). In the logic of Desharnais et al., which lacks \(\wedge \) and \(\oplus \), one would need 111 additional \(\lnot \) operators, making it less concise.

The formula \(\varphi ^3_{st}\) fills more than a page, but can be simplified to the formula . Although generally there does not exist a distinguishing formula for each pair of states (Theorem 17), in this case the formula \(\varphi ^3_{st}\) explains the distance of states s and t, since \(\delta (s, t) = 0.125\), \(\llbracket {\varphi ^3_{st}}\rrbracket \!(s) = 0.125\) and \(\llbracket {\varphi ^3_{st}}\rrbracket \!(t) = 0\). The formula captures the probability of reaching a green state in one transition and subsequently reaching another green state.

The formula \(\varphi ^3_{ts}\) can be simplified to . Since we have that \(\llbracket {\varphi ^3_{ts}}\rrbracket \!(t) = 0.125\) and \(\llbracket {\varphi ^3_{ts}}\rrbracket \!(s) = 0\), the formula \(\varphi _{ts}^3\) explains the distance \(\delta (t, s) = 0.125\). The formula represents the probability of reaching a green state in one transition and subsequently reaching a purple state.

The outermost test can be removed from the explanation. Hence, the formulas and explain the distance of states s and t as well.

Theorem 22

  1. (a)

    For all s, \(t \in S\) and \(n \ge 0\), \(\llbracket {\varphi ^n_{st}}\rrbracket \!(s) = \delta _n(s, t)\) and \(\llbracket {\varphi ^n_{st}}\rrbracket \!(t) = 0\).

  2. (b)

    For all \((s, t) \in S^2_?\) and \(n \ge 1\), \(\llbracket {\psi _{st}^n}\rrbracket \! = f_{st}^{n}\).

  3. (c)

    For all \((s, t) \in S^2_?\), u, \(v \in S\), and \(n \ge 1\), \(\llbracket {\psi ^n_{stuv}}\rrbracket \!(u) = f_{st}^{n}(u)\) and \(\llbracket {\psi ^n_{stuv}}\rrbracket \!(v) = f_{st}^{n}(v)\).

Proof sketch

This theorem can be proved by induction on n. Most steps of the proof have already been discussed above. To prove (c), let \((s, t) \in S^2_?\), u, \(v \in S\) and \(n \ge 1\). We need to distinguish three cases. Here we only consider the case that . Then

figure as

   \(\square \)

Combining Proposition 8 and Theorem 22, we obtain the following explainability result.

Corollary 23

For all s, \(t \in S\), \(\lim _{n \rightarrow \infty } \!\llbracket {\varphi _{st}^n}\rrbracket \!(s) - \!\llbracket {\varphi _{st}^n}\rrbracket \!(t) = \delta (s, t)\).

6 Computing \(f_{st}^n\)

Proposition 19 states that the functions \(f_{st}^n\) exist. Below, we will show that these functions can be computed in polynomial time.

Let \((s, t) \in S^2_?\). The function \(f_{st}^0\) is defined as the constant zero function; it satisfies \(\delta _1(s, t) = f_{st}^0 \cdot (\tau (s) - \tau (t))\) and can trivially be computed in polynomial time. To prove that the remaining functions \(f_{st}^n\), with \(n \ge 1\), can be computed in polynomial time as well, we use Orlin's primal network simplex algorithm [42] to solve minimum-cost flow problems and Khachiyan's ellipsoid method [32] to solve linear programming problems. As we will show below, \(f_{st}^n\) can be computed as \({\textsc {FindVertex}}(\delta _n, \tau (s), \tau (t))\).

figure au

In line 4 we use Orlin’s primal network simplex algorithm to compute the minimum cost for the following network (NE). The nodes of the network consist of two copies of each \(u \in S\), denoted \(u_0\) and \(u_1\). The supply of node \(u_0\) is \(\mu (u)\) and the demand of node \(u_1\) is \(\nu (u)\). Each edge \((u_0, v_1)\) has cost \(d(u, v)\).

figure av

Each \(\omega \in \varOmega _{\mathbb {R}}(\mu , \nu )\) corresponds to a feasible flow, where \(\omega (u, v)\) captures the flow from \(u_0\) to \(v_1\). The constraints \(\omega (u, S) = \mu (u)\) and \(\omega (S, u) = \nu (u)\), defining \(\varOmega _{\mathbb {R}}(\mu , \nu )\), capture that the supply of \(u_0\) flows from \(u_0\) and the demand of \(u_1\) flows to \(u_1\). For a feasible flow \(\omega \), its cost is \(\omega \cdot d\). Hence, \(d_{\mu \nu }\) captures the minimum cost.

Note that, by definition, the supplies and demands are rational. We can prove that \(d_{\mu \nu } = \omega \cdot d\) for some \(\omega \in \varOmega _{\mathbb {Q}}(\mu , \nu )\). Since d is rational as well, we can conclude that \(d_{\mu \nu }\) is also rational. Orlin’s primal network simplex algorithm computes the minimum cost and, hence, can be used to compute \(d_{\mu \nu }\). The algorithm is strongly polynomial, running in \(O(|N|^2 |E|^2 \log |N|)\) time. Since the network has 2|S| nodes and \(|S|^2\) edges, \(d_{\mu \nu }\) can be computed in \(O(|S|^6 \log |S|)\) time.
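As a stand-in for Orlin's primal network simplex algorithm, which we do not reproduce here, the following sketch computes the minimum cost of the network (NE) by successive shortest augmenting paths. The node naming ("a"/"b" for the two copies of each state) and the function name `kantorovich` are ours, chosen for illustration; a production implementation would use a dedicated min-cost flow solver.

```python
from collections import defaultdict

def kantorovich(mu, nu, d):
    """Minimum cost of transporting distribution mu onto nu with ground
    cost d(u, v), via successive shortest augmenting paths in the
    residual graph of the network (NE)."""
    S = sorted(set(mu) | set(nu))
    SRC, SNK = "src", "snk"
    nodes = [SRC, SNK] + [("a", s) for s in S] + [("b", s) for s in S]
    cap = defaultdict(float)  # residual capacities
    cost = {}                 # arc costs; reverse arcs get the negated cost

    def add_arc(u, v, c, w):
        cap[(u, v)] += c
        cost[(u, v)], cost[(v, u)] = w, -w

    for s in S:
        add_arc(SRC, ("a", s), mu.get(s, 0.0), 0.0)  # supply of s_0
        add_arc(("b", s), SNK, nu.get(s, 0.0), 0.0)  # demand of s_1
        for t in S:
            add_arc(("a", s), ("b", t), float("inf"), d(s, t))

    total = 0.0
    while True:
        # Bellman-Ford: residual arcs may carry negative cost
        dist = {n: float("inf") for n in nodes}
        prev = {}
        dist[SRC] = 0.0
        for _ in range(len(nodes) - 1):
            changed = False
            for (u, v), c in list(cap.items()):
                if c > 1e-12 and dist[u] + cost[(u, v)] < dist[v] - 1e-12:
                    dist[v], prev[v], changed = dist[u] + cost[(u, v)], u, True
            if not changed:
                break
        if dist[SNK] == float("inf"):
            break  # all demand has been routed
        # augment along the shortest path by its bottleneck capacity
        path, v = [], SNK
        while v != SRC:
            path.append((prev[v], v))
            v = prev[v]
        f = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= f
            cap[(v, u)] += f
        total += f * dist[SNK]
    return total

# With the discrete 0/1 ground cost, the minimum cost coincides with the
# total variation distance between the two distributions.
zero_one = lambda s, t: 0.0 if s == t else 1.0
tv = kantorovich({"x": 0.5, "y": 0.5}, {"x": 0.75, "y": 0.25}, zero_one)
```

Each \(\omega \in \varOmega _{\mathbb {R}}(\mu , \nu )\) corresponds to a feasible flow in this encoding, exactly as described above; the function returns \(d_{\mu \nu }\) as the accumulated cost of the augmentations.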

In line 5 we use Khachiyan’s ellipsoid method to find a feasible solution of a linear programming problem with the variables \(x_s\), for \(s \in S\), and the constraints

$$\begin{aligned} \forall s, t \in S :&\, x_s - x_t \le d(s, t)\\ \forall s \in S :&\, x_s \ge 0\\ \forall s \in S :&\, x_s \le 1\\&\sum _{s \in S} x_s \, (\mu (s) - \nu (s)) = d_{\mu \nu } \end{aligned}$$

By means of the ellipsoid method we can find a vertex of the convex polytope defined by the above constraints. This method is polynomial in the size of the constraints, in this case, the size of d, \(\mu \), \(\nu \), and \(d_{\mu \nu }\).
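To make the constraint system concrete, the following sketch checks whether a candidate assignment to the variables \(x_s\) is feasible. The helper name `is_feasible` and the example data are ours; a real implementation would, as described above, use an LP solver such as the ellipsoid method to find a vertex rather than test candidate points.

```python
def is_feasible(x, states, d, mu, nu, d_mu_nu, eps=1e-9):
    """Check a candidate x against the constraints of the linear program:
    x_s - x_t <= d(s, t), 0 <= x_s <= 1, and
    sum_s x_s (mu(s) - nu(s)) = d_mu_nu."""
    for s in states:
        if x[s] < -eps or x[s] > 1 + eps:
            return False  # violates 0 <= x_s <= 1
        for t in states:
            if x[s] - x[t] > d[(s, t)] + eps:
                return False  # violates x_s - x_t <= d(s, t)
    lhs = sum(x[s] * (mu[s] - nu[s]) for s in states)
    return abs(lhs - d_mu_nu) <= eps  # equality constraint

# Illustrative two-state instance with the 0/1 ground distance:
# moving all mass from x to the uniform distribution has cost 1/2.
d = {("x", "x"): 0.0, ("y", "y"): 0.0, ("x", "y"): 1.0, ("y", "x"): 1.0}
mu, nu = {"x": 1.0, "y": 0.0}, {"x": 0.5, "y": 0.5}
ok = is_feasible({"x": 1.0, "y": 0.0}, ["x", "y"], d, mu, nu, 0.5)
```

Here the point \(x_{\textrm{x}} = 1, x_{\textrm{y}} = 0\) is a vertex of the polytope, which is the kind of solution FindVertex returns.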

Let \(n \ge 1\) and \((s, t) \in S^2_?\). Since we can show that \(\delta _n\) is rational and \(\delta _n(s, s)~=~0\) for all \(s \in S\), we can apply FindVertex to \(\delta _n\), \(\tau (s)\) and \(\tau (t)\). In this case, line 4 computes \(\inf _{\omega \in \varOmega _{\mathbb {R}}(\tau (s), \tau (t))} \omega \cdot \delta _n\), which equals \(\delta _{n+1}(s, t)\). As a consequence, \({\textsc {FindVertex}}(\delta _n, \tau (s), \tau (t))\) returns \(f_{st}^n\) such that \(f_{st}^n \cdot (\tau (s) - \tau (t)) = \delta _{n+1}(s, t)\).

As we already observed above, line 4 can be computed in polynomial time in the size of the labelled Markov chain and line 5 can be computed in polynomial time in the size of \(\delta _n\), \(\tau (s)\), \(\tau (t)\), and \(\delta _{n+1}(s, t)\), which we can show to be polynomial time in the size of the labelled Markov chain and n. Hence, the running time of \({\textsc {FindVertex}}(\delta _n, \tau (s), \tau (t))\) is polynomial in the size of the labelled Markov chain and n.
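The distance iteration that FindVertex supports can be illustrated on a small chain. In the sketch below every transition distribution is supported on two states, so the coupling polytope is a line segment and the linear cost attains its minimum at an endpoint; no LP machinery is needed. The chain, its labels, and the helper names are invented for illustration, and the update rule (1 for differently labelled pairs, 0 on the diagonal, minimum coupling cost otherwise) is one common formulation of the fixed-point characterization, not a transcription of the paper's algorithm.

```python
def coupling_cost_min(mu, nu, d, a, b):
    """Exact minimum-cost coupling when both distributions live on the
    two-element support {a, b}: couplings form a segment parametrized by
    x = omega(a, a), and the linear cost is minimized at an endpoint."""
    lo = max(0.0, mu[a] - nu[b])
    hi = min(mu[a], nu[a])
    def cost(x):
        return (x * d[(a, a)] + (mu[a] - x) * d[(a, b)]
                + (nu[a] - x) * d[(b, a)] + (nu[b] - mu[a] + x) * d[(b, b)])
    return min(cost(lo), cost(hi))

# Hypothetical 4-state chain: g and p are absorbing with distinct labels;
# s and t both move to g/p, but with different probabilities.
label = {"g": "green", "p": "purple", "s": "white", "t": "white"}
tau = {"g": {"g": 1.0, "p": 0.0}, "p": {"g": 0.0, "p": 1.0},
       "s": {"g": 0.5, "p": 0.5}, "t": {"g": 0.75, "p": 0.25}}
S = ["g", "p", "s", "t"]

# delta_0 distinguishes only differently labelled states.
delta = {(u, v): (0.0 if label[u] == label[v] else 1.0)
         for u in S for v in S}
for _ in range(10):  # iterate the distance operator
    delta = {(u, v): (1.0 if label[u] != label[v] else
                      0.0 if u == v else
                      coupling_cost_min(tau[u], tau[v], delta, "g", "p"))
             for u in S for v in S}
```

For this chain the iteration stabilizes with distance 0.25 between s and t: the cheapest coupling matches the common 0.5 of g-mass at cost 0 and routes the remaining 0.25 from p to g at cost \(\delta(g, p) = 1\).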

7 The Algorithm

Given a labelled Markov chain \(\langle S, L, \tau , \ell \rangle \) and \(N \in \mathbb {N}\), we can explain the distances \(\delta (s, t)\) for s, \(t \in S\) by computing the formulas \(\varphi _{st}^n\) for \(0 \le n \le N\). To obtain this sequence of formulas, we implement Definition 21 as follows. Below, for s, t, \(u \in S\), we use the array cells \(\textrm{distance}[s][t]\), \(\textrm{function}[s][t][u]\), and \(\textrm{formula}[s][t][n]\) to represent the distance \(\delta _{n-1}(s, t)\), the function value \(f_{st}^{n-1}(u)\), and the formula \(\varphi _{st}^n\), respectively. In line 5–17, we compute \(\delta _0\), \(f_{st}^0\), \(\varphi _{st}^0\), and \(\varphi _{st}^1\). The loop of line 20–50 first computes the distances \(\delta _n\) (line 21–27), then determines the functions \(f_{st}^n\) (line 30), and finally computes the formulas \(\varphi _{st}^{n+1}\) (line 31–49).

figure ax
figure ay

Let us first discuss the correctness of the above algorithm. In line 4, \(\sim \) is computed by deciding probabilistic bisimilarity. The loop spanning line 20–50 has the following invariant.

$$\begin{aligned} \forall s, t \in S : \textrm{distance}[s][t]&= \delta _{n-1}(s, t)\end{aligned}$$
(1)
$$\begin{aligned} \forall (s, t) \in S^2_? : \forall u \in S : \textrm{function}[s][t][u]&= f_{st}^{n-1}(u)\end{aligned}$$
(2)
$$\begin{aligned} \forall s, t \in S : \forall 0 \le i \le n : \textrm{formula}[s][t][i]&= \varphi _{st}^i \end{aligned}$$
(3)

Let us check that the above loop invariant holds when we reach line 21 for the first time. In line 7 we set distance to zero. Hence, (1) is satisfied when we reach line 21. In line 17 we set function to zero. Hence, (2) is also satisfied when we reach line 21. In line 6, 10, 13, and 15 we set formula such that (3) is satisfied when we reach line 21.

Next, we check that the loop maintains the above invariant, that is, if the invariant holds at line 21 then it also holds at line 50. Assume that the invariant holds at line 21. From (2) and line 22–27 we can conclude that

$$ \textrm{distance}[s][t] = \left\{ \begin{array}{ll} 0 &{} \,\textrm{if}\,(s, t) \in S^2_0\\ 1 &{} \,\textrm{if}\,(s, t) \in S^2_1\\ f_{st}^{n-1} \cdot (\tau (s) - \tau (t)) &{} \,\textrm{otherwise} \end{array} \right. $$

once we arrive at line 28. Hence, from Proposition 19 we can conclude that \(\textrm{distance}[s][t] = \delta _n(s, t)\) for all s, \(t \in S\). Therefore, (1) holds at line 50.

Since \(\textrm{distance} = \delta _n\) at line 30 and, as we have seen in Section 6, \({\textsc {FindVertex}}(\delta _n, \tau (s), \tau (t))\) returns \(f_{st}^n\), we assign \(f_{st}^n\) to \(\textrm{function}[s][t]\) in line 30. Hence, (2) holds at line 50. We can also verify that line 31–49 ensure that (3) is maintained by the loop.

Finally, we will argue that the running time of the above algorithm is polynomial in the size of the labelled Markov chain and N. Probabilistic bisimilarity can be decided in polynomial time, as was first shown by Baier [3]. More efficient algorithms have been proposed by Buchholz [7], by Derisavi, Hermanns, and Sanders [17], and by Valmari and Franceschinis [50]. Hence, line 4 runs in polynomial time.

Each line of 6–17 can be implemented in constant time. Since each line of this part is executed at most \(N |S|^3\) times, the running time of line 5–17 is polynomial in the size of the labelled Markov chain and N.

The loop consisting of line 20–50 is executed \(N-1\) times. As we already discussed in Section 6, the running time of \({\textsc {FindVertex}}(\delta _n, \tau (s), \tau (t))\) is polynomial in the size of the labelled Markov chain and n. When we arrive at line 30, distance equals \(\delta _n\) and, hence, this line is polynomial in the size of the labelled Markov chain and n. All other lines of the loop can be implemented in constant time. Each line is executed at most \(|S|^4\) times. Therefore, the running time of line 20–50 is polynomial in the size of the labelled Markov chain and N.

8 Conclusion

In this paper, we study a minor variation of the logic introduced by Desharnais et al. in [20]. In particular, we show that

1.

    the logic is a minimal one that characterizes the probabilistic bisimilarity distances,

2.

    in general, there does not exist a distinguishing formula \(\varphi _{st}\) for states s and t such that \(\llbracket {\varphi _{st}}\rrbracket \!(s) - \!\llbracket {\varphi _{st}}\rrbracket \!(t) = \delta (s, t)\),

3.

    there exists a sequence \((\varphi _{st}^n)_{n}\) of formulas that explains distance \(\delta (s, t)\) as \(\lim _{n \rightarrow \infty } \!\llbracket {\varphi _{st}^n}\rrbracket \!(s) - \!\llbracket {\varphi _{st}^n}\rrbracket \!(t) = \delta (s, t)\), and

4.

    each formula \(\varphi _{st}^n\) can be computed in polynomial time.

As pointed out by Hillerström in [30], an early paper on computing distinguishing formulas, to explain why states are not bisimilar “arguments must be concise in the sense that an argument must not contain redundant or irrelevant information.” This applies to our setting as well. The distinguishing formulas introduced in Definition 21 are in many cases far from concise. We leave the simplification of these formulas for future research.

One may wonder whether adding fixed points to the logic, in the form of variables X and either operators \(\mu X\) and \(\nu X\) or equations of the form \(X = \varphi \), would allow us to explain the probabilistic bisimilarity distance of two states by means of a single formula. A logic similar to the one studied in this paper that contains fixed points has been studied by De Alfaro et al. [1]. Whether simply adding fixed points to the logic suffices is not immediately clear, as the \(\ominus p_n\) and \(\oplus q_n\) in the formula \(\psi _{uvst}^n\) vary as n varies. Extending the logic so that the probabilistic bisimilarity distance of two states can be explained by means of a single formula is another potential topic for future research.

Graf and Sifakis [28] introduce the notion of a characteristic formula for a state s: a state satisfies this formula if and only if it is behaviourally equivalent to s. Characteristic formulas have been developed for probabilistic bisimilarity. For example, Deng and van Glabbeek [16] present characteristic formulas for probabilistic automata. Sack and Zhang [47] introduce a general framework to construct characteristic formulas for probabilistic automata. In the setting of probabilistic bisimilarity distances, a characteristic formula for a state s of a labelled Markov chain can be formalized in one of the following two ways. The formula \(\varphi _s\) is a characteristic formula for the state s if

$$\begin{aligned} \textrm{for all states}~t\textrm{, } \!\llbracket {\varphi _s}\rrbracket \!(s) - \!\llbracket {\varphi _s}\rrbracket \!(t) = \delta (s, t) \end{aligned}$$
(4)

or

$$\begin{aligned} \textrm{for all states}~t\textrm{, } \!\llbracket {\varphi _s}\rrbracket \!(t) = \delta (s, t). \end{aligned}$$
(5)

It can be shown that (4) and (5) are equivalent: if there exists a formula that satisfies (4) then there also exists a (different) formula that satisfies (5). Whether such a formula or a sequence of such formulas exists for the logic studied in this paper is an open question that may be tackled in future research.

A preliminary implementation of the algorithm in Java is available [45]. Improving the code is another avenue for further research.