# Trace Refinement in Labelled Markov Decision Processes

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9634)

## Abstract

Given two labelled Markov decision processes (MDPs), the trace-refinement problem asks whether for all strategies of the first MDP there exists a strategy of the second MDP such that the induced labelled Markov chains are trace-equivalent. We show that this problem is decidable in polynomial time if the second MDP is a Markov chain. The algorithm is based on new results on a particular notion of bisimulation between distributions over the states. However, we show that the general trace-refinement problem is undecidable, even if the first MDP is a Markov chain. Decidability of those problems was stated as open in 2008. We further study the decidability and complexity of the trace-refinement problem provided that the strategies are restricted to be memoryless.

## Keywords

Markov Decision Process · Local Strategy · Nonnegative Matrix · Dirac Distribution

## 1 Introduction

We consider labelled Markov chains (MCs) whose transitions are labelled with symbols from an alphabet $$\mathsf{L}$$. Upon taking a transition, the MC emits the associated label. In this way, an MC defines a trace-probability function $$Tr : \mathsf{L}^* \rightarrow [0,1]$$ which assigns to each finite trace $$w \in \mathsf{L}^*$$ the probability that the MC emits w during its first |w| transitions. Consider the MC depicted in Fig. 1 with initial state $$p_0$$. For example, in state $$p_0$$, with probability $$\frac{1}{4}$$, a transition to state $$p_c$$ is taken and c is emitted. We have, e.g., $$Tr (a b c) = \frac{1}{4}\cdot \frac{1}{4}\cdot \frac{1}{4}$$. Two MCs over the same alphabet $$\mathsf{L}$$ are called equivalent if their trace-probability functions are equal.

Fig. 1. An MC with its trace-probability function. This MC, denoted by $$\mathcal {C}(\mathcal {A})$$, is also used in the reduction from universality of probabilistic automata to $$\mathsf{MC \sqsubseteq MDP}$$.

Fig. 2. An MDP where the choice of the controller is relevant only in $$q_1$$. Two available moves $$\mathsf{m}_1,\mathsf{m}_2$$ are shown with small black circles.
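The trace-probability function of an MC can be computed by pushing the state distribution forward one label at a time. The following is a minimal sketch, assuming the transition structure of the Fig. 1 MC as described above (in $$p_0$$, labels a and b loop with probability $$\frac{1}{4}$$ each, while c and d lead, with probability $$\frac{1}{4}$$ each, to absorbing states that keep emitting c resp. d); the dict encoding is our own convention.

```python
from collections import defaultdict

# Hypothetical encoding of the Fig. 1 MC: state -> list of (label, prob, successor).
delta = {
    "p0": [("a", 0.25, "p0"), ("b", 0.25, "p0"),
           ("c", 0.25, "pc"), ("d", 0.25, "pd")],
    "pc": [("c", 1.0, "pc")],
    "pd": [("d", 1.0, "pd")],
}

def trace_prob(mu0, word):
    """Tr(w): push the state (sub)distribution through each label of w
    and return the remaining total mass."""
    dist = dict(mu0)
    for label in word:
        nxt = defaultdict(float)
        for q, p in dist.items():
            for a, t, q2 in delta[q]:
                if a == label:
                    nxt[q2] += p * t
        dist = nxt
    return sum(dist.values())

print(trace_prob({"p0": 1.0}, "abc"))  # 1/4 * 1/4 * 1/4 = 0.015625
```

The forward pass works for MCs because there are no choices to resolve; for MDPs a strategy must be fixed first.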

The study of such MCs and their equivalence has a long history, going back to Schützenberger  and Paz , who studied weighted and probabilistic automata, respectively. Those models generalize labelled MCs, but the respective equivalence problems are essentially the same. It can be extracted from  that equivalence is decidable in polynomial time, using a technique based on linear algebra. Variants of this technique were developed, see e.g. [8, 20]. Tzeng  considered the path-equivalence problem for nondeterministic automata, which asks, given nondeterministic automata A and B, whether each word has the same number of accepting paths in A as in B. He gives an NC algorithm for deciding path equivalence which can be straightforwardly adapted to yield an NC algorithm for equivalence of MCs.
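The linear-algebra technique behind these equivalence checks can be sketched as follows: two MCs are trace-equivalent iff the all-ones vector is orthogonal to every distribution-difference vector reachable from the initial difference under the per-label transition maps. The sequential basis computation below is our own illustrative sketch (Tzeng's NC algorithm parallelizes this search), and the matrix encoding — `T[a][i, j]` is the probability of emitting a while moving from state i to state j — is an assumption.

```python
import numpy as np

def mc_equivalent(mu1, T1, mu2, T2, tol=1e-9):
    """Schuetzenberger/Tzeng-style equivalence check (sketch).
    mu1, mu2: initial distributions; T1, T2: dicts label -> transition matrix."""
    n = len(mu1) + len(mu2)
    ones = np.ones(n)
    basis = []  # orthonormal basis of the space of reachable difference vectors
    work = [np.concatenate([np.asarray(mu1, float), -np.asarray(mu2, float)])]
    while work:
        v = work.pop()
        for b in basis:              # Gram-Schmidt: keep only the new component
            v = v - np.dot(v, b) * b
        if np.linalg.norm(v) > tol:
            v = v / np.linalg.norm(v)
            basis.append(v)
            for a in T1:             # extend by one letter in both chains
                M = np.block([[T1[a], np.zeros((len(mu1), len(mu2)))],
                              [np.zeros((len(mu2), len(mu1))), T2[a]]])
                work.append(v @ M)
    # equivalent iff every reachable difference vector has total mass zero
    return all(abs(np.dot(b, ones)) <= tol for b in basis)
```

The basis has at most $$|Q_1|+|Q_2|$$ vectors, so the loop terminates after polynomially many iterations.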

More recently, the efficient decidability of the equivalence problem was exploited, both theoretically and practically, for the verification of probabilistic systems, see e.g. [12, 13, 14, 15, 17]. In those works, equivalence naturally expresses properties such as obliviousness and anonymity, which are difficult to formalize in temporal logic. The inclusion problem for two probabilistic automata asks whether for each word the acceptance probability in the first automaton is less than or equal to the acceptance probability in the second automaton. Despite its resemblance to the equivalence problem, the inclusion problem is undecidable , even for automata of fixed dimension . This is unfortunate, especially because deciding language inclusion is often at the heart of verification algorithms.

We study another “inclusion-like” generalization of the equivalence problem: trace refinement in labelled Markov decision processes (MDPs). MDPs extend MCs by nondeterminism; in each state, a controller chooses, possibly randomly and possibly depending on the history, one out of finitely many moves. A move determines a probability distribution over the emitted label and the successor state. In this way, an MDP and a strategy of the controller induce an MC.

The trace-refinement problem asks, given two MDPs $$\mathcal {D}$$ and $$\mathcal{E}$$, whether for all strategies for $$\mathcal {D}$$ there is a strategy for $$\mathcal{E}$$ such that the induced MCs are equivalent. Consider the MDP depicted in Fig. 2 where in state $$q_1$$ there are two available moves; one move generates the label c with probability 1, the other move generates d with probability 1. A strategy of the controller that, in state $$q_1$$, picks with probability 1 the move generating the last generated label (either c or d) induces the same trace-probability function as the MC shown in Fig. 1; the MDP thus refines that MC. The described strategy needs one bit of memory to keep track of the last generated label. It was shown in  that the strategy for $$\mathcal{E}$$ may require infinite memory, even if $$\mathcal {D}$$ is an MC. The decidability of trace refinement was posed as an open problem, both in the introduction and in the conclusion of . The authors of  also ask about the decidability of subcases where $$\mathcal {D}$$ or $$\mathcal{E}$$ is restricted to be an MC. In this paper we answer all those questions. We show that trace refinement is undecidable, even if $$\mathcal {D}$$ is an MC. In contrast, we show that trace refinement is efficiently decidable (in NC, hence in P) if $$\mathcal{E}$$ is an MC. Moreover, we prove that the trace-refinement problem becomes decidable if one imposes suitable restrictions on the strategies for $$\mathcal {D}$$ and $$\mathcal{E}$$. More specifically, we consider memoryless (i.e., no dependence on the history) and pure memoryless (i.e., no randomization and no dependence on the history) strategies, establishing various complexity results between NP and PSPACE.

To obtain the aforementioned NC result, we demonstrate a link between trace refinement and a particular notion of bisimulation between two MDPs that was studied in . This variant of bisimulation is not defined between two states as in the usual notion, but between two distributions on states. An exponential-time algorithm that decides (this notion of) bisimulation was provided in . We sharpen this result by exhibiting a coNP algorithm that decides bisimulation between two MDPs, and an NC algorithm for the case where one of the MDPs is an MC. For that we refine the arguments devised in . The model considered in  is more general than ours in that they also consider continuous state spaces, but more restricted than ours in that the label is determined by the move.

The full version of the paper is available online .

## 2 Preliminaries

A trace over a finite set $$\mathsf{L}$$ of labels is a finite sequence $$w=a_1 \cdots a_n$$ of labels where the length of the trace is $$|w|=n$$. The empty trace $$\epsilon$$ has length zero. For $$n\ge 0$$, let $$\mathsf{L}^{n}$$ be the set of all traces with length n; we denote by $$\mathsf{L}^{*}$$ the set of all (finite) traces over $$\mathsf{L}$$.

For a function $$d : S \rightarrow [0, 1]$$ over a finite set S, define the norm $$||d|| := \sum _{s \in S} d(s)$$. The support of d is the set $$\mathsf{Supp}(d) = \{s \in S \mid d(s) > 0\}$$. The function d is a probability subdistribution over S if $$||d|| \le 1$$; it is a probability distribution if $$||d|| = 1$$. We denote by $$\mathsf{subDist}(S)$$ (resp. $$\mathsf{Dist}(S)$$) the set of all probability subdistributions (resp. distributions) over S. Given $$s \in S$$, the Dirac distribution on s assigns probability 1 to s; we denote it by $$d_s$$. For a non-empty subset $$T \subseteq S$$, the uniform distribution over T assigns probability $$\frac{1}{|T|}$$ to every element in T.
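These definitions transcribe directly into code; the following sketch uses a dict encoding of (sub)distributions, which is our own convention.

```python
# Direct transcriptions of the definitions above (illustrative sketch).
def norm(d):
    """||d||: total mass of a subdistribution."""
    return sum(d.values())

def support(d):
    """Supp(d): elements carrying positive mass."""
    return {s for s, p in d.items() if p > 0}

def is_subdistribution(d):
    return all(0 <= p <= 1 for p in d.values()) and norm(d) <= 1

def dirac(s):
    """Dirac distribution d_s."""
    return {s: 1.0}

def uniform(T):
    """Uniform distribution over a non-empty set T."""
    assert T, "uniform distribution needs a non-empty set"
    return {s: 1.0 / len(T) for s in T}
```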

### 2.1 Labelled Markov Decision Processes

A labelled Markov decision process (MDP) $$\mathcal {D}= \langle Q,\mu _0,\mathsf{L},\delta \rangle$$ consists of a finite set Q of states, an initial distribution $$\mu _0 \in \mathsf{Dist}(Q)$$, a finite set $$\mathsf{L}$$ of labels, and a finite probabilistic transition relation $$\delta \subseteq Q \times \mathsf{Dist}(\mathsf{L}\times Q)$$ where states are in relation with distributions over pairs of labels and successors. We assume that for each state $$q\in Q$$ there exists some distribution $$d\in \mathsf{Dist}(\mathsf{L}\times Q)$$ where $$\langle q,d \rangle \in \delta$$. The set of moves in q is $$\mathsf{moves}(q)=\{d\in \mathsf{Dist}(\mathsf{L}\times Q) \mid \langle q,d \rangle \in \delta \}$$; denote by $$\mathsf{moves}=\bigcup _{q\in Q} \mathsf{moves}(q)$$ the set of all moves.

For the complexity results, we assume that probabilities of transitions are rational and given as fractions of integers represented in binary.

We describe the behaviour of an MDP as a trace generator running in steps. The MDP starts in the first step in state q with probability $$\mu _{0}(q)$$. In each step, if the MDP is in state q the controller chooses $$\mathsf{m}\in \mathsf{moves}(q)$$; then, with probability $$\mathsf{m}(a,q')$$, the label a is generated and the next step starts in the successor state $$q'$$.

Given $$q\in Q$$, denote by $$\mathsf{post}(q)$$ the set $$\{(a,q') \in \mathsf{Supp}(\mathsf{m}) \mid \mathsf{m}\in \mathsf{moves}(q)\}$$. A path in $$\mathcal {D}$$ is a sequence $$\rho =q_{0} a_{1} q_{1} \dots a_n q_{n}$$ such that $$(a_{i+1},q_{i+1})\in \mathsf{post}(q_{i})$$ for all $$0\le i<n$$. The last state of the path $$\rho$$ is $$\mathsf{last}(\rho )=q_{n}$$, and the generated trace after $$\rho$$ is $$a_{1} a_{2} \cdots a_{n}$$, denoted by $$\mathsf{trace}(\rho )$$. We denote by $$\mathsf{Paths}(\mathcal {D})$$ the set of all paths in $$\mathcal {D}$$, and by $$\mathsf{Paths}(w)=\{\rho \in \mathsf{Paths}(\mathcal {D})\mid \mathsf{trace}(\rho )=w\}$$ the set of all paths generating w.

Strategies. A randomized strategy (or simply a strategy) for an MDP $$\mathcal {D}$$ is a function $$\alpha : \mathsf{Paths}(\mathcal {D}) \rightarrow \mathsf{Dist}(\mathsf{moves})$$ that, given a finite path $$\rho$$, returns a probability distribution $$\alpha (\rho ) \in \mathsf{Dist}(\mathsf{moves}(\mathsf{last}(\rho )))$$ over the set of moves in $$\mathsf{last}(\rho )$$, used to generate a label a and select a successor state $$q'$$ with probability $$\sum _{\mathsf{m}\in \mathsf{moves}(q)} \alpha (\rho )(\mathsf{m}) \cdot \mathsf{m}(a,q')$$ where $$q=\mathsf{last}(\rho )$$.

A strategy $$\alpha$$ is pure if for all $$\rho \in \mathsf{Paths}(\mathcal {D})$$, we have $$\alpha (\rho )(\mathsf{m})=1$$ for some $$\mathsf{m}\in \mathsf{moves}$$; we thus view pure strategies as functions $$\alpha : \mathsf{Paths}(\mathcal {D})\rightarrow \mathsf{moves}$$. A strategy $$\alpha$$ is memoryless if $$\alpha (\rho ) = \alpha (\rho ')$$ for all paths $$\rho , \rho '$$ with $$\mathsf{last}(\rho ) = \mathsf{last}(\rho ')$$; we thus view memoryless strategies as functions $$\alpha : Q \rightarrow \mathsf{Dist}(\mathsf{moves})$$. A strategy $$\alpha$$ is trace-based if $$\alpha (\rho ) = \alpha (\rho ')$$ for all $$\rho , \rho '$$ where $$\mathsf{trace}(\rho )=\mathsf{trace}(\rho ')$$ and $$\mathsf{last}(\rho ) = \mathsf{last}(\rho ')$$; we view trace-based strategies as functions $$\alpha :\mathsf{L}^{*} \times Q \rightarrow \mathsf{Dist}(\mathsf{moves})$$.

Trace-Probability Function. For an MDP $$\mathcal {D}$$ and a strategy $$\alpha$$, the probability of a single path is inductively defined by $$\mathsf{Pr}_{\mathcal {D},\alpha }(q) = \mu _0(q)$$ and
$$\mathsf{Pr}_{\mathcal {D},\alpha }(\rho a q) = \mathsf{Pr}_{\mathcal {D},\alpha }(\rho ) \cdot \sum _{\mathsf{m}\in \mathsf{moves}(\mathsf{last}(\rho ))} \alpha (\rho )(\mathsf{m}) \cdot \mathsf{m}(a,q).$$
The trace-probability function $$Tr _{\mathcal {D},\alpha } : \mathsf{L}^{*} \rightarrow [0,1]$$ is, given a trace w, defined by
$$Tr _{\mathcal {D},\alpha }( w ) =\sum _{\rho \in \mathsf{Paths}(w)}\mathsf{Pr}_{\mathcal {D},\alpha }(\rho ).$$
We may drop the subscript $$\mathcal {D}$$ or $$\alpha$$ from $$Tr _{\mathcal {D},\alpha }$$ if it is understood. We denote by $$subDis _{\mathcal {D},\alpha }(w) \in \mathsf{subDist}(Q)$$ the subdistribution after generating a trace w, that is
$$subDis _{\mathcal {D},\alpha }(w)(q)=\sum _{\rho \in \mathsf{Paths}(w):\mathsf{last}(\rho )=q}\mathsf{Pr}_{\mathcal {D},\alpha }(\rho ).$$
We have:
\begin{aligned} Tr _{\mathcal {D},\alpha }( w ) =|| subDis _{\mathcal {D},\alpha }(w)|| \end{aligned}
(1)
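The definitions of $$\mathsf{Pr}_{\mathcal {D},\alpha }$$ and $$Tr _{\mathcal {D},\alpha }$$ can be transcribed by path enumeration. The encoding below is our own sketch: `mu0` maps states to probabilities, `moves[q]` is a list of moves, each a dict from (label, successor) pairs to probabilities, and a strategy maps a path (a tuple alternating states and labels) to a dict from move indices to probabilities.

```python
def mdp_trace_prob(mu0, moves, alpha, w):
    """Tr_{D,alpha}(w): sum of Pr_{D,alpha}(rho) over paths rho generating w."""
    frontier = [((q,), p) for q, p in mu0.items() if p > 0]
    for a in w:                          # extend every path by one emitted label
        nxt = []
        for path, p in frontier:
            q = path[-1]
            for i, pm in alpha(path).items():
                for (lab, q2), t in moves[q][i].items():
                    if lab == a and pm * t > 0:
                        nxt.append((path + (a, q2), p * pm * t))
        frontier = nxt
    return sum(p for _, p in frontier)
```

Grouping the frontier by last state instead of by path yields $$subDis _{\mathcal {D},\alpha }(w)$$, and summing its mass gives equation (1).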
A version of the following lemma was proved in [8, Lemma 1]:

### Lemma 1

Let $$\mathcal {D}$$ be an MDP and $$\alpha$$ be a strategy. There exists a trace-based strategy $$\beta$$ such that $$Tr _{\alpha } = Tr _{\beta }$$.

Here, by $$Tr _{\alpha } = Tr _{\beta }$$ we mean $$Tr _{\alpha }(w) = Tr _{\beta }(w)$$ for all traces $$w \in \mathsf{L}^*$$.

Labelled Markov Chains. A finite-state labelled Markov chain (MC for short) is an MDP where only a single move is available in each state, and thus the controller’s choice plays no role. An MC $$\mathcal {C}= \langle Q,\mu _0,\mathsf{L},\delta \rangle$$ is an MDP where $$\delta : Q \rightarrow \mathsf{Dist}(\mathsf{L}\times Q)$$ is a probabilistic transition function. Since MCs are MDPs, we define paths analogously, and the probability of a single path inductively as follows: $$\mathsf{Pr}_{\mathcal {C}}(q) = \mu _0(q)$$ and $$\mathsf{Pr}_{\mathcal {C}}(\rho a q) = \mathsf{Pr}_{\mathcal {C}}(\rho ) \cdot \delta (q')(a,q)$$ where $$q'=\mathsf{last}(\rho )$$. The notations $$subDis _{\mathcal {C}}(w)$$ and $$Tr _{\mathcal {C}}$$ are defined analogously.

### 2.2 Trace Refinement

Given two MDPs $$\mathcal {D}$$ and $$\mathcal{E}$$ with the same set $$\mathsf{L}$$ of labels, we say that $$\mathcal{E}$$ refines $$\mathcal {D}$$, denoted by $$\mathcal {D}\sqsubseteq \mathcal{E}$$, if for all strategies $$\alpha$$ for $$\mathcal {D}$$ there exists some strategy $$\beta$$ for $$\mathcal{E}$$ such that $$Tr _{\mathcal {D},\alpha } = Tr _{\mathcal{E},\beta }$$. We are interested in the problem $$\mathsf{MDP\sqsubseteq MDP}$$, which asks, for two given MDPs $$\mathcal {D}$$ and $$\mathcal{E}$$, whether $$\mathcal {D}\sqsubseteq \mathcal{E}$$. The decidability of this problem was posed as an open question in . We show in Theorem 2 that the problem $$\mathsf{MDP\sqsubseteq MDP}$$ is undecidable.

We consider various subproblems of $$\mathsf{MDP\sqsubseteq MDP}$$. Specifically, we speak of the problem
• $$\mathsf{MDP\sqsubseteq MC}$$ when $$\mathcal{E}$$ is restricted to be an MC;

• $$\mathsf{MC\sqsubseteq MDP}$$ when $$\mathcal {D}$$ is restricted to be an MC;

• $$\mathsf{MC\sqsubseteq MC}$$ when both $$\mathcal {D}$$ and $$\mathcal{E}$$ are restricted to be MCs.

We show in Theorem 2 that even the problem $$\mathsf{MC\sqsubseteq MDP}$$ is undecidable. Hence we consider further subproblems. Specifically, we denote by $$\mathsf{MC\sqsubseteq MDP_{m}}$$ the problem where the MDP is restricted to use only memoryless strategies, and by $$\mathsf{MC\sqsubseteq MDP_{pm}}$$ the problem where the MDP is restricted to use only pure memoryless strategies. When both MDPs $$\mathcal {D}$$ and $$\mathcal{E}$$ are restricted to use only pure memoryless strategies, the trace-refinement problem is denoted by $$\mathsf{MDP_{pm}\sqsubseteq MDP_{pm}}$$. The problem $$\mathsf{MC\sqsubseteq MC}$$ equals the trace-equivalence problem for MCs: given two MCs $$\mathcal {C}_1,\mathcal {C}_2$$ we have $$\mathcal {C}_1 \sqsubseteq \mathcal {C}_2$$ if and only if $$Tr _{\mathcal {C}_1} = Tr _{\mathcal {C}_2}$$ if and only if $$\mathcal {C}_2 \sqsubseteq \mathcal {C}_1$$. This problem is known to be in NC , hence in P.

## 3 Undecidability Results

In this section we show:

### Theorem 2

The problem $$\mathsf{MC\sqsubseteq MDP}$$ is undecidable. Hence a fortiori, $$\mathsf{MDP\sqsubseteq MDP}$$ is undecidable.

### Proof

To show that the problem $$\mathsf{MC \sqsubseteq MDP}$$ is undecidable, we establish a reduction from the universality problem for probabilistic automata. A probabilistic automaton is a tuple $$\mathcal {A}= \langle Q,\mu _0,\mathsf{L},\delta ,F \rangle$$ consisting of a finite set Q of states, an initial distribution $$\mu _0 \in \mathsf{Dist}(Q)$$, a finite set $$\mathsf{L}$$ of letters, a transition function $$\delta : Q \times \mathsf{L}\rightarrow \mathsf{Dist}(Q)$$ assigning to every state and letter a distribution over states, and a set F of final states. For a word $$w \in \mathsf{L}^*$$ we write $$dis _\mathcal {A}(w) \in \mathsf{Dist}(Q)$$ for the distribution such that, for all $$q \in Q$$, we have that $$dis _\mathcal {A}(w)(q)$$ is the probability that, after inputting w, the automaton $$\mathcal {A}$$ is in state q. We write $$\mathsf{Pr}_\mathcal {A}(w) = \sum _{q \in F} dis _\mathcal {A}(w)(q)$$ to denote the probability that $$\mathcal {A}$$ accepts w. The universality problem asks, given a probabilistic automaton $$\mathcal {A}$$, whether $$\mathsf{Pr}_\mathcal {A}(w) \ge \frac{1}{2}$$ holds for all words w. This problem is known to be undecidable .
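The quantities $$dis _\mathcal {A}(w)$$ and $$\mathsf{Pr}_\mathcal {A}(w)$$ are computed by a forward pass over the letters of w. The following sketch uses a toy 2-state automaton of our own invention (the states, probabilities, and the set F below are illustrative assumptions, not taken from the paper).

```python
# Toy probabilistic automaton: delta[q][e] is the successor distribution
# of A in state q on letter e; F is the set of final states.
delta = {
    "p": {"a": {"p": 0.5, "q": 0.5}, "b": {"q": 1.0}},
    "q": {"a": {"p": 1.0},           "b": {"p": 0.3, "q": 0.7}},
}
F = {"p"}
mu0 = {"p": 1.0, "q": 0.0}

def dis_A(w):
    """dis_A(w): state distribution of A after inputting w."""
    d = dict(mu0)
    for e in w:                      # one forward step per input letter
        nd = {s: 0.0 for s in delta}
        for s, p in d.items():
            for s2, t in delta[s][e].items():
                nd[s2] += p * t
        d = nd
    return d

def pr_A(w):
    """Pr_A(w): acceptance probability, the mass on final states."""
    return sum(dis_A(w)[q] for q in F)
```

Note that $$dis _\mathcal {A}(w)$$ always remains a distribution (total mass 1); the construction below exploits exactly this when it scales the mass by $$\frac{1}{4}$$ per step.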

Let $$\mathcal {A}= \langle Q,\mu _0,\mathsf{L},\delta ,F \rangle$$ be a probabilistic automaton; without loss of generality we assume that $$\mathsf{L}= \{ a,b \}$$. We construct an MDP $$\mathcal {D}$$ such that $$\mathcal {A}$$ is universal if and only if $$\mathcal {C}\sqsubseteq \mathcal {D}$$ where $$\mathcal {C}$$ is the MC shown in Fig. 1. The MDP $$\mathcal {D}$$ is constructed from $$\mathcal {A}$$ as follows; see Fig. 3.

Fig. 3. Sketch of the construction of the MDP $$\mathcal {D}$$ from the probabilistic automaton $$\mathcal {A}$$, for the undecidability result of $$\mathsf{MC \sqsubseteq MDP}$$. Here, p is an accepting state whereas q is not. To read the picture, note that in p there is a transition to the state $$p_1$$ with probability x and label a: $$\delta (p,a)(p_1)=x$$.
Its set of states is $$Q \cup \{ q_c,q_d \}$$, and its initial distribution is $$\mu _0$$. (Here and in the following we identify subdistributions $$\mu \in \mathsf{subDist}(Q)$$ and $$\mu \in \mathsf{subDist}(Q \cup \{q_c,q_d\})$$ if $$\mu (q_c) = \mu (q_d) = 0$$.) We describe the transitions of $$\mathcal {D}$$ using the transition function $$\delta$$ of $$\mathcal {A}$$. Consider a state $$q \in Q$$:
• If $$q \in F$$, there are two available moves $$\mathsf{m}_c,\mathsf{m}_d$$; both emit a with probability $$\frac{1}{4}$$ and simulate the probabilistic automaton $$\mathcal {A}$$ reading the letter a, or emit b with probability $$\frac{1}{4}$$ and simulate the probabilistic automaton $$\mathcal {A}$$ reading the letter b. With the remaining probability of $$\frac{1}{2}$$, $$\mathsf{m}_c$$ emits c and leads to $$q_c$$ and $$\mathsf{m}_d$$ emits d and leads to $$q_d$$. Formally, $$\mathsf{m}_c(c,q_c)=\frac{1}{2}$$, $$\mathsf{m}_d(d,q_d)=\frac{1}{2}$$ and $$\mathsf{m}_c(e,q')=\mathsf{m}_d(e,q')=\frac{1}{4}\delta (q,e)(q')$$ where $$q' \in Q$$ and $$e\in \{a,b\}$$.

• If $$q \notin F$$, there is a single available move $$\mathsf{m}$$ such that $$\mathsf{m}(d,q_d)=\frac{1}{2}$$ and $$\mathsf{m}(e,q')=\frac{1}{4}\delta (q,e)(q')$$ where $$q'\in Q$$ and $$e\in \{a,b\}$$.

• The only move from $$q_c$$ is the Dirac distribution on $$(c,q_c)$$; likewise the only move from $$q_d$$ is the Dirac distribution on $$(d,q_d)$$.

This MDP $$\mathcal {D}$$ “is almost” an MC, in the sense that a strategy $$\alpha$$ does not influence its behaviour until eventually a transition to $$q_c$$ or $$q_d$$ is taken. Indeed, for all $$\alpha$$ and for all $$w \in \{a,b\}^*$$ we have $$subDis _{\mathcal {D},\alpha }(w) = \frac{1}{4^{|w|}} dis _\mathcal {A}(w)$$. In particular, it follows that $$Tr _{\mathcal {D},\alpha }(w) = || subDis _{\mathcal {D},\alpha }(w)|| = \frac{1}{4^{|w|}} || dis _\mathcal {A}(w)|| = \frac{1}{4^{|w|}}$$. Further, if $$\alpha$$ is trace-based we have:
\begin{aligned} \begin{aligned} Tr _{\mathcal {D},\alpha }(w c)&= || subDis _{\mathcal {D},\alpha }(w c)|| \qquad \qquad \qquad \qquad \qquad \text {by (1)} \\&= subDis _{\mathcal {D},\alpha }(w c)(q_c) \qquad \qquad \qquad \qquad \quad \text {structure of }~\mathcal {D}\\&= \sum _{q \in F} subDis _{\mathcal {D},\alpha }(w)(q) \cdot \alpha (w, q)(\mathsf{m}_c) \cdot \frac{1}{2} \quad \quad \text {structure of } \mathcal {D}\\&= \frac{1}{4^{|w|}} \sum _{q \in F} dis _\mathcal {A}(w)(q) \cdot \alpha (w, q)(\mathsf{m}_c) \cdot \frac{1}{2} \qquad \text {as argued above} \end{aligned} \end{aligned}
(2)
We show that $$\mathcal {A}$$ is universal if and only if $$\mathcal {C}\sqsubseteq \mathcal {D}$$. Let $$\mathcal {A}$$ be universal. Define a trace-based strategy $$\alpha$$ with $$\alpha (w, q)(\mathsf{m}_c) = \frac{1}{2 \mathsf{Pr}_\mathcal {A}(w)}$$ for all $$w \in \{a,b\}^*$$ and $$q \in F$$. Note that $$\alpha (w, q)(\mathsf{m}_c)$$ is a probability as $$\mathsf{Pr}_\mathcal {A}(w) \ge \frac{1}{2}$$. Let $$w \in \{a, b\}^*$$. We have:
\begin{aligned} Tr _{\mathcal {D}, \alpha }(w)&= \frac{1}{4^{|w|}} \qquad \text {as argued above} \\&= Tr _{\mathcal {C}}(w) \quad \text {Figure}\,\text {1} \end{aligned}
Further we have:
\begin{aligned} Tr _{\mathcal {D}, \alpha }(w c)&= \frac{1}{4^{|w|}} \sum _{q \in F} dis _\mathcal {A}(w)(q) \cdot \alpha (w, q)(\mathsf{m}_c) \cdot \frac{1}{2} \qquad \text {by (2)} \\&= \frac{1}{4^{|w|}} \sum _{q \in F} dis _\mathcal {A}(w)(q) \cdot \frac{1}{\mathsf{Pr}_\mathcal {A}(w)} \cdot \frac{1}{4} \qquad \quad \text {definition of }\alpha \\&= \frac{1}{4^{|w|+1}} \qquad \qquad \qquad \qquad \qquad \qquad \ \ \mathsf{Pr}_\mathcal {A}(w) = \sum _{q \in F} dis _\mathcal {A}(w)(q) \\&= Tr _{\mathcal {C}}(w c) \qquad \qquad \qquad \qquad \qquad \qquad \ \text {Figure}\,\text {1} \end{aligned}
It follows from the definitions of $$\mathcal {D}$$ and $$\mathcal {C}$$ that for all $$k \ge 1$$, we have $$Tr _{\mathcal {D},\alpha }(w c^k) = Tr _{\mathcal {D},\alpha }(w c) = Tr _{\mathcal {C}}(w c) = Tr _{\mathcal {C}}(w c^k)$$. We have $$\sum _{e \in \{a,b,c,d\}} Tr _{\mathcal {D},\alpha }(w e) = Tr _{\mathcal {D},\alpha }(w) = Tr _{\mathcal {C}}(w) = \sum _{e \in \{a,b,c,d\}} Tr _{\mathcal {C}}(w e)$$. Since for $$e \in \{a,b,c\}$$ we also proved that $$Tr _{\mathcal {D},\alpha }(w e) = Tr _{\mathcal {C}}(w e)$$ it follows that $$Tr _{\mathcal {D},\alpha }(w d) = Tr _{\mathcal {C}}(w d)$$. Hence, as above, $$Tr _{\mathcal {D},\alpha }(w d^k) = Tr _{\mathcal {C}}(w d^k)$$ for all $$k \ge 1$$. Finally, if $$w \notin (a+b)^* \cdot (c^* + d^*)$$ then $$Tr _{\mathcal {D},\alpha }(w) = 0 = Tr _{\mathcal {C}}(w)$$.
For the converse, assume that $$\mathcal {A}$$ is not universal. Then there is $$w \in \{a,b\}^*$$ with $$\mathsf{Pr}_\mathcal {A}(w) < \frac{1}{2}$$. Let $$\alpha$$ be a trace-based strategy. Then we have:
\begin{aligned} Tr _{\mathcal {D}, \alpha }(w c)&= \frac{1}{4^{|w|}} \sum _{q \in F} dis _\mathcal {A}(w)(q) \cdot \alpha (w, q)(\mathsf{m}_c) \cdot \frac{1}{2} \qquad \text {by (2)} \\&\le \frac{1}{4^{|w|}} \cdot \frac{1}{2} \cdot \sum _{q \in F} dis _\mathcal {A}(w)(q) \qquad \qquad \qquad \ \alpha (w, q)(\mathsf{m}_c) \le 1 \\&= \frac{1}{4^{|w|}} \cdot \frac{1}{2} \cdot \mathsf{Pr}_\mathcal {A}(w) \qquad \qquad \qquad \qquad \ \ \ \mathsf{Pr}_\mathcal {A}(w) = \sum _{q \in F} dis _\mathcal {A}(w)(q) \\&< \frac{1}{4^{|w|}} \cdot \frac{1}{2} \cdot \frac{1}{2} \qquad \qquad \qquad \qquad \qquad \ \text {definition of } w \\&= Tr _\mathcal {C}(w c) \qquad \qquad \qquad \qquad \qquad \quad \ \ \ \text {Figure}\,\text {1} \end{aligned}
We conclude that there is no trace-based strategy $$\alpha$$ with $$Tr _{\mathcal {D},\alpha } = Tr _\mathcal {C}$$. By Lemma 1 there is no strategy $$\alpha$$ with $$Tr _{\mathcal {D},\alpha } = Tr _\mathcal {C}$$. Hence $$\mathcal {C}\not \sqsubseteq \mathcal {D}$$.    $$\square$$

A straightforward reduction from $$\mathsf{MDP\sqsubseteq MDP}$$ now establishes:

### Theorem 3

The problem that, given two MDPs $$\mathcal {D}$$ and $$\mathcal{E}$$, asks whether $$\mathcal {D}\sqsubseteq \mathcal{E}$$ and $$\mathcal{E}\sqsubseteq \mathcal {D}$$ is undecidable.

## 4 Decidability for Memoryless Strategies

Given two MCs $$\mathcal {C}_1$$ and $$\mathcal {C}_2$$, the (symmetric) trace-equivalence relation $$\mathcal {C}_1 \sqsubseteq \mathcal {C}_2$$ is polynomial-time decidable . An MDP $$\mathcal {D}$$ under a memoryless strategy $$\alpha$$ induces a finite MC $$\mathcal {D}(\alpha )$$; thus, once a memoryless strategy is fixed for the MDP, its relation to another given MC under $$\sqsubseteq$$ can be decided in P. Theorems 4 and 5 provide tight complexity bounds for the trace-refinement problems where the MDPs are restricted to use only pure memoryless strategies. In Theorems 6 and 7 we establish bounds on the complexity of the problem when randomization is allowed for memoryless strategies.

### 4.1 Pure Memoryless Strategies

In this subsection, we show that the problems $$\mathsf{MC\sqsubseteq MDP_{pm}}$$ and $$\mathsf{MDP_{pm}\sqsubseteq MDP_{pm}}$$ are NP-complete and $$\varPi ^p_2$$-complete, respectively.

Membership of $$\mathsf{MC\sqsubseteq MDP_{pm}}$$ in NP and of $$\mathsf{MDP_{pm}\sqsubseteq MDP_{pm}}$$ in $$\varPi ^p_2$$ is obtained as follows. Given an MC $$\mathcal {C}$$ and an MDP $$\mathcal {D}$$, a polynomial witness for $$\mathcal {C}\sqsubseteq \mathcal {D}$$ is a pure memoryless strategy $$\alpha$$ for $$\mathcal {D}$$: once $$\alpha$$ is fixed, $$\mathcal {C}\sqsubseteq \mathcal {D}(\alpha )$$ can be decided in P. Given two MDPs $$\mathcal{E}$$ and $$\mathcal {D}$$, we have $$\mathcal{E}\sqsubseteq \mathcal {D}$$ if and only if for all pure memoryless strategies $$\beta$$ for $$\mathcal{E}$$ there exists a pure memoryless strategy $$\alpha$$ for $$\mathcal {D}$$ such that $$\mathcal{E}(\beta ) \sqsubseteq \mathcal {D}(\alpha )$$; for fixed $$\beta$$ and $$\alpha$$, the latter can be decided in P, which yields a $$\varPi ^p_2$$ procedure.

The hardness results are by reductions from the subset-sum problem and a variant of the quantified subset-sum problem. Given a set $$\{s_1, s_2,\cdots ,s_n\}$$ of natural numbers and $$N\in \mathbb N$$, the subset-sum problem asks whether there exists a subset $$S\subseteq \{s_1, \cdots ,s_n\}$$ such that $$\sum _{s\in S}s=N$$. The subset-sum problem is known to be NP-complete . The quantified version of subset sum is a game between a universal player and an existential player. Given $$k,N\in \mathbb N$$, the game is played turn-based for k rounds. For each round $$1\le i \le k$$, two sets $$\{s_1, s_2,\cdots ,s_n\}$$ and $$\{t_1, t_2,\cdots ,t_m\}$$ of natural numbers are given. In each round i, the universal player first chooses $$S_i\subseteq \{s_1, \cdots ,s_n\}$$ and then the existential player chooses $$T_i\subseteq \{t_1, \cdots ,t_m\}$$. The existential player wins if and only if
\begin{aligned} \sum _{s\in S_1}s+\sum _{t\in T_1}t+ \cdots +\sum _{s\in S_k}s+\sum _{t\in T_k}t =N. \end{aligned}
Quantified subset sum is known to be PSPACE-complete . The proof therein implies that the variant of the problem with a fixed number k of rounds is $$\varPi ^p_{2k}$$-complete.
To establish the NP-hardness of $$\mathsf{MC\sqsubseteq MDP_{pm}}$$, consider an instance of subset sum, i.e., a set $$\{s_1, \cdots ,s_n\}$$ and $$N\in \mathbb N$$. We construct an MC $$\mathcal {C}$$ and an MDP $$\mathcal {D}$$ such that there exists $$S\subseteq \{s_1, \cdots ,s_n\}$$ with $$\sum _{s\in S}s=N$$ if and only if $$\mathcal {C}\sqsubseteq \mathcal {D}$$ when $$\mathcal {D}$$ uses only pure memoryless strategies.

Fig. 4. The MC $$\mathcal {C}$$ in the reduction for NP-hardness of $$\mathsf{MC \sqsubseteq MDP_{pm}}$$.

Fig. 5. The gadget $$G_u$$ for the set $$\{u_1,\cdots ,u_k\}$$.

The MC $$\mathcal {C}$$ is shown in Fig. 4; it generates traces in $$ab^+$$ with probability $$\frac{N}{P}$$ and traces in $$ac^+$$ with probability $$1-\frac{N}{P}$$ where $$P=s_1+\cdots +s_n$$.

For a set $$\{u_1,\cdots , u_k\}$$, we define a gadget $$G_u$$ that is an MDP with $$k+2$$ states: $$u_1, \cdots ,u_k$$ and $$u_b,u_c$$; see Fig. 5. For each state $$u_i$$, two moves $$\mathsf{m}_{i,b}$$ and $$\mathsf{m}_{i,c}$$ are available, the Dirac distributions on $$(a,u_b)$$ and $$(a,u_c)$$, respectively. The states $$u_{b},u_c$$ emit only the labels b and c, respectively. The MDP $$\mathcal {D}$$ is exactly the gadget $$G_s$$ for $$\{s_1,\cdots , s_n\}$$ equipped with the initial distribution $$\mu _0$$ where $$\mu _0(s_i)=\frac{s_i}{P}$$ for all $$1\le i \le n$$. Choosing $$\mathsf{m}_{i,b}$$ in $$s_i$$ simulates the membership of $$s_i$$ in S by adding $$\frac{s_i}{P}$$ to the probability of generating a trace in $$ab^{+}$$.
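The correspondence between pure memoryless strategies for $$G_s$$ and subsets S can be checked by brute force; the instance below (the numbers 3, 5, 7, 11 and the target 15) is our own toy example.

```python
from itertools import product

# A pure memoryless strategy for the gadget G_s picks, in each state s_i,
# either the b-move or the c-move; the probability of generating a trace in
# a b^+ is then (sum of the chosen s_i) / P, which matches the MC C iff it
# equals N / P.
s, N = [3, 5, 7, 11], 15
P = sum(s)

def refines(choice):                 # choice[i] == True: pick m_{i,b} in s_i
    prob_ab_plus = sum(si for si, c in zip(s, choice) if c) / P
    return abs(prob_ab_plus - N / P) < 1e-12

# Some strategy works iff some subset sums to N (here 3 + 5 + 7 = 15).
assert any(refines(c) for c in product([True, False], repeat=len(s)))
```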

### Theorem 4

The problem $$\mathsf{MC\sqsubseteq MDP_{pm}}$$ is NP-complete.

To establish the $$\varPi ^p_2$$-hardness of $$\mathsf{MDP_{pm}\sqsubseteq MDP_{pm}}$$, consider an instance of quantified subset sum, i.e., $$N\in \mathbb N$$ and two sets $$\{s_1,\cdots ,s_n\}$$ and $$\{t_1,\cdots ,t_m\}$$. We construct MDPs $$\mathcal{E}_{univ},\mathcal{E}_{exist}$$ such that the existential player wins the one-round game if and only if $$\mathcal{E}_{univ} \sqsubseteq \mathcal{E}_{exist}$$ when both MDPs are restricted to use only pure memoryless strategies.

Let $$P=s_1+\cdots +s_n$$ and $$R=t_1+\cdots +t_m$$. Pick a small real number $$0<x<1$$ so that $$0 < x P, x R, x N < 1$$. Pick real numbers $$0\le y_1,y_2 \le 1$$ such that $$y_1+x N=y_2+x R$$.

The MDPs $$\mathcal{E}_{univ}$$ and $$\mathcal{E}_{exist}$$ have symmetric constructions. To simulate the choice of the universal player, the MDP $$\mathcal{E}_{univ}$$ is the gadget $$G_s$$ for the set $$\{s_1,\cdots ,s_n\}$$ where two new states $$s_r,s_y$$ are added. The transitions in $$s_r$$ and $$s_y$$ are the Dirac distributions on $$(a,s_b)$$ and $$(a,s_c)$$, respectively. The initial distribution $$\mu _0$$ for $$\mathcal{E}_{univ}$$ is such that $$\mu _0(s_y)=\frac{1}{2}y_1$$ and $$\mu _0(s_r)=1-\frac{1}{2}(xP+y_1)$$, and $$\mu _0(s_i)=\frac{1}{2}xs_i$$ for all $$1\le i \le n$$. Similarly, to simulate the counter-attack of the existential player, the MDP $$\mathcal{E}_{exist}$$ is the gadget $$G_t$$ for $$\{t_1,\cdots ,t_m\}$$ with two new states $$t_r,t_y$$. The transitions in $$t_r$$ and $$t_y$$ are the Dirac distributions on $$(a,t_b)$$ and $$(a,t_c)$$, respectively. The initial distribution $$\mu '_0$$ is such that $$\mu '_0(t_y)=\frac{1}{2}y_2$$ and $$\mu '_0(t_r)=1-\frac{1}{2}(xR+y_2)$$, and $$\mu '_0(t_j)=\frac{1}{2}xt_j$$ for all $$1\le j \le m$$. Choosing b in a set of states $$s_i$$ by the universal player must be countered by choosing c in a matching set of states $$t_j$$ by the existential player such that the probabilities of emitting traces in $$ab^{+}$$ in the two MDPs are equal.
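The one-round game underlying this reduction can be decided by brute force on small instances; the numbers below are our own toy examples, not taken from the paper.

```python
from itertools import product

# One-round quantified subset sum: the existential player wins iff every
# choice S of the universal player can be answered by some T with
# sum(S) + sum(T) == N -- mirroring the refinement of E_univ into E_exist
# under pure memoryless strategies.
def exist_wins(s, t, N):
    subset_sums = lambda xs: {sum(x for x, c in zip(xs, cs) if c)
                              for cs in product([0, 1], repeat=len(xs))}
    t_sums = subset_sums(t)
    return all(N - ssum in t_sums for ssum in subset_sums(s))

assert exist_wins([1, 2], [1, 2, 3], 3)   # every S-sum 0..3 is completable
assert not exist_wins([1, 5], [1, 2], 4)  # S = {5} cannot be completed
```

The universal quantifier over S followed by the existential quantifier over T is exactly the $$\forall \exists$$ alternation that makes the problem $$\varPi ^p_2$$-hard.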

### Theorem 5

The problem $$\mathsf{MDP_{pm}\sqsubseteq MDP_{pm}}$$ is $$\varPi ^p_2$$-complete.

### 4.2 Memoryless Strategies

In this subsection, we provide upper and lower complexity bounds for the problem $$\mathsf{MC\sqsubseteq MDP_{m}}$$: an upper bound via a reduction to the existential theory of the reals, and a lower bound via a reduction from nonnegative matrix factorization (NMF).

A formula of the existential theory of the reals is of the form $$\exists x_1 \ldots \exists x_m~R(x_1, \ldots , x_n)$$, where $$R(x_1, \ldots , x_n)$$ is a boolean combination of comparisons of the form $$p(x_1, \ldots , x_n) \sim 0$$, where $$p(x_1, \ldots , x_n)$$ is a multivariate polynomial and $$\mathord {\sim } \in \{ \mathord {<}, \mathord {>}, \mathord {\le }, \mathord {\ge }, \mathord {=}, \mathord {\ne } \}$$. The validity of closed formulas (i.e., when $$m=n$$) is decidable in PSPACE [4, 18], and is not known to be PSPACE-hard.

### Theorem 6

The problem $$\mathsf{MC\sqsubseteq MDP_{m}}$$ is polynomial-time reducible to the existential theory of the reals, hence in PSPACE.

Fig. 6. The MC $$\mathcal {C}$$ of the reduction from NMF to $$\mathsf{MC\sqsubseteq MDP_m}$$.

Given a nonnegative matrix $$M \in \mathbb {R}^{n \times m}$$, a nonnegative factorization of M is any representation of the form $$M = A \cdot W$$ where $$A \in \mathbb {R}^{n \times r}$$ and $$W \in \mathbb {R}^{r \times m}$$ are nonnegative matrices (see [2, 5, 22] for more details). The NMF problem asks, given a nonnegative matrix $$M \in \mathbb {R}^{n \times m}$$ and a number $$r \in \mathbb N$$, whether there exists a factorization $$M = A \cdot W$$ with nonnegative matrices $$A \in \mathbb {R}^{n \times r}$$ and $$W \in \mathbb {R}^{r \times m}$$. The NMF problem is known to be NP-hard, but membership in NP is open .

Below, we present a reduction from the NMF problem to $$\mathsf{MC\sqsubseteq MDP_m}$$. To establish the reduction, consider an instance of the NMF problem, i.e., a nonnegative matrix $$M \in \mathbb {R}^{n \times m}$$ and a number $$r \in \mathbb N$$. We construct an MC $$\mathcal {C}$$ and an MDP $$\mathcal {D}$$ such that the NMF instance is a yes-instance if and only if $$\mathcal {C}\sqsubseteq \mathcal {D}$$ where $$\mathcal {D}$$ is restricted to use only memoryless strategies.

We assume, without loss of generality, that M is a stochastic matrix, that is $$\sum \limits _{j=1}^m M[i,j]=1$$ for all rows $$1\le i\le n$$. We know, by [2, Sect. 5], that there exists a nonnegative factorization of M with rank r if and only if there exist two stochastic matrices $$A \in \mathbb {R}^{n \times r}$$ and $$W \in \mathbb {R}^{r \times m}$$ such that $$M=A \cdot W$$.
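The stochastic reformulation can be illustrated with a small sketch (the matrices below are hypothetical examples, not taken from the paper): the product of two row-stochastic matrices is again row-stochastic, so a stochastic factorization always yields a stochastic M.

```python
import numpy as np

# Hypothetical 3x3 stochastic matrix M with a rank-2 stochastic
# factorization M = A @ W (rows of all three matrices sum to 1).
W = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])      # W: 2x3, row-stochastic
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])           # A: 3x2, row-stochastic
M = A @ W                            # 3x3

# The product of row-stochastic matrices is row-stochastic:
# each row of M is a convex combination of the rows of W.
assert np.allclose(M.sum(axis=1), 1.0)
```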

The transition probabilities in the MC $$\mathcal {C}$$ encode the entries of the matrix M. The initial distribution of the MC is the Dirac distribution on $$q_{{ in }}$$; see Fig. 6. There are $$n+m+1$$ labels $$a_1,\cdots ,a_n,b_1,\cdots ,b_m,c$$. The transition in $$q_{{ in }}$$ is the uniform distribution over $$\{(a_i,q_i) \mid 1\le i \le n\}$$. In each state $$q_i$$, each label $$b_j$$ is emitted with probability M[i, j], and a transition to $$q_{{ fi }}$$ is taken. In state $$q_{{ fi }}$$, only c is emitted. Observe that for all $$1\le i\le n$$ and $$1\le j\le m$$ we have $$Tr _{\mathcal {C}}(a_i)=\frac{1}{n}$$ and $$Tr _{\mathcal {C}}(a_i\cdot b_j \cdot c^{*})=\frac{1}{n}M[i,j]$$.

The initial distribution of the MDP $$\mathcal {D}$$ is the uniform distribution over $$\{p_1,\cdots ,p_n\}$$; see Fig. 7. In each $$p_i$$ (where $$1\le i\le n$$), there are r moves $$\mathsf{m}_{i,1}, \mathsf{m}_{i,2},\cdots ,\mathsf{m}_{i,r}$$ with $$\mathsf{m}_{i,k}(a_i,\ell _k)=1$$ for $$1\le k \le r$$. In each $$\ell _k$$, there are m moves $$\mathsf{m}'_{k,1}, \mathsf{m}'_{k,2},\cdots ,\mathsf{m}'_{k,m}$$ with $$\mathsf{m}'_{k,j}(b_j,p_{{ fi }})=1$$ for $$1\le j\le m$$. In state $$p_{{ fi }}$$, only c is emitted. The probabilities of choosing the move $$\mathsf{m}_{i,k}$$ in $$p_i$$ and of choosing $$\mathsf{m}'_{k,j}$$ in $$\ell _k$$ encode the entries A[i, k] and W[k, j], respectively.

Fig. 7. The MDP $$\mathcal {D}$$ of the reduction from NMF to $$\mathsf{MC\sqsubseteq MDP_m}$$.
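The intended correspondence can be sketched numerically (a hypothetical instance; the stochastic factors `A` and `W` are read as memoryless strategy probabilities):

```python
import numpy as np

# Hypothetical stochastic factors: A[i,k] is the probability of choosing
# move m_{i,k} in state p_i, and W[k,j] that of choosing m'_{k,j} in l_k.
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
W = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
M = A @ W
n, m = M.shape

# MC C: Tr_C(a_i b_j) = (1/n) * M[i,j] by construction.
tr_C = M / n

# MDP D under the memoryless strategy (A, W): start in p_i with
# probability 1/n, emit a_i while moving to l_k with probability A[i,k],
# then emit b_j with probability W[k,j] -- i.e. (1/n) * (A @ W)[i,j].
tr_D = (A @ W) / n

# The trace probabilities coincide exactly when M = A @ W,
# i.e. for yes-instances of the NMF problem.
assert np.allclose(tr_C, tr_D)
```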

### Theorem 7

The NMF problem is polynomial-time reducible to $$\mathsf{MC\sqsubseteq MDP_m}$$, hence $$\mathsf{MC\sqsubseteq MDP_m}$$ is NP-hard.

Recall that it is open whether the NMF problem is in NP and whether the existential theory of the reals is PSPACE-hard. So Theorems 6 and 7 show that proving NP-completeness or PSPACE-completeness of $$\mathsf{MC\sqsubseteq MDP_m}$$ requires a breakthrough in those areas.

## 5 Bisimulation

In this section we show:

### Theorem 8

The problem $$\mathsf{MDP\sqsubseteq MC}$$ is in NC, hence in P.

We prove Theorem 8 in two steps: first, in Proposition 9 below, we establish a link between trace refinement and a notion of bisimulation between distributions that was studied previously. Second, we show that this notion of bisimulation can be decided efficiently (in NC, hence in P) if one of the MDPs is an MC. Proposition 9 then implies Theorem 8. Along the way, we prove that bisimulation between two MDPs can be decided in coNP, improving on the previously known exponential-time result. We give a detailed proof from scratch, as we were unable to verify some of the technical claims in the earlier work.

A local strategy for an MDP $$\mathcal {D}= \langle Q,\mu _{0},\mathsf{L},\delta \rangle$$ is a function $$\alpha : Q \rightarrow \mathsf{Dist}(\mathsf{moves})$$ that maps each state q to a distribution $$\alpha (q)\in \mathsf{Dist}(\mathsf{moves}(q))$$ over the moves in q. We call $$\alpha$$ pure if for all states q there is a move $$\mathsf{m}$$ such that $$\alpha (q)(\mathsf{m}) = 1$$. For a subdistribution $$\mu \in \mathsf{subDist}(Q)$$, a local strategy $$\alpha$$, and a label $$a \in \mathsf{L}$$, define the successor subdistribution $$\mathsf{Succ}(\mu , \alpha , a)$$ by
$$\mathsf{Succ}(\mu , \alpha , a)(q') = \sum _{q \in Q} \mu (q) \cdot \sum _{\mathsf{m}\in \mathsf{moves}(q)} \alpha (q)(\mathsf{m}) \cdot \mathsf{m}(a, q')$$
for all $$q' \in Q$$. Let $$\mathcal {D}= \langle Q_\mathcal {D},\mu ^{\mathcal {D}}_{0},\mathsf{L},\delta _{\mathcal {D}} \rangle$$ and $$\mathcal{E}= \langle Q_\mathcal{E},\mu ^{\mathcal{E}}_{0},\mathsf{L},\delta _{\mathcal{E}} \rangle$$ be two MDPs over the same set $$\mathsf{L}$$ of labels. A bisimulation is a relation $$\mathcal {R}\subseteq \mathsf{subDist}(Q_\mathcal {D}) \times \mathsf{subDist}(Q_\mathcal{E})$$ such that whenever $$\mu _\mathcal {D}\mathrel {\mathcal {R}} \mu _\mathcal{E}$$ then
• $$||\mu _\mathcal {D}|| = ||\mu _\mathcal{E}||$$;

• for all local strategies $$\alpha _\mathcal {D}$$ there exists a local strategy $$\alpha _\mathcal{E}$$ such that for all $$a \in \mathsf{L}$$ we have $$\mathsf{Succ}(\mu _\mathcal {D}, \alpha _\mathcal {D}, a) \mathrel {\mathcal {R}} \mathsf{Succ}(\mu _\mathcal{E}, \alpha _\mathcal{E}, a)$$;

• for all local strategies $$\alpha _\mathcal{E}$$ there exists a local strategy $$\alpha _\mathcal {D}$$ such that for all $$a \in \mathsf{L}$$ we have $$\mathsf{Succ}(\mu _\mathcal {D}, \alpha _\mathcal {D}, a) \mathrel {\mathcal {R}} \mathsf{Succ}(\mu _\mathcal{E}, \alpha _\mathcal{E}, a)$$.

As usual, a union of bisimulations is a bisimulation. Denote by $$\mathord {\sim }$$ the union of all bisimulations, i.e., $$\mathord {\sim }$$ is the largest bisimulation. We write $$\mathcal {D}\sim \mathcal{E}$$ if $$\mu ^\mathcal {D}_{0} \sim \mu ^{\mathcal{E}}_{0}$$. In general, the set $$\sim$$ is uncountably infinite, so methods for computing state-based bisimulation (e.g., partition refinement) are not applicable. The following proposition establishes a link between trace refinement and bisimulation.

### Proposition 9

Let $$\mathcal {D}$$ be an MDP and $$\mathcal {C}$$ be an MC. Then $$\mathcal {D}\sim \mathcal {C}$$ if and only if $$\mathcal {D}\sqsubseteq \mathcal {C}$$.

We often view a subdistribution $$d \in \mathsf{subDist}(Q)$$ as a row vector $$d \in [0,1]^Q$$. For a local strategy $$\alpha$$ and a label a, define the transition matrix $$\varDelta _{\alpha }(a) \in [0,1]^{Q \times Q}$$ with $$\varDelta _{\alpha }(a)[q,q'] = \sum _{\mathsf{m}\in \mathsf{moves}(q)} \alpha (q)(\mathsf{m}) \cdot \mathsf{m}(a, q')$$. Viewing subdistributions $$\mu$$ as row vectors, we have:
\begin{aligned} \mathsf{Succ}(\mu , \alpha , a) = \mu \cdot \varDelta _{\alpha }(a) \end{aligned}
(3)
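A small sketch (with a hypothetical two-state MDP) confirms that the explicit double sum defining $$\mathsf{Succ}$$ agrees with the vector-matrix product in (3):

```python
import numpy as np

# Hypothetical MDP on Q = {0, 1}: moves[q] lists the moves in state q;
# each move maps a label to a row of successor probabilities.
moves = {
    0: [{'a': np.array([0.0, 1.0])},        # move m1 in state 0
        {'a': np.array([0.5, 0.5])}],       # move m2 in state 0
    1: [{'a': np.array([0.0, 1.0])}],       # the single move in state 1
}
alpha = {0: [0.25, 0.75], 1: [1.0]}         # a local strategy

def succ(mu, alpha, a):
    """Succ(mu, alpha, a) computed as the explicit double sum."""
    out = np.zeros(len(mu))
    for q, mass in enumerate(mu):
        for m, w in zip(moves[q], alpha[q]):
            out += mass * w * m.get(a, 0.0)
    return out

def delta(alpha, a):
    """Transition matrix Delta_alpha(a): row q averages the move rows."""
    rows = [sum(w * m.get(a, np.zeros(2)) for m, w in zip(moves[q], alpha[q]))
            for q in moves]
    return np.vstack(rows)

mu = np.array([0.5, 0.5])
# Equation (3): Succ(mu, alpha, a) = mu . Delta_alpha(a)
assert np.allclose(succ(mu, alpha, 'a'), mu @ delta(alpha, 'a'))
```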
In the following we consider MDPs $$\mathcal {D}= \langle Q,\mu ^{\mathcal {D}}_{0},\mathsf{L},\delta \rangle$$ and $$\mathcal{E}= \langle Q,\mu ^{\mathcal{E}}_{0},\mathsf{L},\delta \rangle$$ over the same state space. This is without loss of generality, since we may take the disjoint union of the state spaces. Since $$\mathcal {D}$$ and $$\mathcal{E}$$ differ only in the initial distribution, we will focus on $$\mathcal {D}$$.
Let $$B \in \mathbb {R}^{Q \times k}$$ with $$k \ge 1$$. Assume the label set is $$\mathsf{L}= \{a_1, \ldots , a_{|\mathsf{L}|}\}$$. For $$\mu \in \mathsf{subDist}(Q)$$ and a local strategy $$\alpha$$ we define a point $$p(\mu , \alpha ) \in \mathbb {R}^{{|\mathsf{L}|} \cdot k}$$ such that
$$p(\mu , \alpha ) \ = \ \big ( \mu \varDelta _{\alpha }(a_1) B \quad \mu \varDelta _{\alpha }(a_2) B \quad \cdots \quad \mu \varDelta _{\alpha }(a_{|\mathsf{L}|}) B \big ).$$
For the reader’s intuition, we remark that we will choose matrices $$B \in \mathbb {R}^{Q \times k}$$ so that if two subdistributions $$\mu _\mathcal {D}, \mu _\mathcal{E}$$ are bisimilar then $$\mu _\mathcal {D}B = \mu _\mathcal{E}B$$. (In fact, one can compute B so that the converse holds as well, i.e., $$\mu _\mathcal {D}\sim \mu _\mathcal{E}$$ if and only if $$\mu _\mathcal {D}B = \mu _\mathcal{E}B$$.) It follows that, for subdistributions $$\mu _\mathcal {D}, \mu _\mathcal{E}$$ and local strategies $$\alpha _\mathcal {D}, \alpha _\mathcal{E}$$, if $$\mathsf{Succ}(\mu _\mathcal {D}, \alpha _\mathcal {D}, a) \sim \mathsf{Succ}(\mu _\mathcal{E}, \alpha _\mathcal{E}, a)$$ holds for all $$a \in \mathsf{L}$$ then $$p(\mu _\mathcal {D},\alpha _\mathcal {D}) = p(\mu _\mathcal{E},\alpha _\mathcal{E})$$. Let us also remark that for fixed $$\mu \in \mathsf{subDist}(Q)$$, the set $$P_\mu = \{p(\mu ,\alpha ) \mid \alpha \text { is a local strategy}\} \subseteq \mathbb {R}^{{|\mathsf{L}|} \cdot k}$$ is a (bounded and convex) polytope. As a consequence, if $$\mu _\mathcal {D}\sim \mu _\mathcal{E}$$ then the polytopes $$P_{\mu _\mathcal {D}}$$ and $$P_{\mu _\mathcal{E}}$$ must be equal. In the next paragraph we define “extremal” strategies $$\widehat{\alpha }$$, which intuitively are local strategies such that $$p(\mu ,\widehat{\alpha })$$ is a vertex of the polytope $$P_{\mu }$$.
Let $$\mathbf {v} \in \mathbb {R}^{{|\mathsf{L}|} \cdot k}$$ be a column vector; we denote column vectors in boldface. We view $$\mathbf {v}$$ as a “direction”. Recall that $$d_q$$ is the Dirac distribution on the state q. A pure local strategy $$\widehat{\alpha }$$ is extremal in direction $$\mathbf {v}$$ with respect to B if
\begin{aligned}&p(d_q, \alpha ) \mathbf {v} \le p(d_q, \widehat{\alpha }) \mathbf {v} \end{aligned}
(4)
\begin{aligned}&p(d_q, \alpha ) \mathbf {v} = p(d_q, \widehat{\alpha }) \mathbf {v} \quad \text {implies} \quad p(d_q, \alpha ) = p(d_q, \widehat{\alpha }) \end{aligned}
(5)
for all states $$q \in Q$$ and all pure local strategies $$\alpha$$.

By linearity, if (4) and (5) hold for all pure local strategies $$\alpha$$ then (4) and (5) hold for all local strategies $$\alpha$$. We say a local strategy $$\widehat{\alpha }$$ is extremal with respect to B if there is a direction $$\mathbf {v}$$ such that $$\widehat{\alpha }$$ is extremal in direction $$\mathbf {v}$$ with respect to B.
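Since pure local strategies suffice, extremality in a given direction can be checked by enumeration on small instances. The sketch below (a hypothetical two-state MDP with B the identity matrix) checks only the maximality condition (4); condition (5) would additionally require that tied strategies yield identical points.

```python
import numpy as np
from itertools import product

# Hypothetical MDP: moves[q][m] maps each label to a row of successor
# probabilities; state 0 has two moves, state 1 has one.
moves = [
    [{'a': np.array([1.0, 0.0])}, {'a': np.array([0.0, 1.0])}],
    [{'a': np.array([0.0, 1.0])}],
]
labels = ['a']
B = np.eye(2)                     # column space = R^Q, for illustration
v = np.array([1.0, 2.0])          # a direction in R^(|L| * k)

def p(q, choice):
    """p(d_q, alpha) for the pure local strategy 'choice' (one move per state)."""
    return np.concatenate([moves[q][choice[q]][a] @ B for a in labels])

pures = list(product(*[range(len(ms)) for ms in moves]))
# Condition (4): an extremal strategy maximizes p(d_q, .) @ v at every state.
extremal = [c for c in pures
            if all(p(q, c) @ v >= max(p(q, c2) @ v for c2 in pures) - 1e-12
                   for q in range(len(moves)))]
# Here only the strategy picking the second move in state 0 is extremal.
```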

### Proposition 10

Let $$\mathcal {D}= \langle Q,\mu _{0},\mathsf{L},\delta \rangle$$ be an MDP. Let $$B_1 \in \mathbb {R}^{Q \times k_1}$$ and $$B_2 \in \mathbb {R}^{Q \times k_2}$$ for $$k_1, k_2 \ge 1$$. Denote by $$\mathcal {V}_1, \mathcal {V}_2 \subseteq \mathbb {R}^Q$$ the subspaces spanned by the columns of $$B_1, B_2$$, respectively. Assume $$\mathcal {V}_1 \subseteq \mathcal {V}_2$$. If a local strategy $$\widehat{\alpha }$$ is extremal with respect to $$B_1$$ then $$\widehat{\alpha }$$ is extremal with respect to $$B_2$$.

In light of this fact, we may define $$\widehat{\alpha }$$ to be extremal with respect to a column-vector space $$\mathcal {V}$$ if $$\widehat{\alpha }$$ is extremal with respect to some matrix B whose column space equals $$\mathcal {V}$$.

The following proposition describes a vector space $$\mathcal {V}$$ so that two subdistributions are bisimilar if and only if their difference (viewed as a row vector) is orthogonal to $$\mathcal {V}$$.

### Proposition 11

Let $$\mathcal {D}= \langle Q,\mu _{0},\mathsf{L},\delta \rangle$$ be an MDP. Let $$\mathcal {V}\subseteq \mathbb {R}^{Q}$$ be the smallest column-vector space such that
• $$\mathbf {1} = (1 \ 1 \cdots 1)^T \in \mathcal {V}$$ (where T denotes transpose) and

• $$\varDelta _{\widehat{\alpha }}(a) \mathbf {u} \in \mathcal {V}$$ for all $$\mathbf {u} \in \mathcal {V}$$, all labels $$a \in \mathsf{L}$$ and local strategies $$\widehat{\alpha }$$ that are extremal with respect to $$\mathcal {V}$$.

Then for all $$\mu _\mathcal {D}, \mu _\mathcal{E}\in \mathsf{subDist}(Q)$$, we have $$\mu _\mathcal {D}\sim \mu _\mathcal{E}$$ if and only if $$\mu _\mathcal {D}\mathbf {u} = \mu _\mathcal{E}\mathbf {u}$$ for all $$\mathbf {u} \in \mathcal {V}$$.

Proposition 11 allows us to prove the following theorem:

### Theorem 12

Given two MDPs $$\mathcal {D}$$ and $$\mathcal{E}$$, the problem of deciding whether $$\mathcal {D}\sim \mathcal{E}$$ is in coNP.

In the following, without loss of generality, we consider an MDP $$\mathcal {D}= \langle Q,\mu ^{\mathcal {D}}_{0},\mathsf{L},\delta \rangle$$ and an MC $$\mathcal {C}= \langle Q_\mathcal {C},\mu ^{\mathcal {C}}_{0},\mathsf{L},\delta \rangle$$ with $$Q_\mathcal {C}\subseteq Q$$. We may view subdistributions $$\mu _\mathcal {C}\in \mathsf{subDist}(Q_\mathcal {C})$$ as $$\mu _\mathcal {C}\in \mathsf{subDist}(Q)$$ in the natural way. The following proposition is analogous to Proposition 11.

### Proposition 13

Let $$\mathcal {D}= \langle Q,\mu ^{\mathcal {D}}_{0},\mathsf{L},\delta \rangle$$ be an MDP and $$\mathcal {C}= \langle Q_\mathcal {C},\mu ^{\mathcal {C}}_{0},\mathsf{L},\delta \rangle$$ be an MC with $$Q_\mathcal {C}\subseteq Q$$. Let $$\mathcal {V}\subseteq \mathbb {R}^{Q}$$ be the smallest column-vector space such that
• $$\mathbf {1} = (1 \ 1 \cdots 1)^T \in \mathcal {V}$$ (where T denotes transpose) and

• $$\varDelta _{\alpha }(a) \mathbf {u} \in \mathcal {V}$$ for all $$\mathbf {u} \in \mathcal {V}$$, all labels $$a \in \mathsf{L}$$ and all local strategies $$\alpha$$.

Then for all $$\mu _\mathcal {D}\in \mathsf{subDist}(Q)$$ and all $$\mu _\mathcal {C}\in \mathsf{subDist}(Q_\mathcal {C})$$, we have $$\mu _\mathcal {D}\sim \mu _\mathcal {C}$$ if and only if $$\mu _\mathcal {D}\mathbf {u} = \mu _\mathcal {C}\mathbf {u}$$ for all $$\mathbf {u} \in \mathcal {V}$$.

Notice the differences to Proposition 11: there we considered all extremal local strategies (potentially exponentially many), whereas here we consider all local strategies (in general infinitely many). However, we show that one can efficiently find a small set of local strategies that spans all local strategies. This allows us to reduce (in logarithmic space) the bisimulation problem between an MDP and an MC to the bisimulation problem between two MCs, which is equivalent to the trace-equivalence problem for MCs (by Proposition 9). The latter problem is known to be in NC. Theorem 8 then follows with Proposition 9.
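For two MCs there is only one local strategy, so the space $$\mathcal {V}$$ of Proposition 13 can be computed by a simple closure, in the spirit of Tzeng's algorithm [20]. The following sketch (with hypothetical toy chains placed on a disjoint union of state spaces) builds an orthonormal basis of $$\mathcal {V}$$ and uses it to decide bisimulation, i.e., trace equivalence:

```python
import numpy as np

def basis_of_V(deltas, tol=1e-9):
    """Orthonormal basis of the smallest space containing the all-ones
    vector and closed under u -> Delta(a) @ u for every label a."""
    n = deltas[0].shape[0]
    u = np.ones(n) / np.sqrt(n)
    basis, queue = [u], [u]
    while queue:                           # terminates: at most n vectors
        u = queue.pop()
        for D in deltas:
            v = D @ u
            for b in basis:                # project out the current span
                v = v - (b @ v) * b
            if np.linalg.norm(v) > tol:
                v /= np.linalg.norm(v)
                basis.append(v)
                queue.append(v)
    return basis

def bisimilar(mu1, mu2, deltas):
    """mu1 ~ mu2 iff mu1 . u = mu2 . u for every u in V (Proposition 13)."""
    return all(abs(mu1 @ u - mu2 @ u) < 1e-9 for u in basis_of_V(deltas))

# Disjoint union of a 1-state MC (state 0) and a 2-state MC (states 1, 2),
# both emitting a or b uniformly at every step -- hence trace-equivalent.
Da = np.array([[.5, 0, 0], [0, 0, .5], [0, .5, 0]])
Db = Da.copy()
assert bisimilar(np.array([1., 0, 0]), np.array([0, 1., 0]), [Da, Db])

# Perturb state 2 so it emits a with probability 1: the traces now differ.
Da2 = np.array([[.5, 0, 0], [0, 0, .5], [0, 1., 0]])
Db2 = np.array([[.5, 0, 0], [0, 0, .5], [0, 0, 0]])
assert not bisimilar(np.array([1., 0, 0]), np.array([0, 1., 0]), [Da2, Db2])
```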

## Footnotes

1. The complexity class NC is the subclass of P containing those problems that can be solved in polylogarithmic parallel time (see e.g. ).

2. As in  we speak of moves rather than of actions, to avoid possible confusion with the label alphabet $$\mathsf{L}$$.

## References

1. Full version. ArXiv CoRR: http://arxiv.org/abs/1510.09102 (2015)
2. Arora, S., Ge, R., Kannan, R., Moitra, A.: Computing a nonnegative matrix factorization - provably. In: STOC, pp. 145–162. ACM (2012)
3. Blondel, V.D., Canterini, V.: Undecidable problems for probabilistic automata of fixed dimension. Theory Comput. Syst. 36(3), 231–245 (2003)
4. Canny, J.: Some algebraic and geometric computations in PSPACE. In: STOC, pp. 460–467 (1988)
5. Cohen, J., Rothblum, U.: Nonnegative ranks, decompositions, and factorizations of nonnegative matrices. Linear Algebra Appl. 190, 149–168 (1993)
6. Condon, A., Lipton, R.: On the complexity of space bounded interactive proofs (extended abstract). In: FOCS, pp. 462–467 (1989)
7. Cormen, T., Stein, C., Rivest, R., Leiserson, C.E.: Introduction to Algorithms, 2nd edn. McGraw-Hill Higher Education, New York (2001)
8. Doyen, L., Henzinger, T.A., Raskin, J.-F.: Equivalence of labeled Markov chains. Int. J. Found. Comput. Sci. 19(3), 549–563 (2008)
9. Fearnley, J., Jurdzinski, M.: Reachability in two-clock timed automata is PSPACE-complete. Inf. Comput. 243, 26–36 (2015)
10. Greenlaw, R., Hoover, H.J., Ruzzo, W.L.: Limits to Parallel Computation: P-completeness Theory. Oxford University Press, Oxford (1995)
11. Hermanns, H., Krčál, J., Křetínský, J.: Probabilistic bisimulation: naturally on distributions. In: Baldan, P., Gorla, D. (eds.) CONCUR 2014. LNCS, vol. 8704, pp. 249–265. Springer, Heidelberg (2014). Technical report at http://arxiv.org/abs/1404.5084
12. Kiefer, S., Murawski, A.S., Ouaknine, J., Wachter, B., Worrell, J.: Language equivalence for probabilistic automata. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 526–540. Springer, Heidelberg (2011)
13. Kiefer, S., Murawski, A.S., Ouaknine, J., Wachter, B., Worrell, J.: $${\sf APEX}$$: an analyzer for open probabilistic programs. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 693–698. Springer, Heidelberg (2012)
14. Li, L., Feng, Y.: Quantum Markov chains: description of hybrid systems, decidability of equivalence, and model checking linear-time properties. Inf. Comput. 244, 229–244 (2015)
15. Ngo, T.M., Stoelinga, M., Huisman, M.: Confidentiality for probabilistic multi-threaded programs and its verification. In: Jürjens, J., Livshits, B., Scandariato, R. (eds.) ESSoS 2013. LNCS, vol. 7781, pp. 107–122. Springer, Heidelberg (2013)
16. Paz, A.: Introduction to Probabilistic Automata. Academic Press, Cambridge (1971)
17. Peyronnet, S., De Rougemont, M., Strozecki, Y.: Approximate verification and enumeration problems. In: Roychoudhury, A., D'Souza, M. (eds.) ICTAC 2012. LNCS, vol. 7521, pp. 228–242. Springer, Heidelberg (2012)
18. Renegar, J.: On the computational complexity and geometry of the first-order theory of the reals. Parts I–III. J. Symbolic Comput. 13(3), 255–352 (1992)
19. Schützenberger, M.-P.: On the definition of a family of automata. Inf. Control 4, 245–270 (1961)
20. Tzeng, W.: A polynomial-time algorithm for the equivalence of probabilistic automata. SIAM J. Comput. 21(2), 216–227 (1992)
21. Tzeng, W.: On path equivalence of nondeterministic finite automata. Inf. Process. Lett. 58(1), 43–46 (1996)
22. Vavasis, S.: On the complexity of nonnegative matrix factorization. SIAM J. Optim. 20(3), 1364–1377 (2009)

© Her Majesty the Queen in Right of the United Kingdom 2016

## Authors and Affiliations

• Nathanaël Fijalkow
• Stefan Kiefer