1 Introduction

Functions defined over syntactical structures such as strings, trees and graphs are ubiquitous in computer science. Automata models allow one to succinctly represent such functions. In particular, weighted automata can efficiently model functions mapping structured objects to values in a semi-ring. Weighted automata have been defined to handle functions whose domain are e.g. strings [9, 26], trees [8, 16] and 2-dimensional words [11]. More recently, Bailly et al. [2] proposed a computational model for functions mapping labeled graphs (or hypergraphs) to values in a field (see also [22, Chap. 2]): Graph Weighted Models (GWMs). GWMs extend the notion of linear representation of a function defined over strings and trees to functions defined over graphs labeled by symbols in a ranked alphabet: loosely speaking, while string weighted automata can be defined by associating each symbol in a finite alphabet to a linear map and tree weighted automata by associating each symbol in a ranked alphabet to a multilinear map, GWMs are defined by associating each arity k symbol from a ranked alphabet to a kth order tensor. The computation of a GWM boils down to mapping each vertex in a graph to the tensor associated to its label and performing contractions directed by the edges of the input graph to obtain a value in the supporting field. When restricted to the families of strings, trees or 2-dimensional words, GWMs are expressively equivalent to the classical notions of weighted automata over these structures.

Weighted automata have recently received interest from the machine learning community due to their ability to represent functions defined over structured objects. Efficient (and often consistent) learning algorithms have been developed for such computational models defined over sequences [3, 6, 10, 19] and trees [1, 4, 14]. Motivated by the relevance of learning functions defined over richer families of labeled graphs, our long term objective is to design efficient learning algorithms for GWMs. This is however a challenging task. Given the close relationship between minimization and learning for classical weighted automata (see e.g. [7, 21, 27]), we take a first step in this direction by tackling the problem of minimizing GWMs defined over the simple family of circular strings.

Circular strings are strings whose last symbol is connected to the first. A circular string can be seen as a directed graph where each vertex is labeled by a symbol from a finite alphabet and is connected to his unique successor (i.e. a labeled graph composed of a unique cycle). Circular strings are relevant in biology (see e.g. [20] and references therein) and have been studied from a formal language perspective in the non-quantitative setting in [24]. The study of GWMs defined over such graphs is particularly relevant since circular strings are in some sense the simplest family of graphs with cycles (and cycles can be seen as the key obstacle for going from strings and trees to general graphs). Moreover, GWMs defined over the family of circular strings—which we henceforth denote by GWM\(^\mathrm{c}\)s to avoid confusions—take a simple form making them easily amenable to theoretical study: a   GWM\(^\mathrm{c}\)  is given by a set of matrices \(\mathbf {A}^\sigma \) for each symbol \(\sigma \) in a finite alphabet, and maps any circular string \(\sigma _1\sigma _2\cdots \sigma _k\) to the trace of the products of the matrices associated with the letters in the stringFootnote 1. Despite the simplicity of this computational model and its strong connection with string weighted automata, the minimization problem is considerably more challenging than in the case of string or tree weighted automata. More precisely, while the minimization problem can easily be handled using notions from linear algebra for e.g. real-valued string weighted automata (see e.g. [7]), we show in this paper that the minimization of GWM\(^\mathrm{c}\)s requires fundamental concepts from the theory of finite-dimensional algebras (such as the ones of radical and semi-simplicity).

Contributions. Throughout the paper, we only consider automata defined over a field of characteristic 0. After introducing notions on weighted automata, GWM\(^\mathrm{c}\)s and finite-dimensional algebras in Sect. 2, we first tackle the problem of deciding the equivalence of GWM\(^\mathrm{c}\)s in Sect. 3. The study of the equivalence problem is motivated by the simple observation that two minimal GWMs computing the same function are not necessarily related by a change of basis, which is in contrast with a classical result stating that two minimal string weighted automata are equivalent if and only if they are related by a change of basis. Building from this observation, we unravel the fundamental notion of semi-simple GWM\(^\mathrm{c}\)  and we show that any function recognizable by a \(GWM^{c}\) can be computed by a semi-simple GWM\(^\mathrm{c}\) (Corollary 1) and that two semi-simple \(GWM^{c}s\) of equal dimensions computing the same function are necessarily related by a change of basis (Corollary 2). These two results naturally give rise to a polynomial time algorithm to decide whether two \(GWM^{c}s\) are equivalent. We then move on to the minimization problem in Sect. 4, where we give a polynomial time minimization algorithm for \(GWM^{c}s\) which fundamentally relies on the notion of semi-simple GWM\(^\mathrm{c}\) (Corollary 3). While the problem of minimizing a GWM defined over the simple family of circular strings is central to this paper, we see it as a test bed for developing the theory of general GWMs: beyond the minimization and equivalence algorithms we propose, we believe that one of our main contributions is to illustrate how the theory of GWMs will rely on advanced concepts from algebra theory and to unravel fundamental properties that will surely be central to the study of GWMs defined over more general families of graphs (such as the one of semi-simple GWM\(^\mathrm{c}\)).

1.1 Notations

For any integer n we let \([n] = \{1,2,\cdots , n\}\). We denote the set of integers by \(\mathbb {N}\) and the fields of real and rational numbers by \(\mathbb {R}\) and \(\mathbb {Q}\) respectively. Let \(\mathbb {F}\) be a field of characteristic 0, we denote by \(\mathcal {M}_{n}(\mathbb {F}) = \mathbb {F}^{n\times n}\) the set of all \(n\times n\) matrices over \(\mathbb {F}\). We use lower case bold letters for vectors (e.g. \(\mathbf {v} \in \mathbb {F}^{d_1}\)) and upper case bold letters for matrices (e.g. \(\mathbf {M}\in \mathbb {F}^{d_1 \times d_2}\)). We denote by \(\mathbf {I}_n\) the \(n\times n\) identity matrix (or simply \(\mathbf {I}\) if the dimension is clear from context). Given a matrix \(\mathbf {M}\in \mathbb {F}^{d_1 \times d_2}\), we denote its entries by \(\mathbf {M}_{i,j}\) and we use \(\mathrm {vec}(\mathbf {M}) \in \mathbb {F}^{d_1 d_2}\) to denote the column vector obtained by concatenating the columns of \(\mathbf {M}\). We use \(\ker (\mathbf {A})\) to denote the kernel (or null space) of a matrix \(\mathbf {A}\). Given two matrices \(\mathbf {A}\in \mathcal {M}_m(\mathbb {F})\) and \(\mathbf {B}\in \mathcal {M}_n(\mathbb {F})\) we denote their Kronecker product by \(\mathbf {A}\otimes \mathbf {B}\in \mathcal {M}_{mn}(\mathbb {F})\) and their direct sum by \(\mathbf {A}\oplus \mathbf {B}\in \mathcal {M}_{m+n}(\mathbb {F})\): \(\mathbf {A}\otimes \mathbf {B}\) is the block matrix with blocks \((\mathbf {A}_{i,j}\mathbf {B})_{i,j}\) and \(\mathbf {A}\oplus \mathbf {B}\) is the block diagonal matrix with \(\mathbf {A}\) in the upper diagonal block and \(\mathbf {B}\) in the lower one. We denote by \(\varSigma ^*\) the set of strings on a finite alphabet \(\varSigma \) and the empty string by \(\lambda \). We denote by \(\varSigma ^+\) the set of non-empty strings and by \(\varSigma ^k\) the set of all strings of length k.

2 Preliminaries

We first present notions on weighted automata, graph weighted models and finite dimensional algebras. The reader is referred to [9, 16, 25] for more details on weighted automata theory, to [2] and [22, Chap. 2] for an introduction to graph weighted models, and to [13, 17] for a thorough introduction to finite dimensional algebras.

2.1 Weighted Automata and GWMs over Circular Strings

Let \(\varSigma \) be a finite alphabet. A weighted finite automaton (WFA) over a field \(\mathbb {F}\) with n states is a tuple \(M=({\varvec{\alpha }}, \{\mathbf {M}^\sigma \}_{\sigma \in \varSigma },{\varvec{\omega }} )\) where \({\varvec{\alpha }},{\varvec{\omega }}\in \mathbb {F}^n\) are the initial and final weight vectors respectively, and \(\mathbf {M}^\sigma \in \mathcal {M}_n(\mathbb {F})\) is the transition matrix for each symbol \(\sigma \in \varSigma \). A WFA computes a function \(f_M:\varSigma ^*\rightarrow \mathbb {F}\) defined for each word \(x=x_1x_2\cdots x_k\in \varSigma ^*\) by

$$f_M(x) = {\varvec{\alpha }}^\top \mathbf {M}^{x_1}\mathbf {M}^{x_2}\cdots \mathbf {M}^{x_k}{\varvec{\omega }}.$$

We will often use the shorthand notation \(\mathbf {M}^x= \mathbf {M}^{x_1}\mathbf {M}^{x_2}\cdots \mathbf {M}^{x_k}\) for any word \(x=x_1x_2\cdots x_k\in \varSigma ^*\). A WFA M with n states is minimal if its number of states is minimal, i.e. any WFA \(M'\) such that \(f_M=f_{M'}\) has at least n states. We say that a function \(f:\varSigma ^*\rightarrow \mathbb {R}\) is WFA-recognizable if there exists a WFA computing it.

Graph weighted models (GWMs) have been introduced as a computational model over arbitrary labeled graphs and hypergraphs in [2]. In this paper, we focus on the simple model of GWMs defined over the family of circular strings. A circular string is a string without a beginning or an end, one can think of it as a string closed onto itself (see Fig. 1).

Fig. 1.
figure 1

(left) Graph representation of the string abba where the special vertices labeled with \(\alpha \) and \(\omega \) denote the beginning and end of the string respectively. (right) In contrast, the circular string abba has no beginning and no end, it is thus the same object as e.g. the circular string baab.

A d-dimensional GWM A over circular strings (GWM\(^\mathrm{c}\)) on \(\varSigma \) is given by a set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\). It computes a function \(f_A:\varSigma ^+ \rightarrow \mathbb {F}\) definedFootnote 2 for each word \({x=x_1x_2\cdots x_k\in \varSigma ^+}\) by

$$f_A(x) = \mathrm {Tr} ( \mathbf {A}^{x_1}\mathbf {A}^{x_2}\cdots \mathbf {A}^{x_k}) = \mathrm {Tr} (\mathbf {A}^x).$$

By invariance of the trace under cyclic permutation, we have \(f_A(x_1 x_2\cdots x_k) = f_A(x_2 x_3\cdots x_k x_1) = f_A(x_3 x_4\cdots x_k x_1 x_2) = \cdots \). This is in accordance with the definition of a circular string: for any string \(x'\) obtained by cyclic permutation of the letters of a string x, both x and \(x'\) correspond to the same circular string. Similarly to WFAs, a GWM\(^\mathrm{c}\)  is minimal if its dimension is minimal and a function \(f:\varSigma ^+\rightarrow \mathbb {F}\) is \(GWM^{c}{\text {-}}recognizable\) if it can be computed by a GWM\(^\mathrm{c}\).

It is immediate to see that there exist WFA-recognizable functions that are not GWM\(^\mathrm{c}\)-recognizable, this is the case of any WFA-recognizable function that is not invariant under cyclic permutation of letters in a wordFootnote 3. In contrast, one can easily show that any GWM\(^\mathrm{c}\)-recognizable function is WFA-recognizable. More precisely, we have the following result.

Proposition 1

For any d-dimensional GWM\(^\mathrm{c}\) \(A = \{\mathbf {A}^\sigma \}_{\sigma \in \varSigma } \) on \(\varSigma \), the WFA M with \(d^2\) states \(({\varvec{\alpha }}, \{\mathbf {M}^\sigma \}_{\sigma \in \varSigma },{\varvec{\omega }} )\) where \({\varvec{\alpha }}={\varvec{\omega }}=\mathrm {vec}(\mathbf {I}_d)\) and \(\mathbf {M}^\sigma = \mathbf {I}_d\otimes \mathbf {A}^\sigma \) for each \(\sigma \in \varSigma \), is such that \(f_M(x) = f_A(x)\) for all \(x \in \varSigma ^*\).

Proof

For any \(w=w_1\cdots w_n \in \varSigma ^*\) we have \(f_A(w) = \mathrm {Tr} (\mathbf {A}^w) = \sum _{i\in [d]} \mathbf {A}^w_{i,i} = \sum _{i\in [d]} \mathbf {e}_i^\top \mathbf {A}^w \mathbf {e}_i\) where \(\mathbf {e}_i\) is the i-th vector of the canonical basis of \(\mathbb {F}^d\). Since \({\varvec{\alpha }}={\varvec{\omega }}= (\mathbf {e}_1^\top , \cdots , \mathbf {e}_d^\top )^\top \) and \(\mathbf {M}^\sigma = \mathbf {I} \otimes \mathbf {A}^\sigma \) is the block-diagonal matrix with \(\mathbf {A}^\sigma \) repeated d times on the diagonal, one can check that \(f_M(w) ={\varvec{\alpha }}^\top \mathbf {M}^w {\varvec{\omega }}= \sum _{i\in [d]} \mathbf {e}_i^\top \mathbf {A}^w \mathbf {e}_i = f_A(w)\).   \(\square \)

It follows from this proposition that the learning and equivalence problems for GWM\(^\mathrm{c}\)s could be handled by using the corresponding algorithms for WFAs. We will nonetheless study the equivalence problem in the next sectionFootnote 4 without falling back onto the theory of WFAs, which will allow us to unravel fundamental properties of GWMs that will be particularly relevant to further studies (moreover, the minimization problem obviously cannot be handled in such a way).

2.2 Finite-Dimensional Algebras

An algebra \(\mathcal {A}\) over a field \(\mathbb {F}\) (or \(\mathbb {F}\)-algebra) is a vector space over the field \(\mathbb {F}\) equipped with a bilinear operation (called multiplication or product). An algebra is associative if its product is associative and it is finite-dimensional if it is of finite dimension as a vector space over \(\mathbb {F}\). In this paper, we will only consider finite-dimensional associative algebras. A sub-algebra \(\mathcal {B}\) of an algebra \(\mathcal {A}\) is a linear subspace of \(\mathcal {A}\) which is closed under product (i.e. \(\mathcal {B}\) equipped with the operations of \(\mathcal {A}\) is an algebra itself).

A classical example of finite-dimensional algebra is the set \(\mathcal {L}(V)\) of linear operators on some finite-dimensional vector space V (where the product is composition). In this particular example, the algebra \(\mathcal {L}(V)\) is isomorphic to the full matrix algebra \(\mathcal {M}_d(\mathbb {F})\), where d is the dimension of V; we will mainly focus on matrix algebras in this paper, i.e. sub-algebras of the full matrix algebra \(\mathcal {M}_d(\mathbb {F})\) for some d (an example of such an algebra is the set of \(d\times d\) upper triangular matrices). In particular, we will often consider the algebra generated by a finite set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\) for some finite alphabet \(\varSigma \), that is the set of all finite linear combinations of matrices of the form \(\mathbf {A}^x=\mathbf {A}^{x_1}\mathbf {A}^{x_2}\cdots \mathbf {A}^{x_k}\) for \(x=x_1x_2\cdots x_k\in \varSigma ^*\). More formally, if we denote by \(\mathcal {A}\) this algebra, we have

$$\mathcal {A}= \left\{ \sum _{i=1}^n \alpha _i \mathbf {A}^{w_i} : n\in \mathbb {N},\ \alpha _1,\cdots ,\alpha _n\in \mathbb {F},\ w_1,\cdots ,w_n\in \varSigma ^* \right\} .$$

Let \(\mathcal {A}\) be a finite-dimensional algebra over \(\mathbb {F}\). A sub-algebra \(\mathcal {X}\) of \(\mathcal {A}\) is called an ideal of \(\mathcal {A}\) if both \(xa\in \mathcal {X}\) and \(ax\in \mathcal {X}\) for any \(x\in \mathcal {X},\ a\in \mathcal {A}\) (i.e. \(\mathcal {X}\) is both left and right \(\mathcal {A}\)-invariant), which we will denote by \(\mathcal {A}\mathcal {X}=\mathcal {X}\mathcal {A}=\mathcal {A}\). A sub-algebra \(\mathcal {X}\) of \(\mathcal {A}\) is nilpotent if there exists some integer k such that \(\mathcal {X}^k= \{x_1x_2\cdots x_k :\ x_i\in \mathcal {X},\ i\in [k]\}=\{0\}\). The factor algebra \(\mathcal {A}/\mathcal {X}\) of an algebra \(\mathcal {A}\) by an ideal \(\mathcal {X}\) is the algebra consisting of all cosets \(a+\mathcal {X}\) for \(a\in \mathcal {A}\), in other words \(\mathcal {A}/\mathcal {X}\) is the quotient of \(\mathcal {A}\) by the equivalence relation (\(a\sim b\) if and only if \(a-b\in \mathcal {X}\)). The radicalFootnote 5 of \(\mathcal {A}\) is the maximal nilpotent ideal of \(\mathcal {A}\) and will be denoted by \(\mathrm {Rad}(\mathcal {A})\) (the existence of \(\mathrm {Rad}(\mathcal {A})\) follows from the fact that \(\mathcal {A}\) is of finite dimension). An algebra \(\mathcal {A}\) is semi-simple if its radical is \(\{0\}\).

Let us illustrate these definitions with a very simple example. Let \(\mathcal {G}\subset \mathcal {M}_2(\mathbb {R})\) be the algebra generated by the matrix \(\mathbf {G}= \left[ \begin{matrix}1&{}1\\ 0&{}1\end{matrix}\right] \). One can easily check that

\(\mathcal {G}= \left\{ \left[ \begin{matrix}\alpha &{}\beta \\ 0 &{} \alpha \end{matrix}\right] :\alpha ,\ \beta \in \mathbb {R}\right\} \) and is thus of dimension 2. Consequently, both

$$\begin{aligned} \mathcal {G}_1 = \left\{ \left[ \begin{matrix}\alpha &{} 0\\ 0 &{} \alpha \end{matrix}\right] :\ \alpha \in \mathbb {R}\right\} \ \ \text {and}\ \ \mathcal {G}_2 = \left\{ \left[ \begin{matrix}0&{} \beta \\ 0 &{} 0\end{matrix}\right] :\ \beta \in \mathbb {R}\right\} \end{aligned}$$
(1)

are sub-algebras of \(\mathcal {G}\). Moreover, \(\mathcal {G}_2\) is a nilpotent ideal and one can check that it is maximal, i.e. \(\mathrm {Rad}(\mathcal {G})=\mathcal {G}_2\) and hence \(\mathcal {G}\) is not semi-simple.

Intuitively, the radical of an algebra \(\mathcal {A}\) contains its bad elements (in the sense that these elements annihilate all simple \(\mathcal {A}\)-modules). In our previous example, this bad property translates into the fact that the non-zero elements of \(\mathcal {G}_2\) cannot be diagonalized. We will use two fundamental results from the theory of finite dimensional algebra. The first one is the Wedderburn-Malcev theorem which states that (under some conditions on the ground field \(\mathbb {F}\)) the elements of the radical can be filtered out from the algebra, i.e. one can find a sub-algebra of \(\mathcal {A}\) that is isomorphic to \(\mathcal {A}/\mathrm {Rad}(\mathcal {A})\) (see e.g. [17, Theorem 6.2.3]).

Theorem 1

(Wedderburn-Malcev Theorem). Let \(\mathcal {A}\) be a finite-dimensional algebra over a field of characteristic0. There exists a semi-simple subalgebra \(\tilde{\mathcal {A}}\) of \(\mathcal {A}\) which is isomorphic to \(\mathcal {A}/\mathrm {Rad}(\mathcal {A})\) and such that \(\mathcal {A}= \tilde{\mathcal {A}}\oplus \mathrm {Rad}(\mathcal {A})\) (direct sum of vector spaces).

Going back to the example of the algebra \(\mathcal {G}\) described above, we showed that it is not semi-simple, however one can easily check that \(\mathcal {G}/\mathrm {Rad}(\mathcal {G})\) is isomorphic to the algebra \(\mathcal {G}_1\) in Eq. (1) which is semi-simple, and furthermore that \(\mathcal {G}= \mathcal {G}_1\oplus \mathrm {Rad}(\mathcal {G})\).

The second fundamental result we will need is related to the notion of representation of an algebra. A representation of an \(\mathbb {F}\)-algebra \(\mathcal {A}\) is a homomorphism of \(\mathcal {A}\) into the algebra \(\mathcal {L}(V)\) of the linear operators on some vector space V (over \(\mathbb {F}\)). Two representations \(\rho :\mathcal {A}\rightarrow \mathcal {L}(V)\) and \(\tau :\mathcal {A}\rightarrow \mathcal {L}(W)\) are similar if there exists an isomorphism \(\phi :V\rightarrow W\) such that \(\rho (a) = \phi ^{-1}\tau (a)\phi \) for all \(a\in \mathcal {A}\). For semi-simple algebras, the notion of similar representations is fundamentally related to the trace operator, which will be particularly relevant to the present study. Formally, we have the following theorem (see e.g. [17, Corollary 2.6.3]).

Theorem 2

Let \(\rho \) and \(\tau \) be two representations of a semi-simple algebra \(\mathcal {A}\) over a field of characteristic 0. These representations are similar if and only if \(\mathrm {Tr} (\rho (a)) = \mathrm {Tr} (\tau (a))\) for all \(a\in \mathcal {A}\).

3 Semi-Simple GWMs and the Equivalence Problem

In this section, we study the equivalence problem: given two GWMs over circular strings, how can we decide whether they compute the same function? In light of Proposition 1, one could solve this problem by simply converting the two GWM\(^\mathrm{c}\)s into WFAs and checking whether these two WFAs compute the same function; indeed the equivalence problem for WFAs defined over a field is decidable in polynomial time [9]. Nonetheless, we will tackle this problem without relying on this proposition and, by doing so, we will unravel the notion of semi-simple \(GWM^{c}\) which will be relevant to the study of the minimization problem in the next section (and which should also be central to the study of GWMs defined over more general families of graphs).

3.1 Semi-Simplicity, Nilpotent Matrices and Traces

Let \(\mathcal {A}\) be a finite dimensional matrix algebra. Recall that the radical of \(\mathcal {A}\) is its maximal nilpotent ideal. A useful characterization of the elements of the radical relies on the notion of strongly nilpotent elements: \(\mathbf {A}\in \mathcal {A}\) is strongly nilpotent if \(\mathbf {A}\mathbf {X}\) is nilpotent for any \(\mathbf {X}\in \mathcal {A}\). It turns out that the radical of \(\mathcal {A}\) is exactly the set of its strongly nilpotent elements [17, Corollary 3.1.10]. Since the computation of a GWM\(^\mathrm{c}\)  boils down to applying the trace operator, we will leverage this property to relate the notions of radical and semi-simplicity to simple properties of the elements of \(\mathcal {A}\) with respect to the trace operator. We start with a simple lemma relating nilpotency and trace.

Lemma 1

Let \(\mathbb {F}\) be a field of characteristic 0 and let \(\mathbf {A}\in \mathcal {M}_{d}(\mathbb {F})\). Then \(\mathbf {A}\) is nilpotent if and only if \(\mathrm {Tr} (\mathbf {A}^n) = 0\) for all \(n\ge 1\).

Proof

Let \(\mathbf {A}\) be a nilpotent matrix and let k be such that \(\mathbf {A}^k=0\). Suppose \(\mathbf {A}\mathbf {v}=\gamma \mathbf {v}\) for some \(\mathbf {v}\ne \mathbf {0}\) (where \(\gamma \) could belong to an algebraically closed field extension of \(\mathbb {F}\)). Then \(\mathbf {A}^k\mathbf {v}=\gamma ^k \mathbf {v}= 0\) hence \(\gamma = 0\) since \(\mathbb {F}\) is of characteristic 0, thus \(\mathbf {A}\) has only 0 eigenvalues and \(\mathrm {Tr} (\mathbf {A}^n) = 0\) for all \(n\ge 1\).

Conversely, suppose that \(\mathrm {Tr} (\mathbf {A}^n) = 0\) for all \(n\ge 1\). Then, we have \(\mathrm {Tr} (P(\mathbf {A})) = 0\) for any polynomial P with constant term 0. Suppose that \(\mathbf {A}\) has a non-zero eigenvalue \(\gamma \) and let \(m>0\) be its multiplicity. Choose a polynomial P such that \(P(\gamma ) = 1\), \(P(0)=0\) and \(P(\mu ) = 0\) for any eigenvalue \(\mu \) of \(\mathbf {A}\) distinct from \(\gamma \). We then have \(0 = \mathrm {Tr} (P(\mathbf {A})) = m\), a contradiction. Hence \(\mathbf {A}\) has only zero eigenvalues and is nilpotent.    \(\square \)

One can use the previous lemma to show that an element \(\mathbf {A}\in \mathcal {A}\) is strongly nilpotent if and only if \(\mathrm {Tr} (\mathbf {A}\mathbf {X})=0\) for all \(\mathbf {X}\in \mathcal {A}\), which leads to the following useful characterization of the semi-simplicity of an algebra.

Proposition 2

Let \(\mathcal {A}\subset \mathcal {M}_d(\mathbb {F})\) be a matrix algebra. We have

$$\mathrm {Rad}(\mathcal {A}) = \left\{ \mathbf {A}\in \mathcal {A}\ :\ \mathrm {Tr} (\mathbf {A}\mathbf {X}) = 0 \ \text { for all }\mathbf {X}\in \mathcal {A}\right\} .$$

Consequently, \(\mathcal {A}\) is semi-simple if and only if for all \(\mathbf {A}\in \mathcal {A}\) different from 0 there exists \(\mathbf {X}\in \mathcal {A}\) such that \(\mathrm {Tr} (\mathbf {A}\mathbf {X})\ne 0\).

Proof

We will show that \(\mathbf {A}\in \mathcal {A}\) is strongly nilpotent if and only if \(\mathrm {Tr} (\mathbf {A}\mathbf {X})=0\) for all \(\mathbf {X}\in \mathcal {A}\). The proposition will then directly follows from the fact that \(\mathrm {Rad}(\mathcal {A})\) is the set of strongly nilpotent elements of \(\mathcal {A}\) and from the fact that \(\mathcal {A}\) is semi-simple if and only if \(\mathrm {Rad}(\mathcal {A})=\{0\}\).

Let \(\mathbf {A}\in \mathcal {A}\) be such that \(\mathrm {Tr} (\mathbf {A}\mathbf {X})=0\) for all \(\mathbf {X}\in \mathcal {A}\). Since \(\mathbf {X}(\mathbf {A}\mathbf {X})^{n-1}\in \mathcal {A}\) for all \(n\ge 1\) and all \(\mathbf {X}\in \mathcal {A}\) we have \(\mathrm {Tr} ((\mathbf {A}\mathbf {X})^n) = 0\) for all \(n\ge 1\) and all \(\mathbf {X}\in \mathcal {A}\), hence \(\mathbf {A}\mathbf {X}\) is nilpotent for all \(\mathbf {X}\in \mathcal {A}\) by Lemma 1, i.e. \(\mathbf {A}\) is strongly nilpotent. Conversely, let \(\mathbf {A}\) be a strongly nilpotent element of \(\mathcal {A}\). By Lemma 1 we have \(\mathrm {Tr} ((\mathbf {A}\mathbf {X})^n)=0\) for all \(\mathbf {X}\in \mathcal {A}\) and all \(n\ge 1\), in particular \(\mathrm {Tr} (\mathbf {A}\mathbf {X})=0\).    \(\square \)

3.2 Equivalence of GWMs

We now consider the problem of deciding whether two GWM\(^\mathrm{c}\)s are equivalent. Let us first briefly show how one can decide whether two real-valued WFAs compute the same function. One way to address this problem relies on the following result: two minimal real-valued WFAs computing the same function are related by a change of basis. Note that it is easy to check that WFAs are invariant under a change of basis of their weight vectors and transition matrices. The following proposition show that such a change of basis is actually the only way for two minimal WFAs to compute the same function [26] (see also [6, Corollary 4.2]).

Proposition 3

If two WFAs \(A=({\varvec{\alpha }},\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma },{\varvec{\omega }})\) and \(\tilde{A}=(\tilde{{\varvec{\alpha }}}, \{\tilde{\mathbf {A}}^\sigma \}_{\sigma \in \varSigma }, \tilde{{\varvec{\omega }}})\) with d states taking their values in \(\mathbb {R}\) are minimal and compute the same function, i.e. \(f_A=f_{\tilde{A}}\), then there exists an invertible matrix \(\mathbf {P}\in \mathcal {M}_d(\mathbb {R})\) such that

$${\varvec{\alpha }}^\top =\tilde{{\varvec{\alpha }}}^\top \mathbf {P},\ \ {\varvec{\omega }}= \mathbf {P}^{-1}\tilde{{\varvec{\omega }}}\ \ \text {and }\mathbf {A}^\sigma =\mathbf {P}^{-1}\tilde{\mathbf {A}}^\sigma \mathbf {P}\ { for~each~}\sigma \in \varSigma .$$

Hence, to decide whether two WFAs compute the same function one can simply minimize them and check whether the weight vectors and transition matrices obtained after minimization are related by a change of basis (which can both be done in polynomial time). In contrast, one can easily find an example of two minimal \(GWM^{c}s\) whose matrices are not related by a change of basis. Consider the constant function \(f(x)=2\) for all \(x\in \varSigma ^+\). One can check that the two GWM\(^\mathrm{c}\)s G and \(\tilde{G}\) with 2 states defined by the matrices

$$\mathbf {G}= \left[ \begin{matrix}1 &{} 1\\ 0 &{} 1\end{matrix}\right] \ \ \ \text {and}\ \ \ \tilde{\mathbf {G}} = \left[ \begin{matrix}1 &{} 0 \\ 0 &{} 1\end{matrix}\right] $$

respectively are minimal and compute f, however \(\mathbf {G}\) and \(\tilde{\mathbf {G}}\) are not similar.

Let us now introduce the notion of semi-simple\(GWM^{c}\). We say that a GWM\(^\mathrm{c}\)  A defined by a set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\) is semi-simple if the algebra \(\mathcal {A}\) generated by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) is semi-simple. It follows from the example presented in Sect. 2.2 that G is not semi-simple while \(\tilde{G}\) is a semi-simple \(GWM^{c}\) computing the \(GWM^{c}{\text {-}}recognizable function\) f. We will now show that this simple example can be generalized:any \(GWM^{c}{\text {-}}recognizable\) function can be computed by a semi-simple \(GWM^{c}\). This non-trivial result relies on the following theorem which is a direct consequence of the Wedderburn-Malcev theorem.

Theorem 3

Let \(\mathcal {A}\subset \mathcal {M}_d(\mathbb {F})\) be a matrix algebra over a field of characteristic 0. Then there exist a semi-simple sub-algebra \(\tilde{\mathcal {A}}\) of \(\mathcal {A}\) and a surjective homomorphism \(\pi :\mathcal {A}\rightarrow \tilde{\mathcal {A}}\) such that \(\mathrm {Tr} (\mathbf {A}) = \mathrm {Tr} (\pi (\mathbf {A}))\) for all \(\mathbf {A}\in \mathcal {A}\).

Proof

By Theorem 1 there exists a semi-simple sub-algebra \(\tilde{\mathcal {A}}\) of \(\mathcal {A}\) which is isomorphic to \(\mathcal {A}/\mathrm {Rad}(\mathcal {A})\) and such that \(\mathcal {A}= \tilde{\mathcal {A}} \oplus \mathrm {Rad}(\mathcal {A})\) (direct sum of vector spaces). Let \(\pi :\mathcal {A}\rightarrow \tilde{\mathcal {A}}\) be the projection associated with this direct sum. Then for any \(\mathbf {A}\in \mathcal {A}\) we have

$$\begin{aligned} \mathrm {Tr} (\mathbf {A}) = \mathrm {Tr} (\pi (\mathbf {A}) + (1-\pi )(\mathbf {A})) = \mathrm {Tr} (\pi (\mathbf {A})) + \mathrm {Tr} ((1-\pi )(\mathbf {A})) = \mathrm {Tr} (\pi (\mathbf {A})). \end{aligned}$$

Indeed, since \((1-\pi )(\mathbf {A})\in \mathrm {Rad}(\mathcal {A})\), it is nilpotent, hence its trace is zero.   \(\square \)

Using the notations from Theorem 3, it follows that for any d-dimensional GWM\(^\mathrm{c}\)  A given by a set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_{d}(\mathbb {F})\) generating the algebra \(\mathcal {A}\), the d-dimensional \(\tilde{A}\) given by the matrices \(\{\tilde{\mathbf {A}}^\sigma = \pi (\mathbf {A}^\sigma )\}_{\sigma \in \varSigma }\) is a semi-simple GWM\(^\mathrm{c}\)  computing the function \(f_A\), hence the following corollary.

Corollary 1

Any function that can be computed by a GWM\(^\mathrm{c}\)  can be computed by a semi-simple GWM\(^\mathrm{c}\)  of the same dimension.

Given a finite dimensional algebra \(\mathcal {A}\), one can compute the surjective homomorphism \(\pi \) from Theorem 3 in polynomial time when \(\mathbb {F}\) allows efficient arithmetic computations (e.g. \(\mathbb {F}=\mathbb {Q}\)) [12, 15]. The algorithm takes as input a basis \(a_1,\cdots ,a_n\) of \(\mathcal {A}\) (as a vector space) and the structure coefficients of the algebra (which are the scalars \(c^k_{i,j}\in \mathbb {F}\) satisfying \(a_ia_j=\sum _k c_{i,j}^ka_k\)). Since one can easily compute a basis and the structure coefficients of a matrix algebra \(\mathcal {A}\) given a set of generators \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) in polynomial time, it follows that any GWM\(^\mathrm{c}\)  can be transformed in polynomial time into a semi-simple \(GWM^{c}\) (of the same dimension) computing the same function.

We now show that a result similar to Proposition 3 holds for semi-simple GWM\(^\mathrm{c}\)s: two semi-simple d-dimensional \(GWM^{c}s\) are equivalent if and only if they are related by a change of basis. This result relies on the following theorem.

Theorem 4

Let \(\varSigma \) be a finite alphabet and let \(\mathcal {A},\mathcal {B}\subset \mathcal {M}_d(\mathbb {F})\) be the algebras generated by the sets of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) and \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\) respectively.

If \(\mathcal {A}\) and \(\mathcal {B}\) are semi-simple and \(\mathrm {Tr} (\mathbf {A}^w)=\mathrm {Tr} (\mathbf {B}^w)\) for all \(w\in \varSigma ^*\) then \(\mathcal {A}\) is isomorphic to \(\mathcal {B}\). Moreover, the mapping \(\tilde{\phi }:\mathcal {A}\rightarrow \mathcal {B}\) defined by extending the mapping

$$\phi :\mathbf {A}^x \mapsto \mathbf {B}^x\ \ \text { for all }x\in \varSigma ^*$$

by linearity is well-defined and is an isomorphism.

Proof

The mapping \(\phi \) is by construction a trace-preserving surjective semi-group homomorphism. We first showFootnote 6 that \(\phi \) can be extended to a homomorphism \(\tilde{\phi }:\mathcal {A}\rightarrow \mathcal {B}\). By definition, any \(\mathbf {A}\in \mathcal {A}\) can be written as \(\mathbf {A}= \sum _{i=1}^n \alpha _i \mathbf {A}^{x_i}\) for some \(n\in \mathbb {N}\), \(\alpha _1,\cdots ,\alpha _n\in \mathbb {F}\), \(x_1,\cdots ,x_n\in \varSigma ^*\). We will show that the mapping

$$\tilde{\phi }:\ \sum _{i=1}^n\alpha _i \mathbf {A}^{x_i} \longmapsto \sum _{i=1}^n\alpha _i\phi (\mathbf {A}^{x_i})$$

is well-defined. By construction of \(\tilde{\phi }\), it suffices to show that if \(\sum _{i=1}^n\alpha _i \mathbf {A}^{x_i} = 0\) for some \(\alpha _i\in \mathbb {F}\), \(x_i\in \varSigma ^*\), then \(\tilde{\phi }( \sum _{i=1}^n\alpha _i \mathbf {A}^{x_i} ) = 0\). Suppose \(\sum _{i=1}^n\alpha _i \mathbf {A}^{x_i} = 0\), then \(\sum _{i=1}^n\alpha _i \mathbf {A}^{x_i}\mathbf {A}^x = 0\) for any \(x\in \varSigma ^*\). By linearity of the trace and since \(\phi \) is a trace-preserving morphism, it follows that

$$\begin{aligned} 0&= \sum _{i=1}^n\alpha _i \mathrm {Tr} \left[ \mathbf {A}^{x_i}\mathbf {A}^x\right] = \sum _{i=1}^n\alpha _i \mathrm {Tr} \left[ \phi (\mathbf {A}^{x_i}\mathbf {A}^x )\right] = \sum _{i=1}^n\alpha _i \mathrm {Tr} \left[ \phi (\mathbf {A}^{x_i})\phi (\mathbf {A}^x)\right] \\&= \mathrm {Tr} \left[ \left( \sum _{i=1}^n\alpha _i \phi (\mathbf {A}^{x_i})\right) \phi (\mathbf {A}^x)\right] = \mathrm {Tr} \left[ \tilde{\phi }\left( \sum _{i=1}^n\alpha _i \mathbf {A}^{x_i}\right) \phi (\mathbf {A}^x)\right] \end{aligned}$$

for all \(x\in \varSigma ^*\). By linearity of the trace and since \(\phi \) is surjective, we thus have \(\mathrm {Tr} \left[ \tilde{\phi }\left( \sum _{i=1}^n\alpha _i \mathbf {A}^{x_i}\right) \mathbf {B}\right] = 0\) for any \(\mathbf {B}\in \mathcal {B}\), hence \(\tilde{\phi }\left( \sum _{i=1}^n\alpha _i \mathbf {A}^{x_i}\right) \) belongs to \(\mathrm {Rad}(\mathcal {B})\) by Proposition 2 and must be 0 since \(\mathcal {B}\) is semi-simple.

One can easily check that \(\tilde{\phi }\) is trace-preserving, is surjective and is a homomorphism. It remains to show that \(\tilde{\phi }\) is injective. Let \(\mathbf {A}\in \mathcal {A}\) be such that \(\tilde{\phi }(\mathbf {A}) = 0\). Since \(\tilde{\phi }\) is a homomorphism we have \(\tilde{\phi }(\mathbf {A}\mathbf {X})=0\) for any \(\mathbf {X}\in \mathcal {A}\), and thus \(0=\mathrm {Tr} (\tilde{\phi }(\mathbf {A}\mathbf {X})) = \mathrm {Tr} (\mathbf {A}\mathbf {X})\) for all \(\mathbf {X}\in \mathcal {A}\). Hence \(\mathbf {A}\in \mathrm {Rad}(\mathcal {A})\) by Proposition 2 and must be 0 since \(\mathcal {A}\) is semi-simple.    \(\square \)

The previous theorem can be leveraged to show that if two semi-simple GWM\(^\mathrm{c}\)s of the same dimension compute the same function, then they are related by a change of basis (note that the converse of this statement is immediate since the trace is a basis independent operator). Let A and B be two d-dimensional semi-simple GWM\(^\mathrm{c}\)s computing the same function and let \(\mathcal {A},\mathcal {B}\subset \mathcal {M}_d\) be the algebras generated by their respective sets of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) and \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\). First observe that the identity mapping \(\rho :\mathcal {A}\rightarrow \mathcal {L}(\mathbb {F}^d)\) defined by \(\rho (\mathbf {A})=\mathbf {A}\) for all \(\mathbf {A}\in \mathcal {A}\) is (trivially) a representation of the algebra \(\mathcal {A}\). Now, since A and B compute the same function and are semi-simple, we have \(\mathrm {Tr} (\mathbf {A}^w)=\mathrm {Tr} (\mathbf {B}^w)\) for all \(w\in \varSigma ^*\) and it follows from Theorem 4 that \(\mathcal {A}\) is isomorphic to \(\mathcal {B}\); let \(\tilde{\phi }:\mathcal {A}\rightarrow \mathcal {B}\) be the isomorphism defined in this theorem. Then, the mapping \(\tau :\mathcal {A}\rightarrow \mathcal {L}(\mathbb {F}^d)\) defined by \(\tau (\mathbf {A}) = \tilde{\phi }(\mathbf {A})\) for all \(\mathbf {A}\in \mathcal {A}\) is also a representation of \(\mathcal {A}\), and since \(\mathcal {A}\) is semi-simple it follows from Theorem 2 that \(\rho \) and \(\tau \) are similar. That is, there exists an invertible matrix \(\mathbf {P}\in \mathcal {M}_d(\mathbb {F})\) such that \(\rho (\mathbf {A}) = \mathbf {P}^{-1}\tau (\mathbf {A})\mathbf {P}\) for all \(\mathbf {A}\in \mathcal {A}\). In particular we have

$$\mathbf {A}^\sigma = \rho (\mathbf {A}^\sigma ) = \mathbf {P}^{-1}\tau (\mathbf {A}^\sigma )\mathbf {P}= \mathbf {P}^{-1}\tilde{\phi }(\mathbf {A}^\sigma )\mathbf {P}= \mathbf {P}^{-1}\mathbf {B}^\sigma \mathbf {P}$$

for all \(\sigma \in \varSigma \), hence the following corollary.

Corollary 2

Two d-dimensional semi-simple GWM\(^\mathrm{c}\)s A and B compute the same function if and only if they are related by a change of basis, i.e. there exists an invertible matrix \(\mathbf {P}\in \mathcal {M}_d(\mathbb {F})\) such that \(\mathbf {A}^\sigma = \mathbf {P}^{-1}\mathbf {B}^\sigma \mathbf {P}\) for all \(\sigma \in \varSigma \).

In the case where \(\mathbb {F}\) allows for efficient arithmetic computations (e.g. \(\mathbb {F}=\mathbb {Q}\)), it follows that the equivalence of GWM\(^\mathrm{c}\)s can be decided in polynomial time. Indeed, given two GWM\(^\mathrm{c}\)s A and B of the same dimension defined by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) and \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\) respectively, one can first transform them into semi-simple GWM\(^\mathrm{c}\)s using Theorem 3 and the algorithm in [12, 15], and then check whether the resulting matrices are related by a change of basis. The case where the two GWM\(^\mathrm{c}\)s are not of the same dimension can be easily handled. Without loss of generality, suppose that A and B are semi-simple GWM\(^\mathrm{c}\)s of dimension d and \(d'\) respectively with \(d'<d\). One can construct a d-dimensional GWM\(^\mathrm{c}\)  \(\tilde{B}\) computing the same function as B by considering the block-diagonal matrices \(\tilde{\mathbf {B}}^\sigma = \mathbf {B}^\sigma \oplus \mathbf {0}\) for each \(\sigma \in \varSigma \) (where \(\mathbf {0}\) is the \((d-d')\times (d-d')\) matrix with all entries equal to 0). It is easy to check that \(\tilde{B}\) is semi-simple if B is semi-simple, hence one can decide if A is equivalent to B by checking whether the matrices \(\mathbf {A}^\sigma \) and \(\tilde{\mathbf {B}}^\sigma \) are related by a change of basis.

4 Minimization of GWMs over Circular Strings

We now consider the minimization problem: given a GWM\(^\mathrm{c}\)  A, can we find a minimal GWM\(^\mathrm{c}\)  computing \(f_A\)? We will show that the answer is in the positive and that such a minimal GWM\(^\mathrm{c}\)  can be computed in polynomial time. We start with a technical lemma that generalizes the classical result stating that for any \(d\times d\) matrix \(\mathbf {A}\), the kernel of \(\mathbf {A}^d\) is equal to the kernel of \(\mathbf {A}^{d+k}\) for any \(k\ge 0\).

Lemma 2

Let \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\) be a finite set of matrices. Then for all \(k\ge 0\) we have

$$\bigcap _{x\in \varSigma ^d}\mathrm {ker}(\mathbf {A}^x) = \bigcap _{y\in \varSigma ^{d+k}}\mathrm {ker}(\mathbf {A}^y).$$

Proof

For any integer i, let \(E_i = \bigcap _{x\in \varSigma ^i}\mathrm {ker}(\mathbf {A}^x)\). We start by showing that if \(E_i = E_{i+1}\) for some i then \(E_{i+1} = E_{i+2}\). The inclusion \(E_{i+1} \subseteq E_{i+2}\) is immediate. Suppose \(E_i=E_{i+1}\) for some integer i. If \(\mathbf {v}\in E_{i+2}\) then \(\mathbf {A}^\sigma \mathbf {v}\in \mathrm {ker}(\mathbf {A}^x)\) for all \(x\in \varSigma ^{i+1}\) and all \(\sigma \in \varSigma \), i.e. \(\mathbf {A}^\sigma \mathbf {v}\in E_{i+1} = E_i\) for all \(\sigma \in \varSigma \), which implies \(\mathbf {A}^\sigma \mathbf {v}\in \mathrm {ker}(\mathbf {A}^y)\) for all \(y\in \varSigma ^i\) and all \(\sigma \in \varSigma \) from which \(\mathbf {v}\in E_{i+1}\) follows directly. To conclude, since each \(E_i\) is a linear subspace of \(\mathbb {F}^d\), \(E_i\subsetneq E_{i+1}\) implies \(\dim {E_i} < \dim {E_{i+1}}\), hence there must exist an i for which \(E_i=E_{i+1}\) and this i cannot be greater than d.   \(\square \)

We show in the following theorem that the linear space \(E=\bigcap _{x\in \varSigma ^d}\mathrm {ker}(\mathbf {A}^x)\) is not relevant to the computation of a GWM\(^\mathrm{c}\)  A with matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\), i.e. one can project each matrix \(\mathbf {A}^x\) onto the orthogonal complement of E without changing the function computed by A.

Theorem 5

Let A be a GWM\(^\mathrm{c}\)  given by the set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\). Consider the linear space

$$E = \bigcap _{x\in \varSigma ^d} \ker (\mathbf {A}^x) = \{\mathbf {v}\in \mathbb {F}^d: \mathbf {A}^x\mathbf {v}=\mathbf {0}\text { for all } x\in \varSigma ^d \}$$

and let \(\mathbf {\Pi }\in \mathbb {F}^{d\times d}\) be the matrix of the orthogonal projection onto E.

Then, the GWM\(^\mathrm{c}\)  \(\hat{A}\) given by the matrices \(\hat{\mathbf {A}}^\sigma =\mathbf {A}^\sigma (\mathbf {I}-\mathbf {\Pi })\) for each \(\sigma \in \varSigma \) is such that \(f_A=f_{\hat{A}}\).

Proof

Let \(\mathcal {A}\) be the algebra generated by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\). Let us first observe that E is \(\mathcal {A}\)-invariant, which follows from Lemma 2. Indeed, if \(\mathbf {v}\in E\) and \(y\in \varSigma ^*\) we have \(\mathbf {A}^x\mathbf {A}^y\mathbf {v}= \mathbf {0}\) for any \(x\in \varSigma ^d\) (since \(|xy|\ge d\)), hence \(\mathbf {A}^y\mathbf {v}\in E\); the extension to an arbitrary element of \(\mathbf {A}\) is immediate by linearity. This implies that for any \(\mathbf {A}\in \mathcal {A}\), we have

$$\begin{aligned} \mathbf {\Pi }\mathbf {A}\mathbf {\Pi }= \mathbf {A}\mathbf {\Pi }\ \ \ \ \ \text {and}\ \ \ \ \ (\mathbf {I}-\mathbf {\Pi })\mathbf {A}\mathbf {\Pi }= \mathbf {0}. \end{aligned}$$
(2)

Now, let \(k\ge 1\), let \(x=x_1x_2\cdots x_k\in \varSigma ^k\) and let \(\mathbf {P}_1 = \mathbf {\Pi }\) and \(\mathbf {P}_2 = \mathbf {I}-\mathbf {\Pi }\). We can decompose \(\mathbf {A}^x\) into

$$\begin{aligned} \mathbf {A}^x&= \prod _{i=1}^k \mathbf {A}^{x_i} = \prod _{i=1}^k \mathbf {A}^{x_i}(\mathbf {P}_1 + \mathbf {P}_2) = \sum _{j_1,\cdots ,j_k \in \{1,2\}} \mathbf {A}^{x_1}\mathbf {P}_{j_1} \mathbf {A}^{x_2} \mathbf {P}_{j_2}\cdots \mathbf {A}^{x_k}\mathbf {P}_{j_k} \\&= \hat{\mathbf {A}}^x + \mathbf {A}^{x_1}\mathbf {\Pi }\mathbf {A}^{x_2}\mathbf {\Pi }\cdots \mathbf {A}^{x_k}\mathbf {\Pi }+ \sum _{j_1,\cdots ,j_k \in \{1,2\}\text { s.t.} \atop \exists r,r':j_r\ne j_{r'}} \mathbf {A}^{x_1}\mathbf {P}_{j_1} \mathbf {A}^{x_2} \mathbf {P}_{j_2}\cdots \mathbf {A}^{x_k}\mathbf {P}_{j_k}. \end{aligned}$$

We will show that the traces of all the summands in this last expression, except for the first one, are equal to 0. First, using Eq. (2) we have \(\mathbf {A}^{x_1}\mathbf {\Pi }\mathbf {A}^{x_2}\mathbf {\Pi }\cdots \mathbf {A}^{x_k}\mathbf {\Pi }= \mathbf {A}^x\mathbf {\Pi }.\) Moreover, for any integer s such that \(sk\ge d\) we have \((\mathbf {A}^x\mathbf {\Pi })^s = \mathbf {A}^{x^s} \mathbf {\Pi }= \mathbf {0}\) by definition of E and by Lemma 2, thus \(\mathbf {A}^x\mathbf {\Pi }\) is nilpotent and its trace is 0 by Lemma 1. For the remaining terms, let \(j_1,\cdots ,j_k \in \{1,2\}\) not all equal. Let \(l\in [k]\) be an index such that \(j_l=2\) and \(j_{\overline{l+1}} = 1\) where \(\overline{l+1} = l+1\) if \(l<k\) and 1 otherwise. Using the invariance of the trace under cyclic permutations of a matrix product, we obtain

$$\begin{aligned} \mathrm {Tr} (\mathbf {A}^{x_1}\mathbf {P}_{j_1} \mathbf {A}^{x_2} \mathbf {P}_{j_2}\cdots \mathbf {A}^{x_k}\mathbf {P}_{j_k})&= \mathrm {Tr} (\mathbf {A}^{x_l}\mathbf {P}_{j_l} \mathbf {A}^{x_{\bar{l+1}}} \mathbf {P}_{j_{\overline{l+1}}}\cdots ) \\&= \mathrm {Tr} (\mathbf {A}^{x_l} (\mathbf {I}-\mathbf {\Pi })\mathbf {A}^{x_{\overline{l+1}}} \mathbf {\Pi }\cdots ) = 0 \end{aligned}$$

where we used Eq. 2 again for the last equality. To conclude, we have shown that \(\mathrm {Tr} (\mathbf {A}^x) = \mathrm {Tr} (\hat{\mathbf {A}}^x)\) for all \(x\in \varSigma ^*\), hence A and \(\hat{A}\) compute the same function on circular strings.   \(\square \)

Moreover, we now show that the subspace E from the previous theorem can be used to obtain a characterization of the minimality of a GWM\(^\mathrm{c}\).

Theorem 6

Let A be a GWM\(^\mathrm{c}\)  given by the set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\). Then, A is minimal if and only if the linear space

$$E = \bigcap _{x\in \varSigma ^d} \ker (\mathbf {A}^x) = \{\mathbf {v}\in \mathbb {F}^d: \mathbf {A}^x\mathbf {v}=\mathbf {0}\text { for all } x\in \varSigma ^d \}$$

is trivial, i.e. \(E = \{\mathbf {0}\}\).

Proof

Suppose that E is not trivial and let \(\mathbf {\Pi }\) be the matrix of the orthogonal projection onto E. Then, the rank R of \(\mathbf {I}-\mathbf {\Pi }\) is strictly less than d and there exists an orthogonal matrix \(\mathbf {U}\in \mathbb {R}^{d\times R}\) such that \(\mathbf {I}-\mathbf {\Pi }= \mathbf {U}\mathbf {U}^\top \). It follows from the previous proposition that, for any non-empty word \(x=x_1\cdots x_k\), we have

$$\begin{aligned}&\mathrm {Tr} (\mathbf {A}^x) = \mathrm {Tr} (\mathbf {A}^{x_1} (\mathbf {I}-\mathbf {\Pi })\mathbf {A}^{x_2} (\mathbf {I}-\mathbf {\Pi })\cdots \mathbf {A}^{x_k} (\mathbf {I}-\mathbf {\Pi }))\\&= \mathrm {Tr} (\mathbf {A}^{x_1} \mathbf {U}\mathbf {U}^\top \mathbf {A}^{x_2} \mathbf {U}\mathbf {U}^\top \cdots \mathbf {A}^{x_k} \mathbf {U}\mathbf {U}^\top ) = \mathrm {Tr} ( (\mathbf {U}^\top \mathbf {A}^{x_1} \mathbf {U})(\mathbf {U}^\top \mathbf {A}^{x_2} \mathbf {U}) \cdots (\mathbf {U}^\top \mathbf {A}^{x_k} \mathbf {U})). \end{aligned}$$

Hence, the R-dimensional GWM\(^\mathrm{c}\)  given by the matrices \(\hat{\mathbf {A}}^\sigma = \mathbf {U}^\top \mathbf {A}^{\sigma } \mathbf {U}\) computes the same function as A, showing that A is not minimal.

Suppose now that A is not minimal. Let B be a GWM\(^\mathrm{c}\)  of dimension \(d' <d\), given by the matrices \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\), such that \(f_B=f_A\). Let \(\mathcal {A}\) (resp. \(\mathcal {B}\)) be the algebra generated by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) (resp. \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\)). By Corollary 1, we can assume that both A and B are semi-simple GWM\(^\mathrm{c}\)s, i.e. that the algebras \(\mathcal {A}\) and \(\mathcal {B}\) are semi-simple. For each \(\sigma \in \varSigma \), let \(\hat{\mathbf {B}}^\sigma = \mathbf {B}^\sigma \oplus \mathbf {0}\in \mathbb {R}^{d\times d}\) be the block diagonal matrix having \(\mathbf {B}^\sigma \) in the upper diagonal block and 0’s elsewhere. Let \(\hat{\mathcal {B}}\) be the algebras generated by the matrices \(\{\hat{\mathbf {B}}^\sigma \}_{\sigma \in \varSigma } \subset \mathcal {M}_d(\mathbb {F})\). It is easy to check that the GWM\(^\mathrm{c}\)  \(\hat{B}\) computes the same function as A and B and that the algebra \(\hat{\mathcal {B}}\) is semi-simple (it is indeed isomorphic to the semi-simple algebra \(\mathcal {B}\)). It then follows from Corollary 2 that there exists an invertible matrix \(\mathbf {P}\in \mathcal {M}_d(\mathbb {F})\) such that \(\mathbf {A}^\sigma = \mathbf {P}\hat{\mathbf {B}}^\sigma \mathbf {P}^{-1}\text { for all }\sigma \in \varSigma .\) Let \(\mathbf {e}_d\) be the dth vector of the canonical basis of \(\mathbb {F}^d\), by definition of \(\hat{\mathbf {B}}^\sigma \) we have \(\hat{\mathbf {B}}^\sigma \mathbf {e}_d = \mathbf {0}\) for any \(\sigma \in \varSigma \), and consequently \(\mathbf {A}^\sigma \mathbf {P}\mathbf {e}_d = \mathbf {0}\) for any symbol \(\sigma \), showing that \(\mathbf {P}\mathbf {e}_d\in E\) and \(E \ne \{\mathbf {0}\}\).   \(\square \)

It follows from the two previous theorems that by restricting the linear operators \(\mathbf {A}^\sigma \) of a GWM\(^\mathrm{c}\)  A to the subspace \(E^\bot \), one can obtain a minimal GWM\(^\mathrm{c}\)  computing \(f_A\). We formally state this result in the following corollary.

Corollary 3

Let A be a GWM\(^\mathrm{c}\)  given by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\) and let \(\mathbf {\Pi }\) be the matrix of the orthogonal projection onto the space \(E = \bigcap _{x\in \varSigma ^d} \ker (\mathbf {A}^x)\). For any orthogonal matrix \(\mathbf {U}\in \mathbb {F}^{d\times R}\) such that \(\mathbf {I}-\mathbf {\Pi }=\mathbf {U}\mathbf {U}^\top \) (where R is the dimension of \(E^\bot \)), the R-dimensional GWM\(^\mathrm{c}\)  \(\hat{A}\) given by the matrices \(\hat{\mathbf {A}}^\sigma = \mathbf {U}^\top \mathbf {A}^\sigma \mathbf {U}\) is a minimal GWM\(^\mathrm{c}\)  computing \(f_A\).

Proof

Using the invariance of the trace under cyclic permutations of a matrix product, it directly follows from Theorem 5 that \(f_{\hat{A}} = f_A\). Moreover, one can check that \(\hat{E} = \bigcap _{x\in \varSigma ^d} \ker (\hat{\mathbf {A}}^x) = \{\mathbf {0}\}\) by construction of the matrices \(\hat{\mathbf {A}}^\sigma \), hence \(\hat{A}\) is minimal by Theorem 6.   \(\square \)

We showed that a GWM\(^\mathrm{c}\)  can be minimized by restricting its matrices to the subspace \(E^\bot \). In order to do so, one needs to compute a basis of \(E = \bigcap _{x\in \varSigma ^d} \ker (\mathbf {A}^x)\). This can naively be done by first computing \(\ker (\mathbf {A}^x)\) for each \(x\in \varSigma ^d\) and then computing a basis for the intersection of these linear subspaces, however the complexity of this approach is exponential in the dimension d. We show in the following proposition that for semi-simple GWM\(^\mathrm{c}\)s, one simply needs to compute a basis of the space \(\bigcap _{\sigma \in \varSigma } \ker (\mathbf {A}^\sigma )\), which can be done in polynomial time (provided that the field \(\mathbb {F}\) admits efficient symbolic arithmetic, e.g. \(\mathbb {F}=\mathbb {Q}\)).

Proposition 4

Let \(\mathcal {A}\subset \mathcal {M}_d(\mathbb {F})\) be the finite dimensional algebra generated by the set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\). Then if \(\mathcal {A}\) is semi-simple we have

$$\bigcap _{x\in \varSigma ^d}\mathrm {ker}(\mathbf {A}^x) = \bigcap _{\sigma \in \varSigma }\mathrm {ker}(\mathbf {A}^\sigma ).$$

Proof

For any integer \(i\ge 1\), let \(E_i = \bigcap _{x\in \varSigma ^i}\mathrm {ker}(\mathbf {A}^x)\). Recall from the proof of Lemma 2 that \(E_i\subset E_{i+1}\) for all i and that \(E_i=E_{i+1}\) implies \(E_i = E_{i+k}\) for any integer \(k\ge 0\), hence it will be sufficient to show that \(E_1=E_2\). One can check that each \(E_i\) is \(\mathcal {A}\)-invariant, i.e. each \(E_i\) is an \(\mathcal {A}\)-module. Since \(\mathcal {A}\) is semi-simple, any \(\mathcal {A}\)-module is semi-simple [17, Theorem 2.6.2], which implies that if M is an \(\mathcal {A}\)-module, every submodule U of M has a complement [17, Proposition 2.2.1], i.e. there exists an \(\mathcal {A}\)-module V such that \(M=U\oplus V\). Now since \(E_1\) is a submodule of the \(\mathcal {A}\)-module \(E_2\), \(E_1\) has a complement U in \(E_2\), i.e. U is \(\mathcal {A}\)-invariant and \(E_2 = E_1\oplus U\). Let \(\mathbf {v}\in U\). We show \(\mathbf {v}=\mathbf {0}\). Since \(\mathbf {v}\in E_2\), we have \(\mathbf {A}^{\sigma _1}\mathbf {A}^{\sigma _2}\mathbf {v}= \mathbf {0}\) for all \(\sigma _1,\sigma _2\in \varSigma \), hence \(\mathbf {A}^\sigma \mathbf {v}\in E_1\) for all \(\sigma \in \varSigma \). Moreover, we have \(\mathbf {A}^\sigma \mathbf {v}\in U\) for all \(\sigma \in \varSigma \) since U is \(\mathcal {A}\)-invariant. It follows that \(\mathbf {A}^\sigma \mathbf {v}\in E_1\cap U=\{\mathbf {0}\}\) and \(\mathbf {A}^\sigma \mathbf {v}= \mathbf {0}\) for all \(\sigma \in \varSigma \), hence \(\mathbf {v}\in E_1\) and since \(\mathbf {v}\in U\) we have \(\mathbf {v}=\mathbf {0}\). To conclude, we have \(U = \{\mathbf {0}\}\), hence \(E_1 = E_2\).   \(\square \)

Since a GWM\(^\mathrm{c}\)  can be transformed into an equivalent semi-simple GWM\(^\mathrm{c}\)  in polynomial time (see Corollary 1 and the following discussion), the minimization of a GWM\(^\mathrm{c}\)  defined over circular strings can be achieved in polynomial time by first converting it to a semi-simple GWM\(^\mathrm{c}\)  and then applying Corollary 3 with Proposition 4. The overall minimization algorithm is summarized in Algorithm 1.

figure a

5 Conclusion

We proposed polynomial time algorithms to handle both the minimization and the equivalence problems for GWMs defined over circular strings. By doing so, we unraveled fundamental notions from algebra theory that will be central to the study of GWMs. In particular, the notion of semi-simple GWM\(^\mathrm{c}\)  was paramount to our analysis. Intuitively, semi-simplicity can be thought of as a weak form of minimality: components from the radical do not contribute to the final computation of a GWM\(^\mathrm{c}\) (semi-simplification thus corresponds to annihilating these irrelevant components from the algebra, i.e. from the GWM\(^\mathrm{c}\)’s dynamics).

The next step is of course to try to extend the results obtained in this paper to GWMs defined over more general families of graphs. One promising direction we are currently investigating relies on extending the central notion of semi-simple GWM\(^\mathrm{c}\)  to GWMs defined over arbitrary families of labeled graphs: by opening any edge e in a graph G one obtains a graph \(G_e\) with two free ports (i.e. edges having one end that is not connected to any vertex) which would be mapped by a d-dimensional GWM A to a matrix \(\mathbf {A}^{G_e}\in \mathcal {M}_d(\mathbb {F})\) (indeed, a GWM naturally maps any graph with k free ports to a kth order tensor; see [22, Sect. 2.2.3] for more details). For circular strings, opening an edge corresponds to choosing a particular position in the circular string leading to an actual string \(x\in \varSigma ^*\) which is mapped to \(\mathbf {A}^x\) by the GWM. For arbitrary labeled graphs, we have \(f_A(G)=\mathrm {Tr} (\mathbf {A}^{G_e})\) similarly to the case of circular strings. One can then consider the algebra \(\mathcal {A}\) generated by the matrices \(\mathbf {A}^{G_e}\) for any graph G in some family of graphs and any edge e in G, and define a semi-simple GWM as a GWM for which this algebra \(\mathcal {A}\) is semi-simple (note that one exactly recovers the notion of semi-simple GWM introduced here in the special case of circular strings). Hence, the fundamental results from algebra theory we leveraged in this paper should be directly relevant to the study of general GWMs. Beyond minimization, we intend to study the problem of approximate minimization (such as the ones considered in [7, 23] for string and tree weighted automata) along with the closely related problem of learning GWMs defined over richer families of graphs than the one of circular strings.