Number of Prefixes in Trace Monoids: Clique Polynomials and Dependency Graphs
Abstract
We present some asymptotic properties of the average number of prefixes in trace languages. Such languages are characterized by an alphabet and a set of commutation rules, also called concurrent alphabet, which can be encoded by an independency graph or by its complement, called dependency graph. One key technical result, which is of independent interest, concerns general properties of graphs and states that “if an undirected graph admits a transitive orientation, then the multiplicity of the root of minimum modulus of its clique polynomial is smaller than or equal to the number of connected components of its complement graph”. As a consequence, under the same hypothesis of transitive orientation of the independency graph, one obtains the relation \({\text {E}}[T_n] = O({\text {E}}[W_n])\), where the random variables \(T_n\) and \(W_n\) represent the number of prefixes in traces of length n under two different fundamental probabilistic models:

the uniform distribution among traces of length n (for \(T_n\)),

the uniform distribution among words of length n (for \(W_n\)).
These two quantities are related to the time complexity of algorithms for solving classical membership problems on trace languages.
Keywords
Trace monoids, Clique polynomials, Möbius functions, Automata theory, Analytic combinatorics, Patterns in words
1 Introduction
In computer science, trace monoids have been introduced by Mazurkiewicz [22] as a model of concurrent events, describing which actions can or cannot commute with one another (we give a formal definition of traces and trace monoids in Sect. 2, see also [14] for a treatise on the subject). In combinatorics, they are related to the fundamental studies of the “monoïde partiellement commutatif” introduced by Cartier and Foata in [10], and to its convenient geometrical view as heaps of pieces proposed by Viennot in [25].
Several classical problems in language theory (recognition of rational and context-free trace languages, determination of the number of representative words of a given trace, computing the finite state automaton recognizing these words) can be solved by algorithms that work in time and space proportional to (or strictly depending on) the number of prefixes of the input trace [3, 6, 7, 8, 15, 23]. This is due to the fact that prefixes represent the possible decompositions of a trace in two parts and hence they are natural indexes for computations on traces.
In this work we consider two sequences of random variables:
\(\{T_n\}_{n\in \mathbb {N}}\), the number of prefixes of traces of length n generated at random under the equidistribution of traces of given size;
\(\{W_n\}_{n\in \mathbb {N}}\), the number of prefixes of traces of length n generated at random under the equidistribution of representative words of given size.
For some families of trace monoids, the asymptotic average, variance, and limit distributions of \(\{T_n\}\) and \(\{W_n\}\) are known [6, 7, 19, 20, 21]. It is interesting that they rely on the structural properties of an underlying graph (the independency graph, defined in Sect. 2). For example, it is known that, for every trace monoid \(\mathcal {M}\), the maximum number of prefixes of a trace of length n is of the order \(\varTheta (n^{\alpha })\), where \(\alpha \) is the size of the largest clique in the concurrent alphabet defining \(\mathcal {M}\) [8]. We summarize further such results in Sect. 3. In analytic combinatorics (see [17] for an introduction to this field), it remains a nice challenge to get a more universal description of the possible asymptotics of \(T_n\) and \(W_n\).
The paper is organized as follows: in Sect. 2 we recall the basic definitions on trace monoids; in Sect. 3 we summarize some asymptotic results on the random variables \(T_n\) and \(W_n\); in Sects. 4 and 5, we present our main results on cross-sections of trace monoids, clique polynomials, and a new bound relating the asymptotic behaviour of \(T_n\) and \(W_n\); we then conclude with possible future extensions of our work.
2 Notation and Preliminary Notions
For the reader not already familiar with the terminology of trace languages, we present in this section the key notions used in this article (see e.g. [14] for more details on all these notions).
Given a finite alphabet \(\varSigma \), as usual \(\varSigma ^*\) denotes the free monoid of all words over \(\varSigma \), \(\varepsilon \) is the empty word and \(|w|\) is the length of w for every \(w\in \varSigma ^*\). We recall that, for any \(w\in \varSigma ^*\), a prefix of w is a word \(u\in \varSigma ^*\) such that \(w=uv\), for some \(v \in \varSigma ^*\). Also, for any finite set \({\mathcal S}\), we denote by \(\#{\mathcal S}\) the cardinality of \({\mathcal S}\).
A concurrent alphabet is then a pair \((\varSigma ,{\mathcal C})\), where \({\mathcal C}\subseteq \varSigma \times \varSigma \) is a symmetric and irreflexive relation over \(\varSigma \). Such a pair can alternatively be defined by an undirected graph, which we call independency graph, where \(\varSigma \) is the set of nodes and \(\{\{a,b\} \mid (a,b) \in {\mathcal C}\}\) is the set of edges. Its complement \((\varSigma , {\mathcal C}^c)\) is called dependency graph. As the notions of concurrent alphabet and independency graph are equivalent, in the sequel we refer to them interchangeably. Informally, a concurrent alphabet lists the pairs of letters which can commute.
The trace monoid generated by a concurrent alphabet \((\varSigma ,{\mathcal C})\) is defined as the quotient monoid \(\varSigma ^*/ \equiv _{\mathcal C}\), where \(\equiv _{\mathcal C}\) is the smallest congruence extending the equations \(\{ab=ba : (a,b)\in {\mathcal C}\}\), and is denoted by \(\mathcal {M}(\varSigma ,{\mathcal C})\) or simply by \(\mathcal {M}\). Its elements are called traces and its subsets are called trace languages. In other words, a trace is an equivalence class of words with respect to the relation \(\equiv _{\mathcal C}\) given by the reflexive and transitive closure of the binary relation \(\sim _{\mathcal C}\) over \(\varSigma ^*\) such that \(uabv \sim _{\mathcal C}ubav\) for every \((a,b)\in {\mathcal C}\) and every \(u,v \in \varSigma ^*\). For any \(w\in \varSigma ^*\), we denote by [w] the trace represented by w; in particular \([\varepsilon ]\) is the empty trace, i.e. the unit of \(\mathcal {M}\). Note that the product of two traces \(r,s \in \mathcal {M}\), where \(r=[x]\) and \(s=[y]\), is the trace \(t= [xy]\), which does not depend on the representative words \(x,y\in \varSigma ^*\); we denote this product by \(t=r\cdot s\). The length of a trace \(t \in \mathcal {M}\), denoted by \(|t|\), is the length of any representative word. For any \(n\in \mathbb {N}\), let \(\mathcal {M}_n := \{ t \in \mathcal {M}: |t| = n\}\) and \(m_n:=\# \mathcal {M}_n\).
Note that if \({\mathcal C}=\emptyset \) then \(\mathcal {M}\) reduces to \(\varSigma ^*\), while if \({\mathcal C}=\{(a,b)\in \varSigma \times \varSigma \mid a\ne b\}\) then \(\mathcal {M}\) is the commutative monoid of all monomials with letters in \(\varSigma \).
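The definitions above can be explored by brute force on small cases. The following Python sketch closes a word under adjacent commutations to obtain its trace, and counts \(m_n\) by enumerating all words of length n. The helper names `trace_class` and `count_traces`, as well as the three-letter example alphabet in which only a and b commute, are illustrative assumptions and not notation from this paper.

```python
from itertools import product

def trace_class(word, commuting):
    """Return the set of all words equivalent to `word`.

    `commuting` is a set of frozenset({a, b}) pairs of letters that commute
    (an undirected encoding of the symmetric, irreflexive relation C).
    """
    seen = {word}
    stack = [word]
    while stack:
        w = stack.pop()
        for i in range(len(w) - 1):
            a, b = w[i], w[i + 1]
            if frozenset((a, b)) in commuting:      # u a b v ~ u b a v
                swapped = w[:i] + b + a + w[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    stack.append(swapped)
    return frozenset(seen)

def count_traces(alphabet, commuting, n):
    """m_n: number of distinct traces of length n (exhaustive enumeration)."""
    classes = {trace_class("".join(w), commuting)
               for w in product(alphabet, repeat=n)}
    return len(classes)

# Hypothetical example: Sigma = {a, b, c}, only a and b commute.
C = {frozenset("ab")}
abc_class = sorted(trace_class("abc", C))
m_small = [count_traces("abc", C, n) for n in range(4)]
print(abc_class, m_small)
```

With the empty relation every class is a singleton, so the count is \(3^n\); with all pairs commuting one recovers the number of monomials of degree n in three variables, in line with the two extreme cases noted above.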
Any trace \(t \in \mathcal {M}\) can be represented by a partial order over the multiset of letters of t, denoted by \(\text{ PO }(t)\). It works as follows: first, consider a word w satisfying \(t=[w]\). Then, for any pair of letters (a, b) of w, let \(a_i\) be the ith occurrence of the letter a and \(b_j\) the jth occurrence of the letter b. The partial order is then defined as \(a_i < b_j\) whenever \(a_i\) precedes \(b_j\) in all representative words of [w]. (See Example 1 hereafter.)
A prefix of a trace \(t \in \mathcal {M}\) is a trace p such that \(t=p\cdot s\) for some \(s \in \mathcal {M}\). Clearly, any prefix of t is a trace \(p=[u]\) where u is a prefix of a representative of t. It is easy to see that if p is a prefix of t then \(\text{ PO }(p)\) is an order ideal of \(\text{ PO }(t)\) and can be represented by the corresponding antichain. We recall that an antichain of a partially ordered set \(({\mathcal S},\le )\) is a subset \(A\subseteq {\mathcal S}\) such that \(a\le b\) does not hold for any pair of distinct elements \(a,b\in A\), while an order ideal in \(({\mathcal S},\le )\) is a subset \(\{a\in {\mathcal S}\mid \exists \ b\in A \text{ such } \text{ that } a \le b\}\) for some antichain A of \(({\mathcal S},\le )\). For every \(t \in \mathcal {M}\), we denote by \({\text {Pref}}(t)\) the set of all prefixes of t.
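As a small illustration of \({\text {Pref}}(t)\), the sketch below enumerates the prefixes of a trace directly from the characterization "p = [u] where u is a prefix of a representative of t". Each trace is stored through its lexicographically smallest representative; this encoding, the function names and the example alphabet are assumptions made only for this example.

```python
def trace_class(word, commuting):
    """All words equivalent to `word` under adjacent swaps of commuting letters."""
    seen = {word}
    stack = [word]
    while stack:
        w = stack.pop()
        for i in range(len(w) - 1):
            a, b = w[i], w[i + 1]
            if frozenset((a, b)) in commuting:
                swapped = w[:i] + b + a + w[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    stack.append(swapped)
    return seen

def canonical(word, commuting):
    """Lexicographically smallest representative, used as a name for [word]."""
    return min(trace_class(word, commuting))

def prefixes(word, commuting):
    """Pref([word]) as a set of canonical representatives."""
    result = set()
    for rep in trace_class(word, commuting):
        for k in range(len(rep) + 1):
            # every word prefix of every representative names a trace prefix
            result.add(canonical(rep[:k], commuting))
    return result

# Hypothetical example: only a and b commute.
C = {frozenset("ab")}
pref_abc = prefixes("abc", C)
pref_aabb = prefixes("aabb", C)
print(sorted(pref_abc), len(pref_aabb))
```

Here [abc] has the five prefixes named by the empty word, a, b, ab and abc, whereas the word abc alone has only four prefixes; for [aabb] the prefixes are the traces \([a^ib^j]\) with \(0\le i,j\le 2\).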
Example 1
Recognizable, rational and context-free trace languages are well defined by means of linearization and closure operations over traditional string languages; their properties and in particular the complexity of their membership problems are widely studied in the literature (see for instance [8, 14, 15, 23]).
For any alphabet \(\varSigma \) and trace monoid \(\mathcal {M}\), we denote by \({\mathbb {Z}\langle \!\langle \varSigma \rangle \!\rangle }\) the set of formal series on words (they are thus series in noncommutative variables) and by \({\mathbb {Z}\langle \!\langle \mathcal {M}\rangle \!\rangle }\) the set of formal series on traces (they are thus series in partially commutative variables), and \({\mathbb {Z}[\![z]\!]}\) stands for the ring of classical power series in the variable z. These three distinct rings (with the operations of sum and Cauchy product, see [5, 14, 24]) will be used in Sects. 4 and 5.
3 Asymptotic Results for the Number of Prefixes
In this section, we recall the main results on the number of prefixes of a random trace, under two different probabilistic models.
3.1 Probabilistic Analysis on Equiprobable Words
3.2 Probabilistic Analysis on Equiprobable Traces
Since it is known from [21] that \(p_{\mathcal {M}}\) has a unique root \(\rho \) of smallest modulus (and clearly \(\rho >0\) via Pringsheim’s theorem, see [17]), one gets \(m_n=\#\mathcal {M}_n = c \rho ^{-n} n^{\ell -1} + O\left( \rho ^{-n}n^{\ell -2}\right) \), where \(c>0\) is a constant and \(\ell \) is the multiplicity of \(\rho \) in \(p_{\mathcal {M}}(z)\). We observe that the existence of a unique root of smallest modulus for \(p_{\mathcal {M}}(z)\) is not a consequence of the strict monotonicity of the sequence \(\{m_n\}\). Indeed, if one considers \(M(z)=\frac{1}{(1-z^3)(1-z)^2}\), one has \(m_{n+3}= ((n + 5) m_n + 2 m_{n + 1} + 2 m_{n + 2})/(n + 3)\) so the sequence \(\{m_n\}\) is strictly increasing; however, the polynomial \((1-z^3)(1-z)^2\) has 3 distinct roots of smallest modulus. Therefore, such an M(z) cannot be the generating function of a trace monoid.
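The counterexample above can be checked numerically. The following sketch expands \(1/((1-z^3)(1-z)^2)\) as a power series with exact integer arithmetic, tests the stated recurrence in its cleared-denominator form \((n+3)\,m_{n+3} = (n+5)\,m_n + 2m_{n+1} + 2m_{n+2}\), and confirms that the three cube roots of unity are roots of the denominator. The helper names and the cut-off of 60 coefficients are arbitrary choices made for this illustration.

```python
import cmath

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def reciprocal_series(q, terms):
    """First `terms` coefficients of 1/q(z), for an integer polynomial with q[0] == 1."""
    m = []
    for n in range(terms):
        total = 1 if n == 0 else 0
        for k in range(1, min(n, len(q) - 1) + 1):
            total -= q[k] * m[n - k]
        m.append(total)
    return m

def evaluate(p, z):
    return sum(c * z ** i for i, c in enumerate(p))

q = poly_mul([1, 0, 0, -1], poly_mul([1, -1], [1, -1]))   # (1 - z^3)(1 - z)^2
m = reciprocal_series(q, 60)

recurrence_holds = all(
    (n + 3) * m[n + 3] == (n + 5) * m[n] + 2 * m[n + 1] + 2 * m[n + 2]
    for n in range(len(m) - 3)
)
strictly_increasing = all(m[n] < m[n + 1] for n in range(len(m) - 1))

# The three cube roots of unity all have modulus 1 and annihilate q.
cube_roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
roots_on_unit_circle = all(
    abs(evaluate(q, r)) < 1e-9 and abs(abs(r) - 1) < 1e-12 for r in cube_roots
)
print(m[:7], recurrence_holds, strictly_increasing, roots_on_unit_circle)
```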
4 Cross-Sections of Trace Monoids
Cross-sections are standard tools to study the properties of trace monoids by lifting the analysis at the level of free monoids. Intuitively, a cross-section of a trace monoid \(\mathcal {M}\) is a language \({\mathcal L}\) having exactly one representative string for each trace in \(\mathcal {M}\). Thus, the generating function of \({\mathcal L}\) coincides with M(z) and hence it satisfies equality (6). As a consequence, by choosing an appropriate regular cross-section \({\mathcal L}\), one can use the property of a finite state automaton recognizing \({\mathcal L}\) to study the singularities of M(z), i.e. the roots of \(p_{\mathcal {M}}(z)\).
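Formally, a language \({\mathcal L}\subseteq \varSigma ^*\) is a cross-section of \(\mathcal {M}\) if:
for each trace \(t\in \mathcal {M}\), there exists a word \(w\in {\mathcal L}\) such that \(t=[w]\),
for each pair of words \(x, y \in {\mathcal L}\), if \([x]=[y]\) then \(x=y\).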
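The two conditions above amount to saying that every equivalence class contains exactly one word of \({\mathcal L}\). A minimal sketch of a bounded check follows; it tests the conditions only on words up to a given length, so it can refute but never prove the cross-section property. The function names, the predicate-based encoding of a language and the example alphabet are assumptions made for this illustration.

```python
from itertools import product

def trace_class(word, commuting):
    """All words equivalent to `word` under adjacent swaps of commuting letters."""
    seen = {word}
    stack = [word]
    while stack:
        w = stack.pop()
        for i in range(len(w) - 1):
            a, b = w[i], w[i + 1]
            if frozenset((a, b)) in commuting:
                swapped = w[:i] + b + a + w[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    stack.append(swapped)
    return frozenset(seen)

def is_cross_section_up_to(language, alphabet, commuting, max_len):
    """Check that each trace of length <= max_len has exactly one word in `language`.

    `language` is a predicate on words (True when the word belongs to L).
    """
    for n in range(max_len + 1):
        checked = set()
        for letters in product(alphabet, repeat=n):
            cls = trace_class("".join(letters), commuting)
            if cls in checked:
                continue
            checked.add(cls)
            # existence and uniqueness of a representative in L
            if sum(1 for v in cls if language(v)) != 1:
                return False
    return True

# Hypothetical example: Sigma = {a, b, c}, only a and b commute.
C = {frozenset("ab")}
avoids_ba = lambda w: "ba" not in w      # candidate regular cross-section
everything = lambda w: True              # Sigma^*: several words per trace

ok = is_cross_section_up_to(avoids_ba, "abc", C, 5)
not_ok = is_cross_section_up_to(everything, "abc", C, 5)
print(ok, not_ok)
```

In this example the words avoiding the factor ba are exactly those in which each maximal block of letters a, b has the form \(a^ib^j\), which is why one representative per trace survives.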
Proposition 1 (Factorisation property)
Example 2
\[
\widetilde{A} \ = \
\begin{array}{c|cccc}
 & \emptyset & \{a,b\} & \{b,c\} & \{c\} \\ \hline
\emptyset & a+b & c+d & e & 0 \\
\{a,b\} & 0 & c+d & e & 0 \\
\{b,c\} & 0 & d & e & a \\
\{c\} & 0 & d & e & a+b
\end{array}
\]
Proposition 2
Proof (sketch)
The result follows from Proposition 1 by refining equalities (10) and recalling that all roots of clique polynomials are different from 0. \(\square \)
We observe that the converse does not hold in general, i.e. it may occur that an eigenvalue of A is not the reciprocal of a root of \(p_{\mathcal {M}}(z)\). However, as shown in the following section, the converse is true whenever the graph \((\varSigma ,{\mathcal C})\) admits a transitive orientation.
5 Concurrent Alphabets with Transitive Orientation
Now let us consider a trace monoid \(\mathcal {M}\) such that its independency graph \((\varSigma ,{\mathcal C})\) admits a transitive orientation. Then, we may fix a total order \(\le \) on \(\varSigma \) such that \(<_{\mathcal C}\) is transitive. In this case, the definition of cross-section \({\mathcal L}_{\le }\) and of the automaton \(\mathcal A\) can be simplified, since the set of “forbidden” factors of the form bwa, with \(a<_{\mathcal C}b\) and \(w\in {\mathcal C}_a^*\), can be reduced to the simple set of words \({\mathcal S}= \{\tau \sigma \in \varSigma ^2 \mid \sigma <_{\mathcal C}\tau \}\). To prove this property, consider a forbidden factor of the above form bwa, with \(a<_{\mathcal C}b\) and \(w\in {\mathcal C}_a^*\); thus any symbol c occurring in w must verify \((a,c)\in {\mathcal C}\). As a consequence, either \(a<_{\mathcal C}c\) or \(c <_{\mathcal C}a\): in the first case ca belongs to \({\mathcal S}\) while, in the second case, by transitivity of \(<_{\mathcal C}\) we have \(c <_{\mathcal C}b\) and hence bc is in \({\mathcal S}\).
Thus, identity (8) can be simplified as \({\mathcal L}_{\le } = \varSigma ^* \backslash \bigcup _{a<_{\mathcal C}b} \varSigma ^* b a \varSigma ^*.\) Moreover, the state set of the automaton \(\mathcal A\) can be reduced to \(Q = \{{\text {Pred}}(a)\mid a \in \varSigma \}\) and the transition function now assumes values \(\delta ({\mathcal S},b)={\text {Pred}}(b)\), for every \({\mathcal S}\in Q\) and every \(b\in \varSigma \backslash {\mathcal S}\).
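The simplified automaton can be built mechanically from an oriented independency graph. The sketch below does so for a hypothetical example, the path a, b, c, d with orientation \(a<_{\mathcal C}b\), \(c<_{\mathcal C}b\), \(c<_{\mathcal C}d\) (transitive, since no letter is both above and below another). It assumes that \({\text {Pred}}(b)\) denotes \(\{\sigma \mid \sigma <_{\mathcal C}b\}\), that the initial state is \(\emptyset \), that all states are accepting, and that the clique polynomial of this path is \(1-4z+3z^2\) (four letters, three commuting pairs, no triangle) under the standard definition. It then compares three counts of words of length n: paths in the automaton, a brute-force count of words avoiding the factors in \({\mathcal S}\), and the coefficients of the reciprocal of that polynomial.

```python
from itertools import product

def pred_automaton(alphabet, less_than):
    """States and counting matrix A built from pairs (s, t) meaning s <_C t.

    ASSUMPTION: Pred(b) = {s : s <_C b}, the letters that may not follow b.
    """
    pred = {b: frozenset(s for (s, t) in less_than if t == b) for b in alphabet}
    states = sorted(set(pred.values()), key=lambda S: (len(S), sorted(S)))
    index = {S: i for i, S in enumerate(states)}
    A = [[0] * len(states) for _ in states]
    for S in states:
        for b in alphabet:
            if b not in S:                       # delta(S, b) = Pred(b)
                A[index[S]][index[pred[b]]] += 1
    return states, A

def count_accepted(states, A, n):
    """Number of paths of length n starting from the empty state."""
    vec = [1 if not S else 0 for S in states]
    for _ in range(n):
        vec = [sum(vec[i] * A[i][j] for i in range(len(A))) for j in range(len(A))]
    return sum(vec)

def count_avoiding(alphabet, less_than, n):
    """Brute-force count of words of length n with no factor (tau sigma), sigma <_C tau."""
    forbidden = {t + s for (s, t) in less_than}
    return sum(
        all(w[i] + w[i + 1] not in forbidden for i in range(n - 1))
        for w in product(alphabet, repeat=n)
    )

def reciprocal_series(q, terms):
    """First `terms` coefficients of 1/q(z), with q[0] == 1."""
    m = []
    for n in range(terms):
        total = 1 if n == 0 else 0
        for k in range(1, min(n, len(q) - 1) + 1):
            total -= q[k] * m[n - k]
        m.append(total)
    return m

alphabet = "abcd"
orientation = {("a", "b"), ("c", "b"), ("c", "d")}
states, A = pred_automaton(alphabet, orientation)
by_automaton = [count_accepted(states, A, n) for n in range(6)]
by_brute_force = [count_avoiding(alphabet, orientation, n) for n in range(6)]
by_series = reciprocal_series([1, -4, 3], 6)
print(A, by_automaton)
```

The three lists agree on this example, which is what one expects if \({\mathcal L}_{\le }\) is a cross-section whose generating function is the reciprocal of the clique polynomial.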
Proposition 3
Let \((\varSigma ,{\mathcal C})\) be a concurrent alphabet with an associated independency graph admitting a transitive orientation \(<_{\mathcal C}\). Let \(\le \) be a total order on \(\varSigma \) extending \(<_{\mathcal C}\). Also assume that the dependency graph \((\varSigma ,{\mathcal C}^c)\) is connected. Then the adjacency matrix A is primitive.
Proof (sketch)
Under these hypotheses, by the simplifications above, it turns out that the state diagram of the automaton \(\mathcal A\) (defined by \(\le \)) is strongly connected and has at least one loop. \(\square \)
The hypothesis of transitivity for \(<_{\mathcal C}\) cannot be avoided to guarantee that A is primitive. For instance, in Example 2 the dependency graph \((\varSigma ,{\mathcal C}^c)\) is connected but the orientation \(<_{\mathcal C}\) of \((\varSigma ,{\mathcal C})\) is not transitive, and in fact observe that the corresponding transition matrix is not irreducible and hence A is not primitive. Nevertheless, the smallest root of \(p_{\mathcal {M}}(z)\) is simple and then the same concurrent alphabet satisfies the following theorem.
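Primitivity of a small nonnegative matrix can be tested directly. The sketch below relies on Wielandt's bound: an n by n nonnegative matrix is primitive if and only if its power of exponent \((n-1)^2+1\) is entrywise positive. Two matrices are tried. The first is read off from \(\widetilde{A}\) in Example 2 under the assumption that A is obtained by replacing each entry with the number of letters it contains; the second belongs to a hypothetical path-shaped independency graph a, b, c, d with a transitive orientation and is not taken from this paper.

```python
def is_primitive(A):
    """True iff some power of the nonnegative square matrix A is entrywise positive."""
    n = len(A)
    B = [[x > 0 for x in row] for row in A]      # only the zero pattern matters
    P = B
    for _ in range((n - 1) ** 2):                # P becomes B^((n-1)^2 + 1)
        P = [[any(P[i][k] and B[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return all(all(row) for row in P)

# Example 2 (assumed numeric form of A-tilde): the state {} is never re-entered.
A_example2 = [
    [2, 2, 1, 0],
    [0, 2, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 2],
]

# Hypothetical transitive case: path a-b-c-d, states {}, {a,c}, {c}.
A_path = [
    [2, 1, 1],
    [0, 1, 1],
    [1, 1, 1],
]

example2_primitive = is_primitive(A_example2)
path_primitive = is_primitive(A_path)
print(example2_primitive, path_primitive)
```

The first column of the Example 2 matrix vanishes below the diagonal, so no power can be positive, in agreement with the remark that this matrix is not irreducible.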
Theorem 4
Let \((\varSigma ,{\mathcal C})\) be a concurrent alphabet. If its independency graph admits a transitive orientation \(<_{\mathcal C}\), then one has \(\ell \le k\), where \(\ell \) and k denote, respectively, the multiplicity of the smallest root of \(p_{\mathcal {M}}(z)\) and the number of connected components of the dependency graph \((\varSigma ,{\mathcal C}^c)\).
Proof (sketch)
First, it is well known [18, 21] that \(p_{\mathcal {M}}(z)\) is always the product of the clique polynomials of all independency subalphabets given by the connected components of \((\varSigma ,{\mathcal C}^c)\). Then, each of these clique polynomials (using the additional condition that one has a transitive orientation) has a smallest root of multiplicity 1: this follows from Proposition 3 and a commutative analogue of a result in [11] stating that, when \((\varSigma ,{\mathcal C})\) has a transitive orientation, its clique polynomial equals \(\det (I-zA)\). \(\square \)
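The product decomposition used in the first step can be observed on small graphs. The sketch below assumes the standard definition of the clique polynomial, \(\sum _k (-1)^k c_k z^k\) with \(c_k\) the number of cliques of size k in the independency graph, computes the connected components of the dependency graph, and multiplies the clique polynomials of the induced subgraphs. The function names and the two example graphs are hypothetical.

```python
from itertools import combinations

def clique_polynomial(vertices, edges):
    """Coefficient list of sum_k (-1)^k c_k z^k, with c_k the number of k-cliques."""
    vertices = sorted(vertices)
    coeffs = [1]
    for k in range(1, len(vertices) + 1):
        c_k = sum(
            all(frozenset(p) in edges for p in combinations(S, 2))
            for S in combinations(vertices, k)
        )
        if c_k == 0:            # no k-clique, hence no larger clique either
            break
        coeffs.append((-1) ** k * c_k)
    return coeffs

def dependency_components(vertices, edges):
    """Connected components of the complement graph (the dependency graph)."""
    remaining = set(vertices)
    components = []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in list(remaining):
                if frozenset((u, v)) not in edges:   # u and v are dependent
                    remaining.remove(v)
                    comp.add(v)
                    stack.append(v)
        components.append(comp)
    return components

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def product_over_components(vertices, edges):
    result = [1]
    for comp in dependency_components(vertices, edges):
        result = poly_mul(result, clique_polynomial(comp, edges))
    return result

# Hypothetical example 1: a-b, b-c commute; dependency components {a, c} and {b}.
E1 = {frozenset("ab"), frozenset("bc")}
# Hypothetical example 2: a, b each commute with c, d; components {a, b} and {c, d}.
E2 = {frozenset("ac"), frozenset("ad"), frozenset("bc"), frozenset("bd")}

p1, p2 = clique_polynomial("abc", E1), clique_polynomial("abcd", E2)
f1, f2 = product_over_components("abc", E1), product_over_components("abcd", E2)
print(p1, f1, p2, f2)
```

In the second example the polynomial is \((1-2z)^2\): the smallest root 1/2 has multiplicity 2 and the dependency graph has two components, so the bound \(\ell \le k\) is attained.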
Applying the previous theorem to relations (3) and (7), one gets the following.
Theorem 5
Let \((\varSigma ,{\mathcal C})\) be a concurrent alphabet. If its independency graph admits a transitive orientation \(<_{\mathcal C}\), then the random variables counting the number of prefixes in traces (as defined in (2)) satisfy \({\text {E}}[T_n] = O({\text {E}}[W_n])\).
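For very small n, both expectations can be computed exhaustively, which makes the difference between the two probabilistic models concrete. The sketch below averages the number of prefixes once over traces of length n (for \(T_n\)) and once over words of length n (for \(W_n\)). The function names and the two-letter commutative example are assumptions made for this illustration; exhaustive enumeration is of course no substitute for the asymptotic statement.

```python
from fractions import Fraction
from itertools import product

def trace_class(word, commuting):
    """All words equivalent to `word` under adjacent swaps of commuting letters."""
    seen = {word}
    stack = [word]
    while stack:
        w = stack.pop()
        for i in range(len(w) - 1):
            a, b = w[i], w[i + 1]
            if frozenset((a, b)) in commuting:
                swapped = w[:i] + b + a + w[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    stack.append(swapped)
    return seen

def num_prefixes(word, commuting):
    """#Pref([word]), naming each prefix by its smallest representative."""
    names = set()
    for rep in trace_class(word, commuting):
        for k in range(len(rep) + 1):
            names.add(min(trace_class(rep[:k], commuting)))
    return len(names)

def expected_prefixes(alphabet, commuting, n):
    """Return (E[T_n], E[W_n]) by exhaustive enumeration of all words of length n."""
    per_trace = {}          # canonical name of a trace -> number of prefixes
    sum_over_words = 0
    total_words = 0
    for letters in product(alphabet, repeat=n):
        w = "".join(letters)
        name = min(trace_class(w, commuting))
        if name not in per_trace:
            per_trace[name] = num_prefixes(w, commuting)
        sum_over_words += per_trace[name]
        total_words += 1
    e_t = Fraction(sum(per_trace.values()), len(per_trace))   # uniform over traces
    e_w = Fraction(sum_over_words, total_words)               # uniform over words
    return e_t, e_w

# Hypothetical example: two commuting letters, so traces are monomials a^i b^j.
C = {frozenset("ab")}
e2 = expected_prefixes("ab", C, 2)
e3 = expected_prefixes("ab", C, 3)
print(e2, e3)
```

In this example the trace \([a^ib^j]\) has \((i+1)(j+1)\) prefixes, so both averages can also be obtained by hand, and the word model gives the larger value because balanced monomials have more representatives.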
Example 3
The following example considers an independency graph of \(\mathcal {M}\) that does not admit any transitive orientation. In this case \(p_{\mathcal {M}}(z)\) is a proper factor of \(\det (I-zA)\), but its smallest root is again simple and hence \(\ell \le k\) is still true even if the hypothesis of Theorem 4 is not satisfied.
Example 4
\[
\widetilde{A} \ = \
\begin{array}{c|ccccc}
 & \emptyset & \{a,b\} & \{a\} & \{b,d\} & \{d\} \\ \hline
\emptyset & a+b & c & d & e & 0 \\
\{a,b\} & 0 & c & d & e & 0 \\
\{a\} & b & c & d & e & 0 \\
\{b,d\} & 0 & c & 0 & e & a \\
\{d\} & b & c & 0 & e & a
\end{array}
\]
Accordingly, one has \(\det (I-zA) = 1-6z+10z^2-5z^3 = (1-z)\, p_{\mathcal {M}}(z).\) \(\blacksquare \)
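The determinant identity can be reproduced with exact arithmetic. The sketch below assumes that A is obtained from \(\widetilde{A}\) by replacing each entry with the number of letters it contains, computes the coefficients of \(\det (I-zA)\) with the Faddeev-LeVerrier recursion (a choice made here for convenience, not a method taken from this paper), and compares the result with \((1-z)(1-5z+5z^2)\), where \(1-5z+5z^2\) is the cofactor implied by the identity above.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det_I_minus_zA(A):
    """Coefficients [d0, d1, ..., dn] of det(I - zA) = d0 + d1 z + ... + dn z^n.

    Faddeev-LeVerrier: M_k = A M_{k-1} + c I and c_next = -tr(A M_k) / k yield
    the characteristic polynomial, whose reversal is det(I - zA).
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(mat_mul(A, M)[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return [int(c) for c in coeffs]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Rows and columns ordered as {}, {a,b}, {a}, {b,d}, {d} (assumed numeric form).
A_example4 = [
    [2, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

determinant = det_I_minus_zA(A_example4)
factored = poly_mul([1, -1], [1, -5, 5])      # (1 - z)(1 - 5z + 5z^2)
print(determinant, factored)
```

The two trailing zero coefficients reflect the fact that this matrix has rank 3, so \(\det (I-zA)\) has degree 3 although A is 5 by 5.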
6 Conclusion
We have investigated the fundamental role played by the clique polynomial in asymptotic studies of trace monoids. Building on the factorization property (stated in Proposition 1), we got a link between the multiplicity of its smallest root and the number of connected components of some associated graph (Theorem 4). This, in turn, is the key for a new asymptotic relation between the number of prefixes in traces of length n: \({\text {E}}[T_n] = O({\text {E}}[W_n])\) (Theorem 5), where \(T_n\) and \(W_n\) correspond to two natural models (uniform distribution over traces and over words). In the long version of this article, we plan to extend these analyses to more general cases (including concurrent alphabets without transitive orientation).
Several other problems remain open in our context and could be at the centre of future investigations. The first one concerns the adjacency matrix A defined in Sect. 4, which does not seem to have been studied much in the previous literature; in particular, in all our examples \(\det (I-zA)\) is a clique polynomial, even when the concurrent alphabet \((\varSigma ,{\mathcal C})\) does not admit any transitive orientation. For this purpose, similarly to the approach used in [11] and in our proof of Theorem 4, it is possible to adapt a noncommutative approach building on links to words with forbidden patterns (see [2]). We plan to use these links to tackle the asymptotic behaviour of the variance and higher moments of \(\{T_n\}\), and the limit distributions of both \(\{T_n\}\) and \(\{W_n\}\) for all trace monoids.
In conclusion, all these studies are a further illustration of the nice interplay between complex analysis (analytic combinatorics) and the structural properties of formal languages, as also shown e.g. in [4, 5, 16, 17, 19, 20].
References
1. Anisimov, A.V., Knuth, D.E.: Inhomogeneous sorting. Int. J. Comput. Inf. Sci. 8, 255–260 (1979). https://doi.org/10.1007/BF00993053
2. Asinowski, A., Bacher, A., Banderier, C., Gittenberger, B.: Analytic combinatorics of lattice paths with forbidden patterns, the vectorial kernel method, and generating functions for pushdown automata. Algorithmica 82, 1–43 (2020). https://doi.org/10.1007/s00453-019-00623-3
3. Avellone, A., Goldwurm, M.: Analysis of algorithms for the recognition of rational and context-free trace languages. RAIRO Theoret. Inform. Appl. 32, 141–152 (1998)
4. Banderier, C., Drmota, M.: Formulae and asymptotics for coefficients of algebraic functions. Comb. Probab. Comput. 24, 1–53 (2015)
5. Berstel, J., Reutenauer, C.: Rational Series and Their Languages. Springer, Heidelberg (1988)
6. Bertoni, A., Goldwurm, M.: On the prefixes of a random trace and the membership problem for context-free trace languages. In: Huguet, L., Poli, A. (eds.) AAECC 1987. LNCS, vol. 356, pp. 35–59. Springer, Heidelberg (1989). https://doi.org/10.1007/3-540-51082-6_68
7. Bertoni, A., Goldwurm, M., Sabadini, N.: Analysis of a class of algorithms for problems on trace languages. In: Beth, T., Clausen, M. (eds.) AAECC 1986. LNCS, vol. 307, pp. 202–214. Springer, Heidelberg (1988). https://doi.org/10.1007/BFb0039193
8. Bertoni, A., Mauri, G., Sabadini, N.: Equivalence and membership problems for regular trace languages. In: Nielsen, M., Schmidt, E.M. (eds.) ICALP 1982. LNCS, vol. 140, pp. 61–71. Springer, Heidelberg (1982). https://doi.org/10.1007/BFb0012757
9. Breveglieri, L., Crespi Reghizzi, S., Goldwurm, M.: Efficient recognition of trace languages defined by repeat until loops. Inf. Comput. 208, 969–981 (2010)
10. Cartier, P., Foata, D.: Problèmes combinatoires de commutation et réarrangements. Lecture Notes in Mathematics, vol. 85. Springer, Heidelberg (1969). https://doi.org/10.1007/BFb0079468
11. Choffrut, C., Goldwurm, M.: Determinants and Möbius functions in trace monoids. Discrete Math. 194, 239–247 (1999)
12. Diekert, V.: Transitive orientations, Möbius functions, and complete semi-Thue systems for free partially commutative monoids. In: Lepistö, T., Salomaa, A. (eds.) ICALP 1988. LNCS, vol. 317, pp. 176–187. Springer, Heidelberg (1988). https://doi.org/10.1007/3-540-19488-6_115
13. Diekert, V.: Möbius functions and confluent semi-commutations. Theor. Comput. Sci. 108, 25–43 (1993)
14. Diekert, V., Rozenberg, G. (eds.): The Book of Traces. World Scientific, Singapore (1995)
15. Duboc, C.: Commutations dans les monoïdes libres: un cadre théorique pour l’étude du parallélisme. Thèse, Faculté des Sciences de l’Université de Rouen (1986)
16. Flajolet, P.: Analytic models and ambiguity of context-free languages. Theor. Comput. Sci. 49, 283–309 (1987)
17. Flajolet, P., Sedgewick, R.: Analytic Combinatorics. Cambridge University Press, Cambridge (2009)
18. Fisher, D., Solow, A.: Dependence polynomials. Discrete Math. 82, 251–258 (1990)
19. Goldwurm, M.: Some limit distributions in analysis of algorithms for problems on trace languages. Int. J. Found. Comput. Sci. 1(3), 265–276 (1990)
20. Goldwurm, M.: Probabilistic estimation of the number of prefixes of a trace. Theor. Comput. Sci. 92, 249–268 (1992)
21. Goldwurm, M., Santini, M.: Clique polynomials have a unique root of smallest modulus. Inf. Process. Lett. 75, 127–132 (2000)
22. Mazurkiewicz, A.: Concurrent program schemes and their interpretations, DAIMI Rep. PB 78, Aarhus University, Aarhus (1977)
23. Rytter, W.: Some properties of trace languages. Fund. Inform. 7, 117–127 (1984)
24. Salomaa, A., Soittola, M.: Automata-Theoretic Aspects of Formal Power Series. Springer, New York (1978). https://doi.org/10.1007/978-1-4612-6264-0
25. Viennot, G.X.: Heaps of pieces, I: Basic definitions and combinatorial lemmas. In: Labelle, G., Leroux, P. (eds.) Combinatoire énumérative. LNM, vol. 1234, pp. 321–350. Springer, Heidelberg (1986). https://doi.org/10.1007/BFb0072524