Abstract
We introduce a novel class of features for multidimensional time series that are invariant with respect to transformations of the ambient space. The general linear group, the group of rotations and the group of permutations of the axes are considered. The starting point for their construction is Chen’s iterated-integral signature.
1 Introduction
The analysis of multidimensional time series is a standard problem in data science. Usually, as a first step, features of a time series must be extracted that are (in some sense) robust and that characterize the time series. In many applications the features should additionally be invariant to a particular group acting on the data. In Human Activity Recognition, for example, the orientation of the measuring device is often unknown. This leads to the requirement of rotation invariant features [37]. In EEG analysis, invariants to the general linear group are beneficial [12]. In other applications, the labeling of coordinates is arbitrary, which leads to permutation invariant features.
As any time series in discrete time can, via linear interpolation, be thought of as a multidimensional curve, one is naturally led to the search for invariants of curves. Invariant features of curves have been treated using various approaches, mostly focussing on two-dimensional curves. Among the techniques are Fourier series (of closed curves) [21, 27, 52], wavelets [6], curvature based methods [2, 36] and integral invariants [13, 35].
The usefulness of iterated integrals in data analysis has recently been realized, see for example [20, 26, 32, 51] and the introduction in [5]. Let us demonstrate the appearance of iterated integrals in a very simple example. Let \(X: [0,T] \to \mathbb{R}^{2}\) be a smooth curve. Say we are looking for a feature describing this curve that remains unchanged if one is handed a rotated version of \(X\). Perhaps the simplest one is the (squared) total displacement length \(|X_{T} - X_{0}|^{2}\). Now,
\[ |X_{T} - X_{0}|^{2} = \sum_{i=1}^{2} \bigl( X^{i}_{T} - X^{i}_{0} \bigr)^{2} = \sum_{i=1}^{2} \int_{0}^{T} \int_{0}^{T} \dot{X}^{i}_{r}\, \dot{X}^{i}_{s}\, dr\, ds = 2 \sum_{i=1}^{2} \int_{0}^{T} \int_{0}^{s} dX^{i}_{r}\, dX^{i}_{s}, \]
where we have applied the fundamental theorem of calculus twice and then introduced the notation \(dX^{i}_{r}\) for \(\dot{X}^{i}_{r}\, dr\). We see that we have expressed this simple invariant in terms of iterated integrals of \(X\); the collection of which is usually called its signature. The aim of this work can be summarized as describing all invariants that can be obtained in this way. It turns out that, when formulated in the right way, this search for invariants reduces to classical problems in invariant theory. We note that already in the early work of Chen (see for example [4, Chap. 3]) the topic of invariants arose, although a systematic study was missing (see also [23]).
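For a piecewise linear curve (the case arising from a time series after interpolation) these iterated integrals can be computed exactly, segment by segment, and the identity checked numerically. The following sketch is my own illustration, with hypothetical names and sample points, not code from the paper: each straight segment with increment \(a\) contributes \(a\) at level one and \(a \otimes a / 2\) at level two, combined as in Chen's relation (Sect. 2).

```python
def signature_level2(points):
    """Level-1 and level-2 iterated integrals of the piecewise linear
    curve through the given points, accumulated segment by segment."""
    d = len(points[0])
    lvl1 = [0.0] * d
    lvl2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(points, points[1:]):
        a = [qi - pi for qi, pi in zip(q, p)]
        for i in range(d):
            for j in range(d):
                # new level 2 = old level 2 + (old level 1) (x) a + a (x) a / 2
                lvl2[i][j] += lvl1[i] * a[j] + a[i] * a[j] / 2.0
        for i in range(d):
            lvl1[i] += a[i]
    return lvl1, lvl2

points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (3.0, 2.0)]
lvl1, lvl2 = signature_level2(points)
displacement_sq = sum(x * x for x in lvl1)     # |X_T - X_0|^2
invariant = 2.0 * (lvl2[0][0] + lvl2[1][1])    # 2 * (S^{11} + S^{22})
print(displacement_sq, invariant)              # prints 13.0 13.0
```

The two printed numbers agree, as the displayed identity predicts.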
The aim of this work is threefold. Firstly, we adapt classical results in invariant theory regarding non-commuting polynomials (or, equivalently, multilinear maps) to our situation. These results are spread out in the literature and sometimes need a little massaging. Secondly, we lay out the usefulness of the iterated-integral signature in the search for invariants of \(d\)-dimensional curves. We show, see Sect. 7, that certain “integral invariants” found in the literature are in fact found in the signature, and our approach simplifies their enumeration. Lastly, we present new geometric insights into some entries found in the signature (Sect. 3.3).
The paper is structured as follows. In the next section we introduce the iterated-integral signature of a multidimensional curve, as well as some algebraic language to work with it. Based on this signature, we present in Sect. 3 and Sect. 4 invariants to the general linear group and the special orthogonal group. Both are based on classical results in invariant theory. For completeness, we present in Sect. 5 the invariants to permutations, which have been constructed in [1]. In Sect. 6 we show how to use all these invariants if an additional (time) coordinate is introduced. In Sect. 7 we relate our work to the integral invariants of [13] and demonstrate that the invariants presented there cannot be complete. We formulate the conjecture of completeness for our invariants and point out open algebraic questions.
For readers who want to use these invariants without having to go into the technical results, we propose the following route. The required notation is presented in the next section. The invariants are presented in Proposition 3.11, Proposition 4.4 and Proposition 5.4. Examples are given in Sect. 3.1 (in particular Remark 3.14), Example 4.7 and Example 5.6. All these invariants are also implemented in the software package [9]. For calculating the iterated-integral signature in Python we propose using the package iisignature, as described in [40].
2 The Signature of Iterated Integrals
By a multidimensional curve \(X\) we will denote a continuous mapping \(X: [0,T] \to \mathbb{R}^{d}\) of bounded variation. The aim of this work is to find features (i.e. complex or real numbers) describing such a curve that are invariant under the general linear group, the group of rotations and the group of permutations. Note that in practical situations one is usually presented with a discrete sequence of data points in \(\mathbb{R}^{d}\), a multidimensional time series. Such a time series can be easily transformed into a (piecewise) smooth curve by linear interpolation.
It was proven in [22], which extends the work of [4], that a curve \(X = (X^{1},.., X^{d})\) is almost completely characterized by the collection of its iterated integrals
\[ \int_{0 < t_{1} < \cdots < t_{n} < T} dX^{i_{1}}_{t_{1}} \cdots dX^{i_{n}}_{t_{n}}, \qquad n \ge 1,\ i_{1}, \dots , i_{n} \in \{1,..,d\}. \]
The collection of all these integrals is called the signature of \(X\). In a first step, we can hence reduce the goal
Find functions \(\varPsi : \text{curves} \to \mathbb{R}\) that are invariant under the action of a group \(G\) ,
to the goal
Find functions \(\varPsi : \text{signature of curves} \to \mathbb{R}\) that are invariant under the action of a group \(G\) .
By the shuffle identity (Lemma 2.1), any polynomial function on the signature can be re-written as a linear function on the signature. Assuming that arbitrary functions are well-approximated by polynomial functions, we are led to the final simplification, which is the goal of this paper:
Find linear functions \(\varPsi : \text{signature of curves} \to \mathbb{R}\) that are invariant under the action of a group \(G\) .
2.1 Algebraic Underpinning
Let us introduce some algebraic notation in order to work with the collection of iterated integrals. Denote by \(T((\mathbb{R}^{d}))\) the space of formal power series in \(d\) non-commuting variables \(x_{1}, x_{2}, \dots , x_{d}\). We can conveniently store all the iterated integrals of the curve \(X\) in \(T((\mathbb{R}^{d}))\), by defining the signature of \(X\) to be
\[ S(X)_{0,T} := \sum \Bigl( \int_{0 < t_{1} < \cdots < t_{n} < T} dX^{i_{1}}_{t_{1}} \cdots dX^{i_{n}}_{t_{n}} \Bigr)\, x_{i_{1}} \cdots x_{i_{n}}. \]
Here the sum is taken over all \(n \ge 0\) and all \(i_{1}, \dots , i_{n} \in \{1,2,..,d\}\). For \(n=0\) the summand is, for algebraic reasons, taken to be the constant 1.
The algebraic dual of \(T((\mathbb{R}^{d}))\) is \(T(\mathbb{R}^{d})\), the space of polynomials in \(x_{1}, x_{2}, \dots , x_{d}\). The dual pairing, denoted by \(\langle \cdot , \cdot \rangle \), is defined by declaring all monomials to be orthonormal, so for example
Here, we write the element of \(T((\mathbb{R}^{d}))\) on the left and the element of \(T(\mathbb{R}^{d})\) on the right. We can “pick out” iterated integrals from the signature as follows:
\[ \langle S(X)_{0,T},\, x_{i_{1}} \cdots x_{i_{n}} \rangle = \int_{0 < t_{1} < \cdots < t_{n} < T} dX^{i_{1}}_{t_{1}} \cdots dX^{i_{n}}_{t_{n}}. \]
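For the simplest case of a straight-line segment with increment vector \(a\), these coefficients have the closed form \(a_{i_{1}} \cdots a_{i_{n}} / n!\), since the signature of a linear path is \(\exp (\sum_{i} a_{i} x_{i})\) (a fact used again in Sect. 3). A minimal sketch, with hypothetical names of my own:

```python
from math import factorial

def linear_segment_coeff(a, word):
    """<S(X)_{0,T}, x_{i_1} .. x_{i_n}> for the straight-line path with
    increment vector a: the product a_{i_1} * ... * a_{i_n} divided by n!."""
    prod = 1.0
    for i in word:                  # word: tuple of 1-based letter indices
        prod *= a[i - 1]
    return prod / factorial(len(word))

print(linear_segment_coeff((1.0, 2.0), (1, 2)))   # a1 * a2 / 2! = 1.0
```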
The space \(T((\mathbb{R}^{d}))\) becomes an algebra by extending the usual product of monomials, denoted ⋅, to the whole space by bilinearity. Note that ⋅ is non-commutative.
On \(T(\mathbb{R}^{d})\) we often use the shuffle product ⧢ which, on monomials, interleaves them in all order-preserving ways, so for example \(x_{1} ⧢ x_{2} x_{3} = x_{1} x_{2} x_{3} + x_{2} x_{1} x_{3} + x_{2} x_{3} x_{1}\).
Note that ⧢ is commutative.
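A minimal implementation of the shuffle product on words (my own sketch; words are encoded as tuples of letter indices, and a linear combination as a dict from words to coefficients):

```python
def shuffle(u, v):
    """Shuffle product of two words (tuples of letters), returned as a
    dict word -> coefficient: all interleavings, with multiplicity."""
    if not u:
        return {v: 1}
    if not v:
        return {u: 1}
    out = {}
    # the last letter of the result comes either from u or from v
    for w, c in shuffle(u[:-1], v).items():
        out[w + u[-1:]] = out.get(w + u[-1:], 0) + c
    for w, c in shuffle(u, v[:-1]).items():
        out[w + v[-1:]] = out.get(w + v[-1:], 0) + c
    return out

# x1 shuffled with x2 x3 gives x1x2x3 + x2x1x3 + x2x3x1
print(shuffle((1,), (2, 3)))
```

Commutativity is visible in the symmetry of the recursion: swapping the arguments produces the same dictionary.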
Monomials, and hence homogeneous polynomials, have the usual concept of order or homogeneity. For \(n \ge 0\) we denote the projection on polynomials of order \(n\) by \(\pi _{n}\), so for example
See [41] for more background on these spaces.
As mentioned above, every polynomial expression in terms of the signature can be re-written as a linear expression in (different) terms of the signature. This is the content of the following lemma, which is proven in [39] (see also [41, Corollary 3.5]).
Lemma 2.1
(Shuffle identity)
Let \(X: [0,T] \to \mathbb{R}^{d}\) be a continuous curve of bounded variation. Then for every \(a,b \in T(\mathbb{R}^{d})\)
\[ \langle S(X)_{0,T}, a \rangle \, \langle S(X)_{0,T}, b \rangle = \langle S(X)_{0,T}, a ⧢ b \rangle . \]
Remark 2.2
We have used this fact already in the introduction, where we confirmed by hand that
\[ |X_{T} - X_{0}|^{2} = \langle S(X)_{0,T},\, x_{1} ⧢ x_{1} + x_{2} ⧢ x_{2} \rangle = 2\, \langle S(X)_{0,T},\, x_{1} x_{1} + x_{2} x_{2} \rangle . \]
The concatenation of curves is compatible with the product on \(T((\mathbb{R}^{d}))\) in the following sense (for a proof, see for example [16, Theorem 7.11]).
Lemma 2.3
(Chen’s relation)
For curves \(X: [0,T] \to \mathbb{R}^{d}\), \(Y: [0,T] \to \mathbb{R}^{d}\) denote their concatenation \(X \sqcup Y : [0,2T] \to \mathbb{R}^{d}\), defined
as \(X_{\cdot }\) on \([0,T]\) and as \(Y_{\cdot -T} - Y_{0} + X_{T}\) on \([T,2T]\). Then
\[ S(X \sqcup Y)_{0,2T} = S(X)_{0,T} \cdot S(Y)_{0,T}. \]
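Truncated at level two, Chen's relation reads \((a, A) \cdot (b, B) = (a + b,\, A + B + a \otimes b)\) for the (level-1, level-2) parts of the two signatures. The following numerical sketch is my own illustration (names and sample points hypothetical):

```python
def sig2(points):
    """Levels 1 and 2 of the signature of a piecewise linear curve."""
    d = len(points[0])
    l1 = [0.0] * d
    l2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(points, points[1:]):
        a = [qi - pi for qi, pi in zip(q, p)]
        for i in range(d):
            for j in range(d):
                l2[i][j] += l1[i] * a[j] + a[i] * a[j] / 2
        for i in range(d):
            l1[i] += a[i]
    return l1, l2

def chen_product(sA, sB):
    """Chen's relation truncated at level 2: (a,A)*(b,B) = (a+b, A+B+a(x)b)."""
    (a, A), (b, B) = sA, sB
    d = len(a)
    c1 = [a[i] + b[i] for i in range(d)]
    c2 = [[A[i][j] + B[i][j] + a[i] * b[j] for j in range(d)] for i in range(d)]
    return c1, c2

X = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]
Y = [(2.0, 0.0), (2.0, 2.0), (0.0, 1.0)]   # starts where X ends
whole = sig2(X + Y[1:])                    # signature of the concatenation
prod = chen_product(sig2(X), sig2(Y))      # product of the two signatures
```

Both computations give the same levels 1 and 2, as the lemma asserts.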
We will use the following fact repeatedly, which also explains the commonly used name tensor algebra for \(T(\mathbb{R}^{d})\).
Lemma 2.4
The space of all multilinear maps on \(\mathbb{R}^{d} \times \cdots \times \mathbb{R}^{d}\) (\(n\) times) is in a one-to-one correspondence with homogeneous polynomials of order \(n\) in the non-commuting variables \(x_{1},\dots ,x_{d}\) by the following bijective linear map
\[ \mathsf{poly}(\psi ) := \sum_{i_{1},\dots ,i_{n}} \psi (e_{i_{1}},\dots ,e_{i_{n}})\, x_{i_{1}} \cdots x_{i_{n}}, \]
with \(e_{i}\) being the \(i\)-th canonical basis vector of \(\mathbb{R}^{d}\).
For example, with \(d=2\) and \(n=3\), we can consider the multilinear map \(\psi\) which takes \(\bigl((a_{1},b_{1}),(a_{2},b_{2}),(a_{3},b_{3})\bigr) \in \mathbb{R}^{2} \times \mathbb{R}^{2} \times \mathbb{R}^{2}\) to the number \(a_{1}a_{2}b_{3}\). It maps to \(\mathsf{poly}(\psi)=x_{1}x_{1}x_{2}\).
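The map \(\mathsf{poly}\) can be evaluated mechanically by feeding basis vectors into \(\psi \). The sketch below (my own illustration) recovers the example from the text:

```python
from itertools import product

def poly(psi, d, n):
    """Word coefficients of poly(psi): word (i_1,..,i_n) -> psi(e_{i_1},..,e_{i_n})."""
    def e(i):
        return tuple(1.0 if k == i else 0.0 for k in range(1, d + 1))
    return {w: psi(*[e(i) for i in w]) for w in product(range(1, d + 1), repeat=n)}

# the example from the text: psi((a1,b1),(a2,b2),(a3,b3)) = a1 * a2 * b3
psi = lambda v1, v2, v3: v1[0] * v2[0] * v3[1]
nonzero = {w: c for w, c in poly(psi, d=2, n=3).items() if c != 0}
print(nonzero)   # {(1, 1, 2): 1.0}, i.e. the monomial x1 x1 x2
```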
3 General Linear Group
Let
\[ \operatorname{GL}(\mathbb{R}^{d}) := \bigl\{ A \in \mathbb{R}^{d \times d} : \det A \neq 0 \bigr\} \]
be the general linear group of \(\mathbb{R}^{d}\).
Definition 3.1
For \(w \in \mathbb{N}\), we call \(\phi \in T(\mathbb{R}^{d})\) a \(\operatorname{GL}\) invariant of weight \(w\) if
\[ \langle S(A X)_{0,T}, \phi \rangle = (\det A)^{w}\, \langle S(X)_{0,T}, \phi \rangle \]
for all \(A \in \operatorname{GL}(\mathbb{R}^{d})\) and all curves \(X\).
Definition 3.2
Define a linear action of \(\operatorname{GL}(\mathbb{R}^{d})\) on \(T((\mathbb{R}^{d}))\) and \(T(\mathbb{R}^{d})\), by specifying on monomials
Lemma 3.3
For all \(A \in \mathbb{R}^{d\times d}\) and any curve \(X\),
Proof
It is enough to verify this on monomials \(\phi = x_{\ell _{1}} .. x _{\ell _{m}}\). Then, since the \(\ell _{r}\)-th component of the curve \(A X\) is equal to \((A X)^{\ell _{r}} = \sum_{j_{r}} A_{\ell _{r} j_{r}} X^{j_{r}}\), we get
□
We can simplify the concept of \(\operatorname{GL}\) invariants further, using the next lemma. Owing to the shuffle identity, signatures of curves live in a nonlinear subset of the whole tensor algebra \(T((\mathbb{R}^{d}))\), the set of “grouplike elements” (compare [41, Sect. 3.1]). It turns out though that they linearly span all of \(T((\mathbb{R}^{d}))\).
Lemma 3.4
For \(n\ge 1\)
\[ \overline{ \operatorname{span} \bigl\{ \pi _{n} S(X)_{0,T} : X \text{ a curve in } \mathbb{R}^{d} \bigr\} } = \pi _{n} T((\mathbb{R}^{d})). \tag{1} \]
Proof
It is clear by definition that the left hand side of (1) is included in \(\pi _{n} T((\mathbb{R}^{d}))\). We show the other direction and use ideas of [3, Proposition 4]. Let \(x_{i_{n}} \cdot \ldots \cdot x_{i_{1}} \in \pi _{n} T((\mathbb{R} ^{d}))\) be given. Let \(X\) be the piecewise linear path that results from the concatenation of the vectors \(t_{1} e_{i_{1}}\), \(t_{2} e_{i_{2}}\) up to \(t_{n} e_{i_{n}}\), where \(e_{i}\), \(i=1,..,d\) is the standard basis of \(\mathbb{R}^{d}\). Its signature is given by (see for example [16, Chap. 6])
where the exponential function is defined by its power series. Then
Combining this with the fact that the left hand side of (1) is a closed set, we get that
These elements span \(\pi _{n} T((\mathbb{R}^{d}))\), which finishes the proof. □
Hence \(\phi \) is a \(\operatorname{GL}\) invariant of weight \(w\) in the sense of Definition 3.1 if and only if for all \(A \in \operatorname{GL}(\mathbb{R}^{d})\)
Since the action respects homogeneity, we immediately obtain that projections of invariants are invariants (take \(B = (\det A)^{-w} A ^{\top }\) in the following lemma):
Lemma 3.5
If \(\phi \in T(\mathbb{R}^{d})\) satisfies
for some \(B \in \operatorname{GL}(\mathbb{R}^{d})\), then
for all \(n \ge 1\).
Proof
By definition, the action of \(\operatorname{GL}\) on \(T(\mathbb{R}^{d})\) commutes with \(\pi _{n}\). □
In order to apply classical results in invariant theory, we use the bijection \(\mathsf{poly}\) between multilinear functions and non-commuting polynomials, given in Lemma 2.4.
Lemma 3.6
For \(\psi : (\mathbb{R}^{d})^{\times n} \to \mathbb{R}\) multilinear and \(A \in \operatorname{GL}(\mathbb{R}^{d})\),
Proof
□
The simplest multilinear function \(\varPsi : (\mathbb{R}^{d})^{\times n} \to \mathbb{R}\)
satisfying \(\varPsi ( A v_{1},.., A v_{n} ) = \det ( A )\, \varPsi (v_{1},.., v_{n})\) is perhaps the determinant itself. That is, \(n=d\) and
\[ \varPsi (v_{1},.., v_{d}) = \det ( v_{1} v_{2} .. v_{d} ), \]
where \(v_{1} v_{2} .. v_{d}\) is the \(d \times d\) matrix with columns \(v_{i}\). Up to a scalar this is in fact the only one, and it turns out that invariants of higher weight are built using only determinants as building blocks.
To state the following classical result, we introduce the notion of Young diagrams, which play an important role in the representation theory of the symmetric group.
Let \(\lambda = (\lambda _{1},.., \lambda _{r})\) be a partition of \(n \in \mathbb{N}\), which we assume ordered as \(\lambda _{1} \ge \lambda _{2} \ge .. \ge \lambda _{r}\). We associate to it a Young diagram, which is an arrangement of \(n\) boxes into left-justified rows. There are \(r\) rows, with \(\lambda _{i}\) boxes in the \(i\)-th row. For example, the partition \((4,2,1)\) of 7 gives the Young diagram
A Young tableau is obtained by filling these boxes with the numbers \(1,.., n\). Continuing the example, the following is a Young tableau
A Young tableau is standard if the values in every row are increasing (from left to right) and are increasing in every column (from top to bottom). The previous tableau was not standard; the following is.
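Standardness is easy to check programmatically. The following sketch is my own; the concrete fillings of the shape \((4,2,1)\) are illustrative, since the displayed tableaux are not reproduced here. A tableau is encoded as a list of row tuples.

```python
def is_standard(tab):
    """Check standardness of a Young tableau, given as a list of row tuples:
    entries are 1..n, each once; rows increase left-to-right, columns top-to-bottom."""
    n = sum(len(r) for r in tab)
    if sorted(x for r in tab for x in r) != list(range(1, n + 1)):
        return False
    for r in tab:                       # rows must increase
        if any(a >= b for a, b in zip(r, r[1:])):
            return False
    for r1, r2 in zip(tab, tab[1:]):    # columns must increase
        if any(r1[j] >= r2[j] for j in range(len(r2))):
            return False
    return True

print(is_standard([(1, 2, 4, 7), (3, 5), (6,)]))   # True
print(is_standard([(2, 3, 4, 7), (1, 5), (6,)]))   # False (first column: 2 > 1)
```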
The following result is classical, see for example Dieudonné [10, Sect. 2.5], [50] and [18], none of which explicitly give a basis for the invariants though. See [47, Theorem 4.1.12] for a slightly different basis.
Theorem 3.7
The space of multilinear maps
that satisfy
for all \(A \in \operatorname{GL}(\mathbb{R}^{d})\) and \(v_{1}, \dots , v_{n} \in \mathbb{R}^{d}\) is non-trivial if and only if \(n = w d\) for some integer \(w \ge 1\).
In that case, a linear basis is given by
where \(C_{i}\) are the columns of \(\varSigma \), and \(\varSigma \) ranges over all standard Young tableaux corresponding to the partition \(\lambda = \overbrace{(w, w,.., w)}^{d~\textit{times}}\) of \(n\).
Here, for a sequence \(C = (c_{1},..,c_{d})\), \(v_{C}\) denotes the matrix of column vectors \(v_{c_{i}}\), i.e.
Remark 3.8
A consequence of this theorem is the existence of identities between products of determinants. For example, for vectors \(v_{1},.., v_{4} \in \mathbb{R}^{2}\), one can check by hand
This is why the product on the left-hand side here is not part of the basis in Theorem 3.7 for \(d=2\), \(w=2\) (compare Sect. 3.1).
Identities of this type are called Plücker identities. They have a long history and are a major ingredient in the representation theory of the symmetric group. The procedure of reducing certain products of determinants to a basic set of such products is called the straightening algorithm [44, Sect. 2.6]. See also [30] and [48].
Remark 3.9
The only invariant for \(d=2\), \(w=1\) is \(x_{1} x_{2} - x_{2} x_{1}\),
a Lie polynomial. One can generally ask for invariant Lie polynomials [41, Sect. 8.6.2]. This seems to be of no relevance to the application of invariant feature extraction for curves though.
Remark 3.10
Let \(C^{(d)}_{w}\) be the number of linearly independent invariants of weight \(w\). By Theorem 3.7, this is the number of standard Young tableaux of shape \((w, w,.., w)\). By the Hook formula [44, Theorem 3.10.2]
\[ C^{(d)}_{w} = (wd)!\; \frac{\prod_{i=0}^{d-1} i!}{\prod_{i=0}^{d-1} (w+i)!}. \]
For example for \(d=2\), the number of invariants for weights \(w=0,1,2,3,\ldots\) (and hence for levels \(n=0,2,4,6,\ldots\)) are (the Catalan numbers, https://oeis.org/A000108)
\[ 1, 1, 2, 5, 14, 42, \ldots \]
For \(d=3\), the number of invariants for weights \(w=0,1,2,3,\ldots\) (and hence for levels \(n=0,3,6,9,\ldots\)) are (the 3-dimensional Catalan numbers, https://oeis.org/A005789)
\[ 1, 1, 5, 42, 462, \ldots \]
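For a rectangular diagram with \(d\) rows of length \(w\), the hook length of the cell in row \(r\), column \(c\) (0-based) is \((w - c) + (d - r) - 1\), which gives a short implementation of the count. The function name below is my own:

```python
from math import factorial

def num_invariants(d, w):
    """Number of standard Young tableaux of the rectangular shape with d rows
    of length w, via the hook length formula: (wd)! / (product of hook lengths)."""
    hooks = 1
    for r in range(d):
        for c in range(w):
            hooks *= (w - c) + (d - r) - 1   # arm + leg + 1
    return factorial(w * d) // hooks

print([num_invariants(2, w) for w in range(6)])   # [1, 1, 2, 5, 14, 42]
print([num_invariants(3, w) for w in range(5)])   # [1, 1, 5, 42, 462]
```

The two printed lists reproduce the Catalan and 3-dimensional Catalan numbers quoted above.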
Proof of Theorem 3.7
Write \(V = (\mathbb{R}^{d})^{*}\), the dual space of \(\mathbb{R}^{d}\). Every \(\phi \in V^{\otimes n}\) that satisfies
clearly spans a one-dimensional irreducible representation of \(\operatorname{GL}(V)\). Hence we need to investigate all one-dimensional irreducible representations of \(\operatorname{GL}(V)\) contained in \(V^{\otimes n}\) (and it will turn out that all of them satisfy (2)).
The (diagonal) action of \(\operatorname{GL}(V)\) on \(V^{\otimes n}\) is best understood by simultaneously studying the left action of \(S_{n}\) on \(V^{\otimes n}\) given by
By Schur-Weyl duality, [29, Theorem 6.4.5.2], as \(S_{n} \times \operatorname{GL}(V)\) modules,
where the sum is over integer partitions \(\lambda \) of \(n\), the \(S^{\lambda }\) are irreducible representations of \(S_{n}\), to be detailed below, and the \(V^{\lambda }\) are irreducible representations of \(\operatorname{GL}(V)\). The exact form of the latter is irrelevant here; we only need to know that \(V^{\lambda }\) is one-dimensional if and only if \(\lambda = (w,..,w)\), \(d\)-times, for some integer \(w \ge 1\), [10, p. 21]. This gives the condition \(n = w d\) in the statement. We assume this to hold from now on.
We are hence left with understanding the unique copy of the “Specht module” \(S^{\lambda }\) inside of \(V^{\otimes n}\). We sketch its classical construction. Let us recall that a tabloid is an equivalence class of Young tableaux modulo permutations leaving the set of entries in each row invariant [44, Chap. 2]. For \(t\) a Young tableau denote \(\{ t \}\) its tabloid, so for example
The symmetric group \(S_{n}\) acts on Young tableaux as
For example
It then acts on tabloids by \(\tau \cdot \{t\} := \{ \tau \cdot t \}\). Define for a Young tableau \(t\)
where the sum is over all \(\pi \in S_{n}\) that leave the set of values in each column invariant. For example with
we get
Then
is an irreducible representation of \(S_{n}\) and
forms a basis [44, Theorem 2.5.2]. This concludes the reminder on representation theory for \(S_{n}\).
Define the map \(\iota \) from the space of tabloids of shape \((w,..,w)\) into \(V^{\otimes n}\) as follows,
where \(e_{i}^{*}\) is the canonical basis of \(V\) and
For example
This is a homomorphism of \(S_{n}\) representations. Indeed,
with
On the other hand
with \(p_{\ell }:= r_{\tau ^{-1}(\ell )}\) and
So indeed \(\iota ( \tau \cdot \{t\} ) = \tau \cdot \iota ( \{t\} )\), and \(\iota \) is a homomorphism of \(S_{n}\) representations. It is a bijection from the space of \((w,..,w)\) tabloids into the space spanned by the vectors
Restricting to \(\operatorname{Irrep}_{(w,..,w)}\) then yields an isomorphism of irreducible \(S_{n}\) representations. Hence \(\iota ( \operatorname{Irrep}_{(w,..,w)} )\) is the (unique) realization of \(S^{\lambda }\) inside of \(V^{\otimes n}\) in (3). We finish by describing its image.
Consider the standard Young tableau \(t_{\mathrm{first}}\) of shape \((w,w,..,w)\) obtained by filling the columns from left to right, i.e.
Clearly, for any (standard) Young tableau \(t\) there exists a unique \(\sigma _{t} \in S_{n}\) such that
We claim
Indeed, since \(\iota \) is a homomorphism of \(S_{n}\) representations,
It remains to check
Every \(\pi \in S_{n}\) that is column-preserving for \(t_{\mathrm{first}}\) can be written as the product \(\pi _{1} \cdot .. \cdot \pi _{w}\), with \(\pi _{j}\) ranging over the permutations of the entries of the \(j\)-th column of \(t_{\mathrm{first}}\). Then
as desired. □
Applying Lemma 2.4 to Theorem 3.7 we get the invariants in \(T(\mathbb{R}^{d})\).
Proposition 3.11
A linear basis for the space of \(\operatorname{GL}\) invariants of order \(n = w d\) is given by
where
where \(C_{i}\) are the columns of \(\varSigma \), \(\varSigma \) ranges over all standard Young tableaux corresponding to the partition \(\lambda = \underbrace{(w, w,.., w)}_{d~\textit{times}}\) of \(n\), and the notation \(v_{C}\) is as introduced in Theorem 3.7.
Remark 3.12
By Lemma 3.5, for any invariant \(\phi \in T(\mathbb{R}^{d})\) and \(n\ge 1\) we have that \(\pi _{n} \phi \) is also invariant. Hence the previous proposition characterizes all invariants we are interested in (Definition 3.1), not just homogeneous ones.
Remark 3.13
Note that each of these invariants \(\phi \) consists only of monomials that contain every variable \(x_{1}, \dots , x_{d}\) at least once. This implies that \(\langle S(X)_{0,T}, \phi \rangle \) consists only of iterated integrals that contain every component \(X^{1},\dots ,X^{d}\) of the curve at least once. Hence, if at least one of these components is constant, the whole expression will be zero.
Since \(\phi \) is invariant, this implies that \(\langle S(X)_{0,T}, \phi \rangle = 0\) as soon as there is some coordinate transformation under which one component is constant, that is, whenever the curve \(X\) stays in a hyperplane of dimension strictly less than \(d\).
One of the simplest curves in \(d\) dimensions that does not lie in any hyperplane of lower dimension is the moment curve
We will come back to this example in Lemma 3.29.
3.1 Examples
We will use the following short notation:
\[ \mathtt{i_{1} i_{2} .. i_{n}} := x_{i_{1}} x_{i_{2}} \cdots x_{i_{n}}, \]
so, for example, \(\mathtt{12} = x_{1} x_{2}\) and \(\mathtt{21} = x_{2} x_{1}\).
We present the invariants described above for some special cases of \(d\) and \(w\).
The case \(d=2\)
Level 2 (\(w=1\))
Remark 3.14
Let us make clear that from the perspective of data analysis, the “invariant” of interest is really the action of this element in \(T(\mathbb{R}^{d})\) on the signature of a curve.
In this example, the real number
changes only by the determinant of \(A \in \operatorname{GL}( \mathbb{R}^{2})\) when calculating it for the transformed curve \(A X\):
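This equivariance can be checked numerically. The sketch below is my own illustration (names, matrix, and sample points hypothetical): it evaluates \(\langle S(X)_{0,T}, \mathtt{12} - \mathtt{21} \rangle \) for a piecewise linear curve via Chen's relation and compares the transformed curve against the determinant factor.

```python
def area_invariant(points):
    """<S(X), 12 - 21> for the piecewise linear curve through the points;
    this equals twice the signed area between the curve and its chord."""
    s12 = s21 = 0.0
    x1 = x2 = 0.0                       # running level-one signature
    for (p1, p2), (q1, q2) in zip(points, points[1:]):
        a1, a2 = q1 - p1, q2 - p2
        s12 += x1 * a2 + a1 * a2 / 2    # Chen update for S^{12}
        s21 += x2 * a1 + a1 * a2 / 2    # Chen update for S^{21}
        x1 += a1
        x2 += a2
    return s12 - s21

A = [[2.0, 1.0], [0.5, 3.0]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]       # 5.5
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
Apts = [(A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y) for x, y in pts]
print(area_invariant(Apts), det_A * area_invariant(pts))   # prints 16.5 16.5
```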
Level 4 (\(w=2\))
Remark 3.15
This is a linear basis of invariants in the fourth level. If one takes algebraic dependencies into consideration, the set of invariants becomes smaller. To be specific, assume that one already has knowledge of the invariant of level 2 (i.e. \(\langle S(X)_{0,T}, \mathtt {12} - \mathtt {21} \rangle \)). If, say in a machine learning application, the learning algorithm can deal sufficiently well with nonlinearities, one should not be required to provide additionally the square of this number. In other words \(|\langle S(X)_{0,T}, \mathtt {12} - \mathtt {21} \rangle |^{2}\) can also be assumed to be “known”. But, by the shuffle identity (Lemma 2.1), this can be written as
Now, seeing that \(4\cdot \mathtt {1122} - 4\cdot \mathtt {1221} - 4\cdot \mathtt {2112} + 4\cdot \mathtt {2211}\) is invariant, there is only one “new” independent invariant in the fourth level, namely \(\mathtt {1212} - \mathtt {1221} - \mathtt {2112} + \mathtt {2121}\).
A similar analysis can also be carried out for the following invariants, but we refrain from doing so, since it can be easily done with a computer algebra system.
Level 6 (\(w=3\))
The case\(d=3\)
Level \(n=3\) (\(w=1\))
Level \(n=6\) (\(w=2\))
The case\(d=4\)
Level \(n=4\) (\(w=1\))
3.2 The Invariant of Weight One, in Dimension Two
Geometric Interpretation
The invariant for \(d=2\), \(w=1\), namely \(\phi = x_{1} x_{2} - x_{2} x_{1}\), has a simple geometric interpretation: it picks out (two times) the area (signed, and with multiplicity) between the curve \(X\) and the chord spanned between its starting and end point (compare Fig. 1). For (smooth) non-intersecting curves, this follows from Green’s theorem [43, Theorem 10.33]. For self-intersecting curves, the mathematically most convenient definition of “signed area” is the integral (in the plane) of its winding number. The claimed relation to the invariant \(\phi \) is for example proven in [34, Proposition 1].
Connection to Correlation
Assume that \(X\) is a continuous curve, piecewise linear between some time points \(t_{i}\), \(i=0,\dots ,n\). The area is then explicitly calculated as
Here, for two vectors \(a\), \(b\) of length \(n\)
the lag-one cross-correlation, which is a commonly used feature in signal analysis, see for example [38, Chap. 13.2]. In particular, if the curve starts at 0, we have
which is an antisymmetrized version of the lag-one cross-correlation.
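A numerical sketch of this identity (my own illustration, hypothetical names): for a curve starting at 0, \(\langle S(X)_{0,T}, \mathtt{12} - \mathtt{21} \rangle \) agrees with the antisymmetrized lag-one cross-correlation of the coordinate sequences.

```python
def area_via_signature(points):
    """<S(X), 12 - 21> for a piecewise linear curve, written so that the
    symmetric a1*a2/2 terms of S^{12} and S^{21} cancel."""
    x0, y0 = points[0]
    s = 0.0
    for (p1, p2), (q1, q2) in zip(points, points[1:]):
        s += (p1 - x0) * (q2 - p2) - (p2 - y0) * (q1 - p1)
    return s

def lag_one_antisym(x, y):
    """sum_i (x_i * y_{i+1} - y_i * x_{i+1})."""
    return sum(x[i] * y[i + 1] - y[i] * x[i + 1] for i in range(len(x) - 1))

x = [0.0, 1.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0]
print(area_via_signature(list(zip(x, y))), lag_one_antisym(x, y))   # prints 3.0 3.0
```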
Remark 3.16
The antisymmetrized version of the lag \(\tau \) cross-correlation, for each \(\tau \ge 2\), is also a \(\operatorname{GL}(\mathbb{R}^{2})\) invariant of the curve. In general these invariants cannot be found in the signature, and we thank the anonymous referee for pointing out the following example. Consider the treelike curve which linearly interpolates the following points
Its signature is trivial, but
3.3 The Invariant of Weight One, in Any Dimension
Whatever the dimension \(d\) of the curve’s ambient space, the space of invariants of weight 1 has dimension 1 and is spanned by
\[ \operatorname{Inv}_{d} := {\det} \begin{pmatrix} x_{1} & x_{2} & \cdots & x_{d} \\ \vdots & \vdots & & \vdots \\ x_{1} & x_{2} & \cdots & x_{d} \end{pmatrix} = \sum_{\sigma \in S_{d}} \operatorname{sgn} (\sigma )\; x_{\sigma (1)} x_{\sigma (2)} \cdots x_{\sigma (d)}. \tag{4} \]
Here, for a matrix \(C\) of non-commuting variables (compare [14, Definition 3.1])
This invariant is of homogeneity \(d\). The following lemma tells us that we can write \(\operatorname{Inv}_{d}\) in terms of expressions on lower homogeneities.
To state it, we first define the operation \(\mathsf{InsertAfter}(x _{i},r)\) on monomials of order \(n \ge r\), as the insertion of the variable \(x_{i}\) after position \(r\), and extend it linearly. For example
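A sketch of \(\mathsf{InsertAfter}\) on linear combinations of words, my own illustration (the concrete example is mine, since the displayed one is not reproduced here); words are encoded as tuples and linear combinations as dicts from words to coefficients:

```python
def insert_after(expr, letter, r):
    """InsertAfter(x_letter, r) extended linearly: expr is a dict mapping
    word tuples to coefficients; the letter is inserted after position r."""
    out = {}
    for w, c in expr.items():
        nw = w[:r] + (letter,) + w[r:]
        out[nw] = out.get(nw, 0) + c
    return out

# insert x3 after position 1 of x1 x2, giving x1 x3 x2
print(insert_after({(1, 2): 1}, 3, 1))   # {(1, 3, 2): 1}
```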
Lemma 3.17
In any dimension \(d\) and for any \(r=0,1,..,d-1\),
where \(\widehat{x_{j}}\) denotes the omission of that argument.
For \(d\) odd,
Remark 3.18
For completeness, we also note the related de Bruijn’s formula. For \(d\) even,
where
and the Pfaffian (with respect to the shuffle product), is
Proof
The first statement follows from expressing the determinant in (4) in terms of minors with respect to the row \(r+1\) (since the \(x_{i}\) are non-commuting, this does not work with columns!).
Regarding the second statement, since \(d\) is odd and then using the first statement
as claimed. □
An immediate consequence is the following lemma.
Lemma 3.19
If the ambient dimension \(d\) is odd and the curve \(X\) is closed (i.e. \(X_{T} = X_{0}\)), then
Proof
By Lemma 3.17 and then by the shuffle identity (Lemma 2.1)
since the increment \(\langle S(X)_{0,T}, x_{j} \rangle = X ^{j}_{T} - X^{j}_{0}\) is zero for all \(j\) by assumption. □
In even dimension we have the phenomenon that closing a curve does not change the value of the invariant.
Lemma 3.20
If the ambient dimension \(d\) is even, then for any curve \(X\)
where \(\bar{X}\) is \(X\) concatenated with the straight line connecting \(X_{T}\) to \(X_{0}\).
Proof
Let \(\bar{X}\) be parametrized on \([0,2T]\) as follows: \(\bar{X} = X\) on \([0,T]\) and it is the linear path connecting \(X_{T}\) to \(X_{0}\) on \([T,2T]\). By translation invariance we can assume \(X_{0} = 0\) and by \(\operatorname{GL}(\mathbb{R}^{d})\)-invariance that \(X_{T}\) lies on the \(x_{1}\) axis. Then the only component of \(\bar{X}\) that is non-constant on \([T,2T]\) is the first one, \(\bar{X}^{1}\).
By Lemma 3.17
Letting the summands act on \(S(\bar{X})_{0,t}\) we get \(\pm 1\) times
For \(j\ne 1\) these expressions are constant on \([T,2T]\), since we arranged things so that those \(\bar{X}^{j}\) do not move on \([T,2T]\). But also for \(j=1\) this expression is constant on \([T,2T]\). Indeed, the integrand
is zero on \([T,2T]\), since \(X\), projected on the \(x_{2}-..-x_{d}\) hyperplane, is a closed curve, and so Lemma 3.19 applies. □
Lemma 3.21
Let \(X\) be the piecewise linear curve through \(p_{0},..,p_{d} \in \mathbb{R}^{d}\). Then
Proof
First, for any \(v \in \mathbb{R}^{d}\),
Since the signature is also invariant to translation, we can assume \(p_{0} = 0\). Now both sides of the statement transform the same way under the action of \(\operatorname{GL}(\mathbb{R}^{d})\) on the points \(p_{1},..,p_{d}\). It is then enough to prove this for
Now, for this particular choice of points the right hand side is clearly equal to 1. For the left hand side, the only non-zero term is
□
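The claims of Lemmas 3.19 and 3.21 can be checked numerically for \(d=3\). The sketch below is my own illustration: it computes the signature of a piecewise linear curve up to level three and pairs it with \(\operatorname{Inv}_{3}\), written here under the convention \(\operatorname{Inv}_{3} = \sum_{\sigma } \operatorname{sgn}(\sigma ) x_{\sigma (1)} x_{\sigma (2)} x_{\sigma (3)}\) (which reduces to \(x_{1} x_{2} - x_{2} x_{1}\) for \(d=2\)); the names and sample points are hypothetical.

```python
from itertools import permutations

def sig3(points):
    """Signature levels 1-3 of a piecewise linear curve; a segment with
    increment a contributes a^{(x)n} / n! at level n, combined via Chen."""
    d = len(points[0])
    l1 = [0.0] * d
    l2 = [[0.0] * d for _ in range(d)]
    l3 = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for p, q in zip(points, points[1:]):
        a = [qi - pi for qi, pi in zip(q, p)]
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    l3[i][j][k] += (l2[i][j] * a[k]
                                    + l1[i] * a[j] * a[k] / 2.0
                                    + a[i] * a[j] * a[k] / 6.0)
        for i in range(d):
            for j in range(d):
                l2[i][j] += l1[i] * a[j] + a[i] * a[j] / 2.0
        for i in range(d):
            l1[i] += a[i]
    return l1, l2, l3

def pair_inv3(l3):
    """<S(X), Inv_3> with Inv_3 = sum_sigma sgn(sigma) x_{s(1)} x_{s(2)} x_{s(3)}."""
    total = 0.0
    for s in permutations(range(3)):
        inversions = sum(1 for a in range(3) for b in range(a + 1, 3) if s[a] > s[b])
        total += (-1) ** inversions * l3[s[0]][s[1]][s[2]]
    return total

closed = [(0., 0., 0.), (1., 0., 0.), (0., 1., 1.), (1., 1., 0.), (0., 0., 0.)]
simplex = [(0., 0., 0.), (1., 0., 0.), (1., 1., 0.), (1., 1., 1.)]
print(pair_inv3(sig3(closed)[2]))    # approximately 0: closed curve, odd dimension
print(pair_inv3(sig3(simplex)[2]))   # 1.0: the choice of points from the proof above
```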
The modulus of the determinant
gives the Lebesgue measure of the parallelepiped spanned by the vectors \(p_{1}-p_{0},..,p_{d}-p_{0}\). The polytope spanned by the points \(p_{0},p_{1},..,p_{d}\) fits \(d!\) times into that parallelepiped. We hence have the relation to classical volume as follows.
Lemma 3.22
Let \(p_{0},..,p_{d} \in \mathbb{R}^{d}\), then
We now proceed to piecewise linear curves with more than \(d\) vertices.
Lemma 3.23
Let \(X\) be the piecewise linear curve through \(p_{0},..,p_{n} \in \mathbb{R}^{d}\), with \(n \ge d\). Then,
For \(d\) even, the subsequences \(i\) are chosen as follows:
and \(i_{1},..,i_{d}\) ranges over all possible increasing subsequences of \(1,2,..,n\) such that for \(\ell \) odd: \(i_{\ell }+ 1 = i_{\ell +1}\).
For \(d\) odd, they are chosen as follows:
and \(i_{1},..,i_{d-1}\) ranges over all possible increasing subsequences of \(1,2,..,n-1\) such that for \(\ell \) odd: \(i_{\ell }+ 1 = i_{\ell +1}\).
Remark 3.24
The number of indices is easily calculated. In the even case, we have \(B := d/2\) “groups of two” to place, \(A := n - d\) “fillers” in between. This gives
where \(\lfloor r \rfloor \) is the largest integer less than or equal to \(r\).
In the odd case, we have \(B :=(d-1)/2\) “groups of two” to place, with \(A := n-1 - (d-1)\) “fillers” in between. This gives
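These counts can be cross-checked by brute-force enumeration. In the sketch below (my own; function names hypothetical), the closed form \(\binom{A+B}{B}\) is the standard stars-and-bars count of arranging \(B\) “groups of two” and \(A\) “fillers” in a row, verified against direct enumeration for the even case:

```python
from itertools import combinations
from math import comb

def count_subsequences(d, n):
    """Brute-force count, for d even, of increasing subsequences (i_1,..,i_d)
    of 1..n with i_l + 1 = i_{l+1} for every odd position l (1-based)."""
    total = 0
    for i in combinations(range(1, n + 1), d):
        # 0-based even index l corresponds to 1-based odd position l + 1
        if all(i[l] + 1 == i[l + 1] for l in range(0, d, 2)):
            total += 1
    return total

for d, n in [(2, 5), (4, 7), (4, 10)]:
    B, A = d // 2, n - d               # B "groups of two", A "fillers"
    print(d, n, count_subsequences(d, n), comb(A + B, B))
```

On each printed line the enumerated count agrees with the binomial coefficient.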
Remark 3.25
Consider the case \(d=2\), and a curve \(X\) through the points \(p_{0}, p_{1},.., p_{n} \in \mathbb{R}^{d}\), with \(p_{0} = 0\). Then
We can express \(\operatorname{Inv}_{d}\) as a linear combination of the \(2\times 2\) minors \(P_{i,j}\) of the \(2 \times n\) matrix \((p_{1},p_{2},..,p_{n})\). Generally, it is well-known that all invariants to \(\operatorname{GL}(\mathbb{R}^{2})\) of a tuple of points are expressible in terms of these minors [47, Sect. 3.2]. So, for a piecewise linear curve through \(0,p_{1},..,p_{n}\), all our integral invariants are, a fortiori, expressible in terms of them. In the simple case shown here, this expression is just a linear combination. Experimentally, for higher order invariants, polynomial combinations appear with a lot of structure. This poses the question of whether one can set up some kind of “\(\operatorname{GL}\) invariant integration”, where, instead of the classical Riemann integration that uses increments, one “integrates” using only these \(P_{i,j}\).
Example 3.26
For \(d=2\), \(n=5\) we get the subsequences
For \(d=4\), \(n=7\) we get the subsequences
For \(d=5\), \(n=8\) we get the subsequences
Proof of Lemma 3.23
The case \(d=2\)
Let \(X\) be the curve through the points \(p_{0},p_{1},..,p_{n}\). We can write it as a concatenation of the curves \(X^{(i)}\), where \(X^{(i)}\) is the curve through the points \(p_{0}\), \(p_{i}\), \(p_{i+1}\), \(p_{0}\). The time intervals of definition for these curves (and all curves in this proof) do not matter, so we omit the subscripts of \(S(\cdot )\). Then, by Chen’s lemma (Lemma 2.3)
For the last equality we used that
and that the increments of all curves \(X^{(i)}\) are zero. Now by Lemma 3.20 we can omit the last straight line in every \(X^{(i)}\) and hence by Lemma 3.21
which finishes the proof for \(d=2\).
Now assume the statement is true for all dimensions strictly smaller than some \(d\). We show it is true for \(d\).

\(d\) is odd
As before we can assume \(p_{0} = 0\) and that \(p_{n}\) lies on the \(x_{1}\) axis. Every sequence summed over on the right-hand side of (5) is of the form \(i = (0,\ldots,n)\). For each of those, we calculate
Here \(\bar{p}_{j} \in \mathbb{R}^{d-1}\) is obtained by deleting the first coordinate of \(p_{j}\), \(e_{1}\) is the first canonical coordinate vector in \(\mathbb{R}^{d}\) and \(\Delta := (p_{0} - p_{n})_{1} = \langle S(X), x_{1} \rangle \) is the total increment of \(X\) in the \(x_{1}\) direction. Here we used that \(d\) is odd (otherwise we would get a prefactor −1).
The last determinant is the expression for the summands of the right-hand side of (5), but with dimension \(d-1\) and points \(0 = \bar{p}_{0}, \bar{p}_{1},.., \bar{p}_{n-1}\). By assumption, summing up all these determinants gives
where \(\bar{X}\) is the curve in \(\mathbb{R}^{d-1}\) through the points \(\bar{p}_{0},.., \bar{p}_{n-1}\). Since \(\bar{p}_{n} = \bar{p}_{0} = 0\), we can attach the additional point \(\bar{p}_{n}\) to \(\bar{X}\) without changing the value here (Lemma 3.20). Hence the sum of determinants is equal to
Since we arranged matters such that \(\langle S(X), x_{i} \rangle = 0\) for \(i\ne 1\), this is equal to
where we used the shuffle identity, Lemma 2.1. By the second part of Lemma 3.17 this is equal to \(\langle S(X), \operatorname{Inv}_{d} \rangle \), which finishes the proof for odd \(d\).
\(d\) is even
We proceed by induction on \(n\). For \(n=d\) the statement follows from Lemma 3.21.
Assume it holds for some \(n\); we show it for a piecewise linear curve through some points \(p_{0},.., p_{n+1}\). Write \(X = X' \sqcup X''\), where \(X'\) is the linear interpolation of \(p_{0},.., p_{n}\), \(X''\) is the linear path from \(p_{n}\) to \(p_{n+1}\), and we recall the concatenation ⊔ of paths from Lemma 2.3. By assumption, (5) is true for the curve \(X'\). Adding an additional point \(p_{n+1}\), the sum on the right-hand side of (5) gets additional indices of the form
where
and where \(j_{1},..,j_{d-2}\) range over all possible increasing subsequences of \(1,2,..,n-1\) such that for \(\ell \) odd \(j_{\ell }+ 1 = j_{\ell +1}\).
Assume \(p_{n+1} - p_{n} = \Delta \cdot e_{1}\) lies on the \(x_{1}\)-axis. Then, summing over those \(j\),
Here \(\bar{X}'\) is the curve in \(\mathbb{R}^{d-1}\) through the points \(\bar{p}_{0},.., \bar{p}_{n}\), and we used the fact that the indices \(j\) here range over the ones used for (5) in dimension \(d-1\) on the points \(\bar{p}_{0},.., \bar{p}_{n}\). On the other hand,
Here we used that \(S(X'') = \exp ( \Delta \cdot x_{1} ) = 1 + \Delta \cdot x_{1} + O(x_{1}^{2})\) [16, Example 7.21], the fact that each monomial in \(\operatorname{Inv}_{d}\) has exactly one occurrence of \(x_{1}\) and Lemma 3.17. This finishes the proof. □
Definition 27
Let \(X: [0,T] \to \mathbb{R}^{d}\) be any curve. Define its signed volume to be the following limit, if it exists,
Here \(\pi = (0=t^{\pi }_{0},.., t^{\pi }_{n^{\pi }}=T)\) is a partition of the interval \([0,T]\) and \(|\pi |\) denotes its mesh size. The indices \(i\) are chosen as in Lemma 3.23.
Theorem 28
Let \(X: [0,T] \to \mathbb{R}^{d}\) be a continuous curve of bounded variation. Then its signed volume exists and
Proof
Fix some sequence \(\{\pi ^{n}\}_{n\in \mathbb{N}}\) of partitions of \([0,T]\) with \(|\pi ^{n}| \to 0\) and interpolate \(X\) linearly along each \(\pi ^{n}\) to obtain a sequence of piecewise linear curves \(X^{n}\). Then by Lemma 3.23
By stability of the signature in the class of continuous curves of bounded variation [16, Proposition 1.28, Proposition 2.7], we get convergence
and this is independent of the particular sequence \(\pi ^{n}\) chosen. □
The previous theorem is almost a tautology, but there are relations to classical objects in geometry. For \(d=2\), as we have seen in Sect. 3.2,
is equal to the signed area of the curve \(X\). In general dimension, the value of the invariant is related to some kind of classical “volume” if the curve satisfies some kind of monotonicity. This is in particular satisfied for the “moment curve”.
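For \(d=2\) and a piecewise linear curve, the signed area can be computed directly via the shoelace formula; a minimal sketch (the function name is ours, not from the text):

```python
def signed_area(points):
    """Shoelace formula: signed area enclosed by the closed polygon
    through `points` (implicitly closed back to the first point)."""
    n = len(points)
    area = 0.0
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        area += x0 * y1 - x1 * y0
    return area / 2.0

# counterclockwise unit square: signed area +1; reversed orientation: -1
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

The sign encodes the orientation of the curve, matching the signed-volume convention above.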
Lemma 29
Let \(X\) be the moment curve
Then for any \(T > 0\)
Remark 30
It is easily verified that for integers \(n_{1},.., n_{d}\) one has
We deduce that
In [24, Sect. 15], the value of this volume is determined, for \(T=1\), as
We hence get the combinatorial identity
Proof
For \(n\ge d\) let \(0 = t_{0} < .. < t_{n} \le T\) be time-points, let \(p_{i} := X_{t_{i}}\) be the corresponding points on the moment curve and denote by \(X^{n}\) the piecewise linear curve through those points. We will show
First note that for any \(0 \le i_{0} < i_{1} < .. < i_{d} \le n\),
since it is a Vandermonde determinant.
We will decompose \(P := \{p_{0},..,p_{n}\}\) into (overlapping) sets \(S_{\ell }\) with cardinality \(d+1\) and such that (Footnote 10)
A face of \(P\) is a subset \(F \subset P\) such that its convex hull \(\operatorname{Convex-Hull}( F )\) equals the intersection of \(\operatorname{Convex-Hull}(P)\) with some affine hyperplane. A face is a facet if its affine span has dimension \(d-1\). The following fact is true for any polytope spanned by some points \(P\): up to a set of measure zero, for every point \(x\) in \(\operatorname{Convex-Hull}( P )\), the line connecting \(p_{0}\) to \(x\) exits \(\operatorname{Convex-Hull}(p_{0},..,p_{n})\) through a unique facet of \(\operatorname{Convex-Hull}( p_{0},..,p_{n} )\) contained in \(\{ p_{1},.., p_{n} \}\). Hence
where the sum is over all such facets.
Our points \(p_{i}\) lie on the moment curve. Then, by (6), any collection of points \(p_{i_{0}}, p _{i_{1}},.., p_{i_{d}}\) is in general position. This means that every facet of \(P\) must have exactly \(d\) points (and not more). Facets of \(\operatorname{Convex-Hull}(P)\) with \(d\) points are characterized by Gale’s criterion ([17, Theorem 3], [53, Theorem 0.7]):
the points \(p_{i_{1}},.., p_{i_{d}}\), with distinct \(i_{j} \in \{0,..,n \}\), form a facet of \(P\) if and only if any two elements of \(\{0,..,n \} \setminus \{i_{1},.., i_{d}\}\) are separated by an even number of elements in \(\{i_{1},.., i_{d}\}\). (Footnote 11)
\(d\) odd
We are looking for index sets \(\{i_{j}\}\) with \(i_{1} \ge 1\). Those are exactly the indices with
-
\(i_{\ell +1} = i_{\ell }+ 1\) for \(\ell \) odd
-
\(i_{d}= n\).
Together with \(i_{0} := 0\) these form the indices of Lemma 3.23.
\(d\) even
We are looking for index sets \(\{i_{j}\}\) with \(i_{1} \ge 1\). Those are exactly the indices with
-
\(i_{\ell +1} = i_{\ell }+ 1\) for \(\ell \) odd.
Together with \(i_{0} := 0\) these form the indices of Lemma 3.23.
Hence
Now by Lemma 3.22
The determinant is in fact positive here, by (6). We can hence omit the modulus and get
by Lemma 3.23.
The statement of the lemma now follows by piecewise linear approximation of \(X\) using continuity of the convex hull, which follows from [11, Lemma 3.2], and of iterated integrals [16, Proposition 1.28, Proposition 2.7]. □
4 Rotations
Let
be the group of rotations of \(\mathbb{R}^{d}\).
Definition 1
We call \(\phi \in T(\mathbb{R}^{d})\) an \(\operatorname{SO}\) invariant if
for all \(A \in \operatorname{SO}(\mathbb{R}^{d})\) and all curves \(X\). Alternatively, as explained in Sect. 3,
for all \(A \in \operatorname{SO}(\mathbb{R}^{d})\), where the action on \(T(\mathbb{R}^{d})\) was given in Definition 3.2.
Since \(\det (A) = 1\), any \(\operatorname{GL}\) invariant of weight \(w \ge 1\) (Sect. 3) is automatically an \(\operatorname{SO}\) invariant. But there are \(\operatorname{SO}\) invariants that are not \(\operatorname{GL}\) invariants (of any weight), for example, for \(d=2\), \(\phi := x_{1} x_{1} + x_{2} x_{2}\).
Switching to the perspective of multilinear maps, this is the map \((v_{1},v_{2}) \mapsto \langle v_{1}, v_{2} \rangle \). It is known, see for example [50, Theorem 2.9.A], that all invariants are built from the inner product and the determinant.
Recently, a linear basis for these invariants has been constructed. To formulate the result, we need to introduce some notation from [28]. Define
Use the following partial order on these sequences: for \(a \in I(r,n)\), \(a' \in I(r',n)\)
if \(r \le r'\) and \(a_{j} \ge a'_{j}\) for \(j \le r\).
For \(c \in I(d,n)\) and \(v_{1},.., v_{n} \in \mathbb{R}^{d}\), define
For \(a, b \in I(r,n)\) with \(r \le d\) and \(v_{1},..,v_{n} \in \mathbb{R}^{d}\), define
Theorem 2
([28, Theorem 12.5.0.8])
Let \(V\) be a \(d\)-dimensional vector space with inner product \(\langle \cdot , \cdot \rangle \). A basis for the space of multilinear maps
that satisfy
for all \(A \in \operatorname{SO}(V)\) and \(v_{1}, \dots , v_{n} \in V\) is given by the maps
satisfying
-
\(c^{(j)} \in I(d,n)\) for each \(j=1,..,s\)
-
\(a^{(j)},b^{(j)} \in I(t_{j},n)\) for some \(1 \le t_{j} \le d-1\) for each \(j=1,..,r\)
-
\(a^{(1)} \ge b^{(1)} \ge a^{(2)} \ge .. \ge b^{(r)} \ge c^{(1)} \ge .. \ge c^{(s)}\)
-
every number \(1,..,n\) appears in exactly one of the sequences \(a^{(1)},.., a^{(r)}, b^{(1)},.., b^{(r)}, c^{(1)},.., c^{(s)}\); (in particular \(n = 2 \cdot C_{1} + d\cdot C_{2}\) for some non-negative integers \(C_{1}\), \(C_{2}\))
Example 3
We give examples of these sequences for \(d=2\).
\(n=1\): There is no such set of sequences, since non-negative integers \(C_{1}\), \(C_{2}\) with \(2\cdot C_{1} + 2 \cdot C_{2} = 1\) cannot be found.
\(n=2\): Allowed sets of sequences are
-
\(c^{(1)} = (1,2)\); meaning that \(F(v_{1},v_{2}) = \langle v_{1}, v _{2} \rangle \)
-
\(a^{(1)} = (2)\), \(b^{(1)} = (1)\); meaning that \(F(v_{1},v_{2}) = \det [ v_{1} v_{2} ]\)
\(n=3\): There is no such set of sequences.
\(n=4\): Allowed sets of sequences are
-
\(a^{(1)} = (4)\), \(b^{(1)} = (3)\), \(a^{(2)} = (2)\), \(b^{(2)} = (1)\); meaning that \(F(v_{1},v_{2},v_{3},v_{4}) = \langle v_{4}, v_{3} \rangle \langle v_{2}, v_{1} \rangle \).
-
\(a^{(1)} = (4)\), \(b^{(1)} = (3)\), \(c^{(1)} = (1,2)\); meaning that \(F(v_{1},v_{2},v_{3},v_{4}) = \langle v_{4}, v_{3} \rangle \det [ v _{1} v_{2} ]\).
-
\(a^{(1)} = (4)\), \(b^{(1)} = (2)\), \(c^{(1)} = (1,3)\)
-
\(a^{(1)} = (3)\), \(b^{(1)} = (2)\), \(c^{(1)} = (1,4)\)
-
\(c^{(1)} = (3,4)\), \(c^{(2)} = (1,2)\)
-
\(c^{(1)} = (2,4)\), \(c^{(2)} = (1,3)\)
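The \(n=2\) building blocks can be checked numerically: rotating both arguments by the same angle leaves \(\langle v_{1}, v_{2} \rangle \) and \(\det [ v_{1} v_{2} ]\) unchanged. A minimal sketch (function names are ours):

```python
import math

def rotate(v, theta):
    """Apply the rotation matrix A(theta) in SO(2) to a vector."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def inner(v, w):
    return v[0] * w[0] + v[1] * w[1]

def det2(v, w):
    return v[0] * w[1] - v[1] * w[0]

v1, v2 = (0.3, -1.2), (2.0, 0.5)
w1, w2 = rotate(v1, 0.7), rotate(v2, 0.7)
```

The determinant is only invariant because \(\det A = 1\) on \(\operatorname{SO}(2)\); under a reflection it would flip sign.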
In the setting of \(T(\mathbb{R}^{d})\) we have
Proposition 4
The \(\operatorname{SO}\) invariants of homogeneity \(n\) are spanned by
where \(\varPsi \) ranges over the invariants of the previous theorem and \(\mathsf{poly}\) is given in Lemma 2.4.
In the case \(d=2\), there is another way to arrive at a basis for the invariants. Taking inspiration from [15], which concerns rotation invariants of images, we work in the complex vector space \(T(\mathbb{C}^{2})\). (Footnote 12)
Theorem 5
Define
The space of \(\operatorname{SO}\) invariants on level \(n\) in \(T(\mathbb{C}^{2})\) is spanned freely by
The space of \(\operatorname{SO}\) invariants on level \(n\) in \(T(\mathbb{R}^{2})\) is spanned freely by
Remark 6
In particular for \(d=2\) and \(n\) even, the dimension of rotation invariants on level \(n\) in \(T(\mathbb{R}^{2})\) is equal to \(\binom{n}{n/2}\).
Proof
1. The elements \(z\) are invariant
Let
Then (recall Definition 3.2)
Hence
2. The elements \(z\) form a basis
Now \(x_{j_{1}} .. x_{j_{n}}\), \(j_{\ell }\in \{1,2\}\), is a basis of \(\pi _{n} T(\mathbb{C}^{2})\) over ℂ, and hence so is \(z_{j_{1}} .. z_{j_{n}}\) (the map \((x_{1},x_{2}) \mapsto (z_{1},z_{2})\) is invertible). By Step 1 we have thus exhibited a basis (over ℂ) for all invariants in \(\pi _{n} T(\mathbb{C}^{2})\).
3. Real invariants
The space of \(\operatorname{SO}\) invariants on level \(n\) in \(T(\mathbb{C}^{2})\) is spanned freely by the set of
Adding and subtracting the elements with \(j_{1}=2\) from the elements with \(j_{1}=1\), we get that the space of \(\operatorname{SO}\) invariants on level \(n\) in \(T(\mathbb{C}^{2})\) is spanned freely by the set of
Because \(z_{3-j_{1}} \cdot .. \cdot z_{3-j_{n}}\) is the complex conjugate of \(z_{j_{1}} \cdot .. \cdot z_{j_{n}}\), this means that the space of \(\operatorname{SO}\) invariants on level \(n\) in \(T(\mathbb{C} ^{2})\) is spanned freely by the set of
This is an expression for a basis of the \(\operatorname{SO}\) invariants in terms of real combinations of basis elements of the tensor space. They thus form a basis for the \(\operatorname{SO}\) invariants for the free real vector space on the same set, namely \(\pi _{n} T( \mathbb{R}^{2})\). □
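The mechanism behind the proof can be checked numerically: under a rotation by \(\theta \), \(z_{1} = x_{1} + i x_{2}\) picks up a factor \(e^{i\theta }\) and \(z_{2} = x_{1} - i x_{2}\) a factor \(e^{-i\theta }\), so a monomial with equally many \(z_{1}\)'s and \(z_{2}\)'s is invariant. A sketch in the multilinear-map picture (function names and the evaluation convention are ours):

```python
import math

def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def z(v, j):
    # z_1 = x_1 + i*x_2 and z_2 = x_1 - i*x_2, evaluated on the vector v
    return complex(v[0], v[1]) if j == 1 else complex(v[0], -v[1])

def word_value(word, vecs):
    """Evaluate the monomial z_{j_1} .. z_{j_n} on a tuple of vectors."""
    out = complex(1.0)
    for j, v in zip(word, vecs):
        out *= z(v, j)
    return out

vs = [(0.3, -1.2), (2.0, 0.5), (-0.7, 1.1), (0.4, 0.9)]
ws = [rotate(v, 0.55) for v in vs]
# balanced word (two 1's, two 2's): invariant; unbalanced word: not
balanced = word_value((1, 2, 2, 1), vs) - word_value((1, 2, 2, 1), ws)
unbalanced = word_value((1, 1, 2, 1), vs) - word_value((1, 1, 2, 1), ws)
```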
Example 7
Consider \(d=2\), level \(n=2\)
Level \(n=4\)
Consider \(d=3\), level \(n=2\)
Level \(n=3\).
Consider \(d=4\), level \(n=2\)
Level \(n=4\)
5 Permutations
Denote by \(S_{d}\) the group of permutations of \([d] := \{1,.., d\}\).
Lemma 1
For \(\sigma \in S_{d}\), define \(M(\sigma ) \in \operatorname{GL}(\mathbb{R}^{d})\) as
Then \(M: S_{d}\to \operatorname{GL}(\mathbb{R}^{d})\) is a group homomorphism and moreover \(M(\sigma ^{-1}) = M(\sigma )^{\top }\). (Footnote 13)
Proof
Regarding the first point, for \(i \in \{1,..,d\}\),
Regarding the last point, note the following sequence of equivalences.
This proves the claim. □
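Both properties of the lemma are easy to verify numerically for permutation matrices, under the usual convention \(M(\sigma ) e_{j} = e_{\sigma (j)}\) (the 0-based encoding below is our assumption, not the paper's notation):

```python
def perm_matrix(sigma):
    """M(sigma) e_j = e_{sigma(j)}: entry (i, j) is 1 iff i == sigma[j].
    Permutations are 0-based tuples: sigma maps j to sigma[j]."""
    d = len(sigma)
    return [[1 if i == sigma[j] else 0 for j in range(d)] for i in range(d)]

def matmul(A, B):
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def compose(s, t):
    # (s o t)(j) = s(t(j))
    return tuple(s[t[j]] for j in range(len(s)))

def inverse(s):
    inv = [0] * len(s)
    for j in range(len(s)):
        inv[s[j]] = j
    return tuple(inv)

def transpose(A):
    d = len(A)
    return [[A[j][i] for j in range(d)] for i in range(d)]

s, t = (1, 2, 0), (0, 2, 1)
```

The homomorphism property `perm_matrix(compose(s, t)) == matmul(perm_matrix(s), perm_matrix(t))` and the transpose property `perm_matrix(inverse(s)) == transpose(perm_matrix(s))` both hold for these (and any) permutations.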
\(S_{d}\) then acts on \(T((\mathbb{R}^{d}))\) and \(T(\mathbb{R}^{d})\) via Definition 3.2. Explicitly,
Definition 2
We call \(\phi \in T(\mathbb{R}^{d})\) a permutation invariant if
for all \(\sigma \in S_{d}\) and all curves \(X\). Alternatively, as explained in Sect. 3,
for all \(\sigma \in S_{d}\). Equivalently,
for all \(\sigma \in S_{d}\).
We follow [1, Sect. 3]. To a monomial
we associate the following set partition of \([n] := \{1,.., n \}\)
Example 3
Let \(d=3\), then
Note that for every permutation \(\sigma \in S_{d}\),
Proposition 4
([1, Sect. 3])
Define
Then \(\{ M_{A} : A~\textit{is a set partition of}~[n]~\textit{and}~|A| \le d\}\) is a linear basis for the space of permutation invariants of homogeneity \(n\).
Remark 5
The generating function for partitions with at most \(d\) blocks is given by
This follows from summing up [45, (1.94c)].
For example, for \(d=2\),
which is the generating function of the sequence (https://oeis.org/A011782)
For \(d=3\) one gets the generating function
which is the generating function of the sequence (https://oeis.org/A124302)
We are not aware of a general explicit formula for the number of partitions (i.e. the coefficients of the generating function).
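The coefficients themselves are nonetheless easy to compute as sums \(\sum_{k \le d} S(n,k)\) of Stirling numbers of the second kind; a minimal sketch (the function name is ours):

```python
def count_partitions_at_most(n_max, d):
    """Number of set partitions of [n] with at most d blocks, for
    n = 0,..,n_max, via the recurrence S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    S = [[0] * (d + 1) for _ in range(n_max + 1)]  # S[n][k]
    S[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, d + 1):
            S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]
    return [sum(S[n]) for n in range(n_max + 1)]

# d = 2 reproduces 1, 1, 2, 4, 8, 16, .. (OEIS A011782)
# d = 3 reproduces 1, 1, 2, 5, 14, 41, .. (OEIS A124302)
```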
Proof of Proposition 5.4
By (7), each \(M_{A}\) is permutation invariant. Moreover, since \(|A| \le d\), \(M_{A}\) is nonzero.
For \(A\), \(A'\) distinct set partitions of \([n]\), the monomials in \(M_{A}\) and the monomials in \(M_{A'}\) do not overlap. Hence the proposed basis is linearly independent.
Now, if \(\phi \) is permutation invariant and if for some \(i\), \(i'\), \(\nabla ( x_{i_{1}} .. x_{i_{n}} ) = \nabla ( x_{i'_{1}} .. x_{i'_{n}} )\) then the coefficient of \(x_{i}\) and \(x_{i'}\) must coincide. Hence the proposed basis spans invariants of homogeneity \(n\). □
Example 6
Consider \(d=3\)
Order \(n=1\)
Order \(n=2\)
Order \(n=3\)
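For small \(d\) and \(n\), the basis elements \(M_{A}\) can be enumerated by brute force: a word over \(\{1,..,d\}\) contributes to \(M_{A}\) exactly when the set partition induced by its equal letters is \(A\). A sketch under that reading of the construction (0-based positions; names are ours):

```python
from itertools import product

def kernel(word):
    """Set partition of positions {0,..,n-1} induced by equal letters."""
    blocks = {}
    for pos, letter in enumerate(word):
        blocks.setdefault(letter, []).append(pos)
    return frozenset(frozenset(b) for b in blocks.values())

def M_A(A, d):
    """All words over {1,..,d} whose induced position partition equals A."""
    n = sum(len(b) for b in A)
    target = frozenset(frozenset(b) for b in A)
    return [w for w in product(range(1, d + 1), repeat=n)
            if kernel(w) == target]

# d = 3, n = 2: the partition {{0},{1}} collects the 6 words x_i x_j, i != j;
# the partition {{0,1}} collects x_1 x_1 + x_2 x_2 + x_3 x_3
```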
6 An Additional (Time) Coordinate
Assume now that \(X = (X^{0},X^{1},..,X^{d}): [0,T] \to \mathbb{R}^{1+d}\). Here \(X^{0}\) plays a special role, in that we assume that it is not affected by the space transformations under consideration.
Adding an “artificial” 0-th component, usually keeping track of time, \(X^{0}_{t} := t\), is a common trick to improve the expressiveness of the signature. In particular, if such an \(X^{0}\) is monotonically increasing, the enlarged curve \((X^{0},X^{1},..,X^{d})\) never has any “tree-like” components (compare Sect. 7), no matter what the original \((X^{1},..,X^{d})\) was.
Consider \(\operatorname{GL}\) invariants for the moment.
Definition 1
Let
the space of invertible maps of \(\mathbb{R}^{1+d}\) leaving the first direction unchanged. We call \(\phi \in T(\mathbb{R}^{1+d})\) a \(\operatorname{GL}_{0}\) invariant of weight \(w\) if
for all \(A \in \operatorname{GL}_{0}(\mathbb{R}^{d})\).
Consider the \(\operatorname{GL}(\mathbb{R}^{2})\) invariant of weight 1
Since elements of \(\operatorname{GL}_{0}(\mathbb{R}^{2})\) leave the variable \(x_{0}\) unchanged, a straightforward way to produce \(\operatorname{GL}_{0}\) invariants presents itself: insert \(x_{0}\) at the same position in every monomial. For example
is a \(\operatorname{GL}_{0}(\mathbb{R}^{2})\) invariant of weight 1. We now formalize this idea and show that we get every \(\operatorname{GL}_{0}\) invariant this way.
Define the linear map \(\mathsf{Remove}\) of “removing instances of \(x_{0}\)” on monomials, as
so for example
Define for \(U \subset [m]\) and \(i = (i_{1},.., i_{m})\)
Define the linear map of restriction to \(U\) on polynomials of order \(m\) by defining on monomials
so for example
For \(z = (z_{1},..,z_{m+1}) \in \mathbb{N}^{m+1}\), define the linear operator \(\mathsf{Insert}_{z}\) on polynomials of order \(m\) by its action on monomials as follows. For a monomial \(x_{i_{1}} .. x_{i_{m}}\) of order \(m\), \(\mathsf{Insert}_{z}\) inserts \(z_{1}\) occurrences of \(x_{0}\) before \(x_{i_{1}}\), \(z_{2}\) occurrences of \(x_{0}\) before \(x_{i_{2}}\), and so on, up to \(z_{m}\) occurrences of \(x_{0}\) before \(x_{i_{m}}\) and \(z_{m+1}\) occurrences of \(x_{0}\) after \(x_{i_{m}}\). For example
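Encoding monomials as tuples of letter indices (with 0 standing for \(x_{0}\)), the two operators are one-liners; a minimal sketch (function names are ours):

```python
def remove(word):
    """Remove: delete every occurrence of x_0 from a monomial."""
    return tuple(i for i in word if i != 0)

def insert_z(z, word):
    """Insert_z: put z[k] copies of x_0 before the k-th letter and
    z[-1] copies after the last one; len(z) == len(word) + 1."""
    out = []
    for zk, i in zip(z, word):
        out.extend([0] * zk)
        out.append(i)
    out.extend([0] * z[-1])
    return tuple(out)

# Insert_(1,0,2) applied to x_1 x_2 gives x_0 x_1 x_2 x_0 x_0,
# and Remove undoes the insertion
```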
Theorem 2
A basis for the space of\(\operatorname{GL}_{0}\)invariants of weight\(w\), homogeneous of degree\(m\), is given by the polynomials
with \(0 \le n \le m\), \(\psi \) ranging over the basis for \(\operatorname{GL}\) invariants of weight \(w\) and homogeneity \(n\) (Proposition 3.11), and \(z \in \mathbb{N}^{n+1}\) such that \(\sum_{\ell }z_{\ell }= m - n\).
Proof
Let \(n\), \(\psi \), \(z\) be as in the statement. Then, for \(A_{0} = \operatorname{diag}(1, A) \in \operatorname{GL}_{0}(\mathbb{R}^{d})\), with \(A \in \operatorname{GL}(\mathbb{R}^{d})\),
Therefore \(\mathsf{Insert}_{z} \psi \) is \(\operatorname{GL}_{0}\) invariant of weight \(w\).
On the other hand, let \(\phi \) of order \(m\) be a \(\operatorname{GL}_{0}\) invariant of weight \(w\). Define for \(U \subset [m]\)
which collects all monomials having \(x_{0}\) exactly at the positions in \(U\). Then
Now, since \(\phi \) is \(\operatorname{GL}_{0}\) invariant of weight \(w\) and since \(\operatorname{GL}_{0}\) leaves
invariant, we get that \(\phi ^{U}\) is \(\operatorname{GL}_{0}\) invariant of weight \(w\). Clearly, there is \(0 \le n \le m\) and \(z \in \mathbb{N}^{n+1}\) such that
Lastly, \(\mathsf{Remove}\,\phi ^{U}\) is \(\operatorname{GL}\) invariant, since for \(A_{0} = \operatorname{diag}(1, A) \in \operatorname{GL} _{0}(\mathbb{R}^{d})\), with \(A \in \operatorname{GL}(\mathbb{R}^{d})\),
Hence every invariant is in the span of the set given in the statement. They are linearly independent, and hence form a basis. □
The corresponding statements for rotations and permutations are completely analogous, so we omit them.
7 Discussion and Open Problems
We have presented a novel way to extract invariant features of \(d\)-dimensional curves, based on the iterated-integral signature. We have identified all those features that can be written as a finite linear combination of terms in the signature.
Among the techniques used previously for finding invariants of curves, the method of “integral invariants” [13] is closest to ours (it has been used for example in [19] for character recognition). In that work, for a curve \(X: [0,T] \to \mathbb{R}^{d}\), \(d=2,3\), the building blocks for invariants are expressions of the form
Using an algorithmic procedure, some invariants under certain subgroups \(G \subset \operatorname{GL}(\mathbb{R}^{d})\) are derived. In particular, for \(d=2\) and \(G=\operatorname{GL}(\mathbb{R}^{d})\), the following invariants are given
By the shuffle identity (Lemma 2.1), we can write these as \(I_{i} = \langle S(X)_{0,T}, \phi _{i} \rangle \), with
One can easily check that these lie in the linear span of the invariants given in Proposition 4.4 (or Theorem 4.5), as expected.
We note that expressions of the form (8) are not enough to uniquely characterize a path. Indeed, the following lemma gives a counterexample to the conjecture on p. 906 in [13] that “signatures of non-equivalent curves are different” (here, the “signature” of a curve means the set of expressions of the form (8)).
Lemma 1
Consider the two closed curves \(X^{+}\) and \(X^{-}\) in \(\mathbb{R}^{2}\), given for \(t\) in \([0,2\pi ]\) as
Then all the expressions (8) coincide on \(X^{+}\) and \(X^{-}\). (Footnote 14)
These curves both trace a figure called the lemniscate of Gerono, which is illustrated in Fig. 2.
Proof
Consider the function \(f^{m}_{n}(t):=\cos ^{m} t\sin ^{n} t\), where \(m\) and \(n\) are nonnegative integers. If \(n\) is odd, then \(f^{m}_{n}(t)=-f ^{m}_{n}(2\pi - t)\) so \(\int _{0}^{2\pi }f^{m}_{n}(t)\,dt\) is zero. If \(m\) is odd, then
Thus \(\int _{0}^{2\pi }f^{m}_{n}(t)\,dt\) can only be nonzero if \(m\) and \(n\) are both even.
Any expression like (8) is either of the form
or of the form
Both these expressions are free from the symbols ± and ∓. Therefore these two curves have the same values on terms of the form (8). □
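The parity argument can be cross-checked numerically. Since the integrands \(\cos ^{m} t \sin ^{n} t\) are trigonometric polynomials, the midpoint rule on a uniform grid over one period is essentially exact; a sketch (the function name is ours):

```python
import math

def moment(m, n, steps=1000):
    """Midpoint-rule approximation of the integral of
    cos^m(t) * sin^n(t) over [0, 2*pi]."""
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.cos(t) ** m * math.sin(t) ** n
    return total * h

# vanishes unless both exponents are even; e.g. the (2,2)-moment is pi/4
```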
Moreover, the algorithmic nature of the construction in [13] makes it difficult to proceed to invariants of higher order. In contrast, our method gives an explicit linear basis for the invariants under consideration up to any order.
Regarding the question of whether our invariants are complete, we propose the following conjecture. As shown in [22], if \(S(X)_{0,T} = S(Y)_{0,T}\) for some curves \(X\) and \(Y\), then \(X\) is “tree-like equivalent” to \(Y\). For the concrete definition of this equivalence we refer to their paper, but let us give one example. Consider, in \(d=2\), the constant path \(X_{t} := (0,0)\), \(t \in [0,T]\), and the piecewise linear path \(Y\) through the points \((0,0)\), \((1,0)\) and \((0,0)\). One can check that
The signature has no chance of picking up this kind of “excursion” in a path; this concept is formalized in “tree-like equivalence”. We suspect that the following holds true (with corresponding formulations for the other subgroups of \(\operatorname{GL}(\mathbb{R}^{d})\)).
Conjecture 2
Let \(X, Y: [0,T] \to \mathbb{R}^{d}\) be two curves such that
for all \(\operatorname{SO}\) invariants given in Proposition 4.4. Then there is a curve \(\bar{X}\), tree-like equivalent to \(X\), and a rotation \(A \in \operatorname{SO}(\mathbb{R}^{d})\), such that
In Proposition 3.11, Proposition 4.4 and Proposition 5.4 we have established a linear basis for invariants for every homogeneity. As already mentioned in Remark 3.15, owing to the shuffle identity, there are algebraic relations between elements of different homogeneity. An interesting open problem is then to find a minimal set of generators for the set of invariants, considered as a subalgebra of the shuffle algebra. This applies to all subgroups of \(\operatorname{GL}( \mathbb{R}^{d})\) and their corresponding invariants.
Lastly, a word on (computational) complexity. We have seen in Remark 3.10 the dimensions of \(\operatorname{GL}\) invariant elements (which is a lower bound on the dimensions of \(\operatorname{SO}\) invariant elements). (Footnote 15) In Remark 5.5 we have seen the dimensions for the permutation invariant elements.
Computing the signature itself up to level \(n\) has complexity \(\varOmega (d^{n})\), since \(d+ .. + d^{n}\) integrals need to be calculated. So any method that calculates the invariant features of a curve \(X\) by first calculating its signature and extracting them (see Remark 3.14) will have computational complexity dominated by the calculation of the signature. Furthermore, the calculation of the invariant elements is a computation that can be done offline (they do not depend on the curve \(X\)).
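To make the complexity count concrete, the truncated signature of a piecewise linear curve can be computed segment by segment via Chen's identity; the sketch below stores all \(d + .. + d^{n}\) coefficients explicitly, which is exactly the \(\varOmega (d^{n})\) cost mentioned above (function names are ours, not the paper's):

```python
from itertools import product
from math import factorial

def seg_sig(v, depth):
    """Signature of a straight-line segment with increment vector v:
    the word (i_1,..,i_k) gets the coefficient v[i_1]*..*v[i_k] / k!."""
    d = len(v)
    sig = {(): 1.0}
    for k in range(1, depth + 1):
        for w in product(range(d), repeat=k):
            term = 1.0
            for i in w:
                term *= v[i]
            sig[w] = term / factorial(k)
    return sig

def chen(s1, s2, d, depth):
    """Chen's identity: the signature of a concatenation is the
    truncated tensor product of the two signatures."""
    out = {}
    for k in range(depth + 1):
        for w in product(range(d), repeat=k):
            out[w] = sum(s1[w[:j]] * s2[w[j:]] for j in range(k + 1))
    return out

def sig_piecewise_linear(points, depth):
    d = len(points[0])
    sig = {(): 1.0}  # identity element of the (truncated) tensor algebra
    for k in range(1, depth + 1):
        for w in product(range(d), repeat=k):
            sig[w] = 0.0
    for p, q in zip(points, points[1:]):
        v = [b - a for a, b in zip(p, q)]
        sig = chen(sig, seg_sig(v, depth), d, depth)
    return sig

# closed triangle (0,0) -> (1,0) -> (0,1) -> (0,0): level-1 terms vanish,
# and (S^{12} - S^{21}) / 2 recovers the enclosed area 1/2
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
sig = sig_piecewise_linear(tri, 2)
```

Extracting any of the invariant features discussed above is then a single inner product with a precomputed element, so the signature computation dominates.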
This leaves several directions of future research.
-
Is it possible to apply kernelization techniques similar to the ones used for the entire (non-invariant) signature in [25]? In the non-invariant setting, these techniques make it possible to use information of the signature up to high levels and dimensions for certain learning algorithms.
-
We have studied in this paper linear expressions on the signature that are invariant to a group action. This was justified by the shuffle identity (Lemma 2.1), which tells us that any polynomial functional on the signature can in fact be linearized. One can also consider a fixed level \(n\) of the signature and look for all nonlinear expressions that are invariant under the group action. This is the classical problem of invariant theory for polynomial rings [47, Sect. 4]. On the one hand, this makes it possible to “peek ahead” in the signature, since one gets invariant information that would only be seen in linear expressions of levels higher than \(n\). On the other hand, except for special cases, there are no explicit expressions for these nonlinear invariants. One has to proceed algorithmically (for example via Derksen’s algorithm, [8]), which only works for low dimension \(d\) and low levels \(n\). Since the calculation of those nonlinear invariant elements can also be done offline, it would nonetheless be nice to have a tabulation of nonlinear invariants (as far as existing algorithms can reach).
-
For \(\operatorname{GL}\) invariants, in Remark 3.25 we conjecture the existence of a “\(\operatorname{GL}\) invariant” signature. This could improve computation time, since no non-invariant integrals have to be computed.
Notes
The signature is notorious for being hard to interpret in geometric terms.
The reader might prefer to just think of a (piecewise) smooth curve.
Since \(X\) is of bounded variation the integrals are well-defined using classical Riemann-Stieltjes integration (see for example Chap. 6 in [43]). This generalizes the notation in the introduction above beyond smooth curves. It can be pushed much further though. In fact the following considerations are purely algebraic and hence hold for any curve for which a sensible integration theory (in particular: obeying integration by parts) exists. A relevant example is Brownian motion which, although being almost surely nowhere differentiable, nonetheless admits a stochastic (Stratonovich) integral.
Also called the “rough path signature”.
In contrast to a power series, a polynomial only has finitely many terms.
One can also think of a tabloid as the following element of the vector space spanned by Young tableaux,
$$\begin{aligned} \{ t \} = \sum_{\pi } \pi t. \end{aligned}$$
Here the sum is over all permutations \(\pi \) that leave the elements of each row of \(t\) unchanged.
The prefactor \(1/2\) is irrelevant, so we will speak of \(\phi \) and also of \(\tfrac{1}{2} \phi \) as picking out the area.
The standard example is a time series that is discretely observed at times \(t_{i}\) and linearly interpolated in between.
Note the nomenclature used in signal analysis. A probabilist or statistician would tend to call this a “covariance” and not a “correlation”.
For example, with \(n=4\) and dimension \(d=3\), the indices \(\{0,1,2\}\), \(\{0,2,3\}\), \(\{0,3,4 \}\), \(\{0,1,4\}\), \(\{1,2,4\}\), \(\{2,3,4\}\) lead to the facets, which in this dimension are triangles.
One may think of this as the space of noncommuting polynomials in \(x_{1}\), \(x_{2}\) with complex coefficients, or, equivalently, as the complexification [42, Chap. 1] of the vector space \(T(\mathbb{R}^{2})\). An element \(A \in \operatorname{GL}(\mathbb{R}^{2})\) then acts on \(T(\mathbb{C} ^{2})\) by the prescription in Definition 3.2. More abstractly: this is the diagonal action of the complexification of \(A\) on \(T(\mathbb{C}^{2})\).
\(M\) is sometimes called the defining representation of \(S_{d}\).
Note that \(X^{+}\) and \(X^{-}\) are not tree-like equivalent and therefore have different (iterated-integral) signatures. The lowest level on which they differ is level 4.
In this section we take “invariant element” to mean the elements of the space \(T(\mathbb{R}^{d})\), incarnations of which can be seen in Sect. 3.1. They do not depend on any curve \(X\) one might be interested in.
References
Bergeron, N., Reutenauer, C., Rosas, M., Zabrocki, M.: Invariants and coinvariants of the symmetric group in noncommuting variables. Can. J. Math. 60(2), 266–296 (2008)
Calabi, E., Olver, P.J., Shakiban, C., Tannenbaum, A., Haker, S.: Differential and numerically invariant signature curves applied to object recognition. Int. J. Comput. Vis. 26(2), 107–135 (1998)
Cass, T., Friz, P.: Densities for rough differential equations under Hörmander’s condition. Ann. Math. 171(3), 2115–2141 (2010)
Chen, K.-T.: Integration of paths, geometric invariants and a generalized Baker-Hausdorff formula. Ann. Math. 65(1), 163–178 (1957)
Chevyrev, I., Kormilitzin, A.: A primer on the signature method in machine learning. arXiv:1603.03788 (2016)
Chuang, G.C.H., Kuo, C.-C.J.: Wavelet descriptor of planar curves: theory and applications. IEEE Trans. Image Process. 5(1), 56–70 (1996)
De Bruijn, N.G.: On some multiple integrals involving determinants. J. Indian Math. Soc. 19, 133–151 (1955)
Derksen, H.: Computation of invariants for reductive groups. Adv. Math. 141(2), 366–384 (1999)
Diehl, J.: signature-invariants-py, GitHub repository. https://github.com/diehlj/signature-invariants-py
Dieudonné, J.A., Carrell, J.B.: Invariant theory, old and new. Adv. Math. 4(1), 1–80 (1970)
Engström, A., Norén, P.: Polytopes from subgraph statistics. In: 23rd International Conference on Formal Power Series and Algebraic Combinatorics. Discrete Mathematics & Theoretical Computer Science, pp. 305–316 (2011)
Ewald, A., Marzetti, L., Zappasodi, F., Meinecke, F.C., Nolte, G.: Estimating true brain connectivity from EEG/MEG data invariant to linear and static transformations in sensor space. NeuroImage 60(1), 476–488 (2012)
Feng, S., Kogan, I., Krim, H.: Classification of curves in 2D and 3D via affine integral signatures. Acta Appl. Math. 109(3), 903–937 (2010)
Fillmore, J.P., Williamson, S.G.: Permanents and determinants with generic noncommuting entries. Linear Multilinear Algebra 19(4), 321–334 (1986)
Flusser, J.: On the independence of rotation moment invariants. Pattern Recognit. 33(9), 1405–1410 (2000)
Friz, P., Victoir, N.: Multidimensional Stochastic Processes as Rough Paths: Theory and Applications. Cambridge Studies in Advanced Mathematics, vol. 120. Cambridge University Press, Cambridge (2010)
Gale, D.: Neighborly and cyclic polytopes. In: Convexity. Sympos. Pure Math., vol. 7, pp. 225–233. AMS, Providence (1963)
Gardner, R.B.: The fundamental theorem of vector relative invariants. J. Algebra 36(2), 314–318 (1975)
Golubitsky, O., Mazalov, V., Watt, S.M.: Orientation-independent recognition of handwritten characters with integral invariants. In: Proc. Joint Conf. ASCM (2009)
Graham, B.: Sparse arrays of signatures for online character recognition. arXiv:1308.0371 (2013)
Granlund, G.H.: Fourier preprocessing for hand print character recognition. IEEE Trans. Comput. 100(2), 195–201 (1972)
Hambly, B., Lyons, T.: Uniqueness for the signature of a path of bounded variation and the reduced path group. Ann. Math. 171(1), 109–167 (2010)
Johnson, H.H.: A generalization of KT Chen’s invariants for paths under transformation groups. Trans. Am. Math. Soc. 105(3), 453–461 (1962)
Karlin, S., Shapley, L.S.: Geometry of Moment Spaces. Memoirs of the American Mathematical Society, vol. 12. AMS, Providence (1953)
Király, F.J., Oberhauser, H.: Kernels for sequentially ordered data. arXiv:1601.08169 (2016)
Kormilitzin, A., Saunders, K.E.A., Harrison, P.J., Geddes, J.R., Lyons, T.: Detecting early signs of depressive and manic episodes in patients with bipolar disorder using the signature-based model. arXiv:1708.01206 (2017)
Kuhl, F.P., Giardina, C.R.: Elliptic Fourier features of a closed contour. Comput. Graph. Image Process. 18(3), 236–258 (1982)
Lakshmibai, V., Raghavan, K.N.: Standard Monomial Theory: Invariant Theoretic Approach. Encyclopaedia of Mathematical Sciences, vol. 137. Springer, Berlin (2007)
Landsberg, J.M.: Tensors: Geometry and Applications. Graduate Studies in Mathematics, vol. 128. AMS, Providence (2012)
Leclerc, B.: On identities satisfied by minors of a matrix. Adv. Math. 100(1), 101–132 (1993)
Lee, C.: Regular triangulations of convex polytopes. In: Gritzmann, P., Sturmfels, B. (eds.) Applied Geometry and Discrete Mathematics: The Victor Klee Festschrift, pp. 443–456. American Mathematical Society, Providence (1991)
Levin, D., Lyons, T., Ni, H.: Learning from the past, predicting the statistics for the future, learning an evolving system. arXiv:1309.0260 (2013)
Luque, J.-G., Thibon, J.-Y.: Pfaffian and Hafnian identities in shuffle algebras. Adv. Appl. Math. 29(4), 620–646 (2002)
Lyons, T.J., Yam, P.S.C.: On Gauss-Green theorem and boundaries of a class of Hölder domains. J. Math. Pures Appl. 85(1), 38–53 (2006)
Manay, S., Cremers, D., Hong, B.W., Yezzi, A.J., Soatto, S.: Integral invariants for shape matching. IEEE Trans. Pattern Anal. Mach. Intell. 28(10), 1602–1618 (2006)
Mokhtarian, F., Mackworth, A.: Scale-based description and recognition of planar curves and two-dimensional shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1, 34–43 (1986)
Morales, J., Akopian, D.: Physical activity recognition by smartphones, a survey. Biocybern. Biomed. Eng. 37(3), 388–400 (2017)
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes: The Art of Scientific Computing, 3rd edn. Cambridge University Press, Cambridge (2007)
Ree, R.: Lie elements and an algebra associated with shuffles. Ann. Math. 68(2), 210–220 (1958)
Reizenstein, J.: Calculation of iterated-integral signatures and log signatures. arXiv:1712.02757 (2017)
Reutenauer, C.: Free Lie Algebras. London Mathematical Society Monographs. New Series, vol. 7. Clarendon, Oxford (1993)
Roman, S.: Advanced Linear Algebra, 3rd edn. Graduate Texts in Mathematics, vol. 135. Springer, New York (2005)
Rudin, W.: Principles of Mathematical Analysis, 3rd edn. International Series in Pure & Applied Mathematics. McGraw-Hill, New York (1976)
Sagan, B.: The Symmetric Group: Representations, Combinatorial Algorithms, and Symmetric Functions. Graduate Texts in Mathematics, vol. 203. Springer, Berlin (2013)
Stanley, R.P.: Enumerative Combinatorics, Volume 1, 2nd edn. Cambridge Studies in Advanced Mathematics, vol. 49. Cambridge University Press, Cambridge (2011)
Sturmfels, B.: Gröbner Bases and Convex Polytopes. University Lecture Series, vol. 8. AMS, Providence (1996)
Sturmfels, B.: Algorithms in Invariant Theory. Springer, Berlin (2008)
Sturmfels, B., White, N.: Gröbner bases and invariant theory. Adv. Math. 76(2), 245–259 (1989)
Tóth, C.D., O’Rourke, J., Goodman, J.E. (eds.): Handbook of Discrete and Computational Geometry. CRC Press, Boca Raton (2004)
Weyl, H.: The Classical Groups, Their Invariants and Representations. Princeton University Press, Princeton (1946)
Yang, W., Lyons, T., Ni, H., Schmid, C., Jin, L., Chang, J.: Leveraging the path signature for skeleton-based human action recognition. arXiv:1707.03993 (2017)
Zahn, C.T., Roskies, R.Z.: Fourier descriptors for plane closed curves. IEEE Trans. Comput. C-21(3), 269–281 (1972)
Ziegler, G.: Lectures on Polytopes. Graduate Texts in Mathematics, vol. 152. Springer, Berlin (2012)
Acknowledgements
The authors thank Bernd Sturmfels for discussion on the topics of this paper. Open access funding provided by Max Planck Society.
Additional information
J. Reizenstein was supported by the Engineering and Physical Sciences Research Council.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Diehl, J., Reizenstein, J. Invariants of Multidimensional Time Series Based on Their Iterated-Integral Signature. Acta Appl Math 164, 83–122 (2019). https://doi.org/10.1007/s10440-018-00227-z