1 Introduction

Groups of organisms competing and cooperating in nature are assumed to behave as complex and nonlinear dynamical systems, which currently elude formulation [7, 9]. Understanding the complex dynamics of living organisms or artificial agents (and their component parts) is a challenging research area in biology [5], physics [7], and machine learning. In physics, decomposition or spectral methods, such as proper orthogonal decomposition (POD) [1, 25] and dynamic mode decomposition (DMD) [23, 24], are used to factorize the dynamics into modes estimated from data. In machine learning, the problem of learning dynamical systems has been studied from perspectives such as Bayesian approaches [10] and predictive state representations [19]. This topic is closely related to the decomposition techniques in physics, aiming to estimate a prediction model by examining the obtained modes.

In this paper, we consider the following discrete-time nonlinear dynamical system:

$$\begin{aligned} \varvec{x}_{t+1}=\varvec{f}\left( \varvec{x}_t\right) \end{aligned}$$
(1)

where \(\varvec{x}_{t}\) is a state vector on the state space \(\mathcal{M}\) (i.e., \(\varvec{x}_t\in \mathcal {M}\subset \mathbb {R}^{d}\)) and \(\varvec{f}\) is a state transition function, which may be nonlinear. A recent development is Koopman spectral analysis with reproducing kernels (called kernel DMD), which defines modes that can yield direct information about the nonlinear latent dynamics [16]. However, to compare or classify these complex dynamics, it is necessary to incorporate their Koopman spectra into a metric appropriate for representing the similarity between nonlinear dynamical systems.
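To make the setting concrete, the following is a minimal sketch (in Python; the specific map \(\varvec{f}\) is an illustrative assumption of ours, not a system from this paper) of generating a trajectory of the discrete-time system in Eq. (1); later sketches assume data stacked column-wise in this way.

```python
import numpy as np

def f(x):
    # Illustrative nonlinear state transition (an assumption for this sketch):
    # a damped rotation with a cubic nonlinearity.
    theta = 0.3
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return 0.98 * R @ x - 0.05 * x ** 3

def simulate(x0, tau):
    """Trajectory x_0, ..., x_tau of x_{t+1} = f(x_t), stacked as columns."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(tau):
        xs.append(f(xs[-1]))
    return np.column_stack(xs)  # shape (d, tau + 1)

X = simulate([1.0, 0.0], tau=200)
```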

Several works have applied approximation with a low-dimensional linear subspace to represent this similarity [12, 30, 33]. One approach has used the Binet-Cauchy (Riemannian) distance with a variety of kernels on a Grassmann manifold [12], such as the kernel principal angle [33] and the trace and determinant kernels [30], which were designed for applications in face recognition [33] and movie clustering [30]. The algorithm essentially calculates the Binet-Cauchy distance between two subspaces in the feature space, defined by the product of the canonical correlations. However, the main applications assumed a linear dynamical model [12, 30, 33], and thus generalization to nonlinear dynamics without specifying an underlying model remains to be addressed. In this paper, we map the latent dynamics to the feature space using such kernels, allowing binary classification to be applied to real-world complex dynamical systems.

Organized human group tasks such as navigation [13] or ballgame teams [8] provide excellent examples of complex dynamics, and they pose challenges in machine learning because of their switching and overlapping hierarchical subsystems [8], characterized by recursive shared intentionality [28]. Measurement systems have been developed that capture information about player positions in ballgames, allowing analysis of particular shots [11]; however, plays involving collaboration between several teammates have not yet been addressed. In games such as basketball or football, coaches analyze team formations, and players repeatedly practice moves that increase the probability of scoring (“scorability”). However, the selection of tactics is an ill-posed problem and thus essentially requires the coach's implicit, experience-based knowledge. An algorithm is needed that clarifies scorable moves involving multiple players in the team.

Previous research has classified team moves on a global scale by directly applying machine learning methods derived mainly from natural language processing. These include recurrent neural networks (RNNs) using optical flow images of the trajectories of all players [31] and the application of latent Dirichlet allocation (LDA) to the arrangement of individual trajectories [22]. However, the contribution of team movement to the success of a play remains unclear. Previously, we reported that three maximum attacker-defender distances separately explained scorability [8], but that study addressed only the outcome of a play, rather than its time evolution and the interactions it comprised. An algorithm is required that uses mapping to a feature space to discriminate between successful and unsuccessful moves while accounting for these complex factors. In this paper, we map the latent dynamic characteristics of multiple attacker-defender distances [8] to the feature space using our kernels acquired by kernel DMD and then evaluate scorability.

The rest of the paper is organized as follows. Section 2 briefly reviews the background of Koopman spectral kernels, and Sect. 3 describes methods for computing them. Section 4 extends this to an empirical example of actual human locomotion. For application to multiple sporting agents, Sect. 5 reports our findings using data from actual basketball games; our approach proved capable of capturing complex team moves. Finally, Sect. 6 presents our discussion and conclusions.

2 Background

2.1 Koopman Spectral Analysis and Dynamic Mode Decomposition

Spectral analysis (or decomposition) is a popular approach for analyzing dynamical systems, aimed at extracting low-dimensional dynamics from the data. Common techniques include global eigenmodes for linearized dynamics, discrete Fourier transforms, and POD for nonlinear dynamics [25], as well as multiple variants of these techniques. DMD has recently attracted particular attention in areas of physics such as fluid mechanics [23] and several engineering fields [2, 26] because of its ability to define a mode that can yield direct information even when applied to time series with nonlinear latent dynamics [23, 24]. However, the original DMD has numerical disadvantages related to the accuracy of the approximate expressions of the Koopman eigenfunctions derived from the data. A number of variants have been proposed to address this shortcoming, including exact DMD [29], optimized DMD [4], and Bayesian DMD [27]. Sparsity-promoting DMD [14] provides a framework for approximating the Koopman eigenfunctions with fewer bases. Extended DMD [32], which works on predetermined kernel basis functions, has also been proposed. These Koopman spectral analyses have been generalized to a reproducing kernel Hilbert space (RKHS) [16], an approach called kernel DMD.

In Koopman spectral analysis, the Koopman operator \(\mathcal{K}\) [18] is an infinite dimensional linear operator acting on the scalar function \({g_{i}: \mathcal{M}\rightarrow {\mathbb {C}}}\). That is, it maps \({g_{i}}\) to the new function \({\mathcal{K}g_{i}}\) as follows:

$$\begin{aligned} \left( \mathcal{{K}}{g_i}\right) \left( \varvec{x}\right) =\left( {g_i}\circ \varvec{f}\right) \left( \varvec{x}\right) , \end{aligned}$$
(2)

where \(\mathcal{K}g_{i}\) denotes the composition of \({g_{i}}\) with \(\varvec{f}\). We can see that \(\mathcal{K}\) acts linearly on the function \({g_{i}}\), even though the dynamics defined by \(\varvec{f}\) may be nonlinear. Since \(\mathcal{K}\) is a linear operator, it generally admits an eigenvalue decomposition:

$$\begin{aligned} \mathcal {K}{\varphi }_{j}\left( \varvec{x}\right) ={\lambda }_{j}{\varphi }_{j}\left( \varvec{x}\right) , \end{aligned}$$
(3)

where \(\lambda _{j}\in \mathbb {C}\) is the jth eigenvalue (called the Koopman eigenvalue) and \(\varphi _{j}\) is the corresponding eigenfunction (called the Koopman eigenfunction). We denote the concatenation of the \(g_{j}\) by \(\varvec{g}:= [g_{1},\ldots , g_{p}]^{\mathrm {T}}\). If each \(g_{j}\) lies within the space spanned by the eigenfunctions \(\varphi _{j}\), we can expand the vector-valued \(\varvec{g}\) in terms of these eigenfunctions as \(\varvec{g}(\varvec{x})=\sum _{j=1}^{\infty }{\varphi _{j}(\varvec{x})\varvec{\psi }_{j}}\), where the \(\varvec{\psi }_{j}\) are vector coefficients called Koopman modes. By iterative application of Eqs. (2) and (3), the following equation is obtained:

$$\begin{aligned} \left( \varvec{g}\circ \varvec{f}\right) \left( \varvec{x}\right) =\sum _{j=1}^{\infty }{\lambda }_j{\varphi }_j\left( \varvec{x}\right) {\varvec{\psi }}_j. \end{aligned}$$
(4)

Therefore, \(\lambda _{j}\) characterizes the time evolution of the corresponding Koopman mode \({\varvec{\psi }}_j\), i.e., the phase of \(\lambda _{j}\) determines its frequency and the magnitude determines the growth rate of its dynamics.
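As a small illustration of this reading of the spectrum, the sketch below converts a discrete-time Koopman eigenvalue into an oscillation frequency and a per-step growth rate; the sampling interval `dt` is an assumed parameter.

```python
import numpy as np

def eigenvalue_to_dynamics(lam, dt=1.0):
    """Frequency and growth rate encoded by a discrete-time eigenvalue lam."""
    freq = np.angle(lam) / (2.0 * np.pi * dt)  # phase -> frequency (cycles/unit time)
    growth = np.abs(lam)                       # magnitude: >1 grows, <1 decays
    return freq, growth

# e.g., a mode oscillating once every 20 steps while slowly decaying:
lam = 0.99 * np.exp(2j * np.pi / 20)
print(eigenvalue_to_dynamics(lam))            # ~(0.05, 0.99)
```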

DMD is a popular approach for estimating approximations of \(\lambda _{j}\) and \({\varvec{\psi }}_j\) from a finite-length observation sequence \(\varvec{y}_0,\varvec{y}_1,\ldots , \varvec{y}_{\tau }\) (\(\in \mathbb {R}^p\)), where \(\varvec{y}_{t}:= {\varvec{g}}(\varvec{x}_t)\). Let \(\varvec{A}= [\varvec{y}_0,\varvec{y}_1,\ldots , \varvec{y}_{\tau -1}]\) and \(\varvec{B}= [\varvec{y}_1,\varvec{y}_2,\ldots , \varvec{y}_{\tau }]\). Then, DMD approximates them by calculating the eigendecomposition of the least-squares solution to

$$\begin{aligned} \min _{\varvec{P}'\in {\mathbb {R}}^{p\times p}}\left( 1/\tau \right) \sum _{t=0}^{\tau -1}{\Vert {\varvec{y}}_{t+1}-{\varvec{P}}'{\varvec{y}}_t\Vert }^2, \end{aligned}$$
(5)

i.e., of \(\varvec{B}\varvec{A}^{\dagger }(=:\varvec{P})\) (\(\bullet ^{\dagger }\) denotes the pseudo-inverse of \(\bullet \)). Let the jth right and left eigenvectors of \(\varvec{P}\) be \(\varvec{\psi }_{j}\) and \(\varvec{\kappa }_{j}\), respectively, normalized so that \(\varvec{\kappa } _{i}^{*}\varvec{\psi }_{j}=\delta _{ij}\) (\(\delta _{ij}\) is the Kronecker delta). Then, since any vector \({\varvec{b}\in \mathbb {C}}^p\) can be written as \( {\varvec{b}=\sum _{j=1}^{p}{(\varvec{\kappa }_{j}^{*}\varvec{b})\varvec{\psi }_{j}}}\), applying this to \(\varvec{g}(\varvec{x})\) gives \(\varvec{g}(\varvec{x})=\sum _{j=1}^{p}{\varphi _{j}(\varvec{x})\varvec{\psi } _{j}}\). Therefore, by applying \(\mathcal {K}\) to both sides, we have

$$\begin{aligned} \left( \varvec{g}\circ \varvec{f}\right) \left( \varvec{x}\right) =\sum _{j=1}^{p}{\lambda }_j{\varphi }_j\left( \varvec{x}\right) \varvec{\psi }_j, \end{aligned}$$
(6)

indicating a modal representation corresponding to Eq. (4) for the finite sum.
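The following is a minimal sketch of the procedure above, using an SVD-based pseudo-inverse and projected modes (an implementation choice of ours; variants such as exact DMD [29] rescale the modes differently).

```python
import numpy as np

def dmd(Y, r=None):
    """Estimate approximate Koopman eigenvalues/modes from observations
    y_0, ..., y_tau stacked as columns of Y (shape (p, tau + 1))."""
    A, B = Y[:, :-1], Y[:, 1:]                    # shifted data matrices
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    if r is not None:                             # optional rank truncation
        U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    # P projected onto the POD basis U: P_tilde = U* B V S^{-1}
    P_tilde = U.conj().T @ B @ Vh.conj().T / s
    lam, W = np.linalg.eig(P_tilde)               # approximate Koopman eigenvalues
    Psi = U @ W                                   # projected DMD modes
    return lam, Psi
```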

2.2 Kernels for Comparing Nonlinear Dynamical Systems

Selecting an appropriate representation of the data is a fundamental issue in pattern recognition. The key point is to design features (i.e., kernels) that reflect the structure of the data. Feature design is particularly challenging for time-series data because of the difficulty of reflecting the data structure (including varying time lengths). Researchers have developed alternative kernel methods based on graphs [15, 17], subspaces [12, 33], or trajectories [30]. In this paper, a kernel design applicable to dynamical systems is required. Several methods based on subspace angles with kernel methods have been proposed, such as for the auto-regressive moving average (ARMA) model [30]. These methodologies were previously reviewed [12] from the viewpoint of the Riemannian distance (or metric) on the Grassmann manifold.

The Grassmann manifold \(\mathcal {G}\left( m,D\right) \) is the set of m-dimensional linear subspaces of \({\mathbb {R}}^D\). Formally, the Riemannian distance between two subspaces is the geodesic distance on the Grassmann manifold. However, a more intuitive and computationally efficient way of defining the distances uses the principal angles [20]. A previous review [12] categorized the various Riemannian distances into the projection distance and the Binet-Cauchy distance. The former has been used in applications such as face recognition [3, 12]; the latter has been applied in video clustering [30] and face recognition [33], and has been generalized to (specific nonlinear) dynamical systems [30]. We therefore adopt the Binet-Cauchy distance for comparing complex systems.

The Binet-Cauchy distances are basically obtained as the product of canonical correlations using a variety of kernels [30]. However, the main applications assumed a linear dynamical model [12, 30, 33], such as the ARMA model. Thus, it is necessary to generalize to nonlinear dynamics without any specific underlying model by incorporating the Koopman spectrum of the dynamics. We call the resulting kernels Koopman spectral kernels.

3 Design of Koopman Spectral Kernels

3.1 DMD with Reproducing Kernel

Conceptually, DMD can be considered as producing a local approximation of the Koopman eigenfunctions using a set of linear monomials of the observables as basis functions. In practice, however, this is certainly not applicable to all systems (in particular, beyond the region of validity of a local linearization). DMD with reproducing kernels [16] instead approximates the Koopman eigenfunctions with richer basis functions.

Let \(\mathcal H\) be the RKHS endowed with the inner product determined by a positive definite kernel k. Additionally, let \(\phi : \mathcal M\rightarrow \mathcal H\) be a feature map, and denote an instance of \(\phi \) with respect to \(\varvec{x}\) by \(\phi _{\varvec{x}}\) (i.e., \(\phi _{\varvec{x}}:=\phi {({\varvec{x}})}\)). Then, we define the Koopman operator \(\mathcal K_{\mathcal H} : \mathcal H\rightarrow \mathcal H\) in the RKHS by

$$\begin{aligned} \mathcal {K}_{\mathcal H}{\phi }_{\varvec{x}}=\phi _{\varvec{x}}\circ \varvec{f}. \end{aligned}$$
(7)

Note that most of the theoretical claims in this study do not necessarily require \(\phi \) to lie in an RKHS (it is sufficient that \(\phi \) stays within a Hilbert space), but this assumption is needed to perform the calculations in practice.

In this paper, we robustify kernel DMD by projecting the data onto the POD directions [4, 16, 29]. First, a centered Gram matrix is defined by \(\bar{G}=\mathbf {H}{G}\mathbf {H}\), where G is a Gram matrix, \(\mathbf {H} =\mathbf {I}-\varvec{1}_{\tau }\), \(\mathbf {I}\) is the identity matrix, and \(\varvec{1}_{\tau }\) is a \(\tau \)-by-\(\tau \) matrix in which each element takes the value \(1/\tau \). The Gram matrix \(G_{xx}\) is obtained by evaluating the kernel \(k(\varvec{y}_{i},\varvec{y}_{j})\) at the columns \(\varvec{y}_{i}\) and \(\varvec{y}_{j}\) of the observation data matrix \(\varvec{A}\). Similarly, the Gram matrix \(G_{xy}\) of the kernel between \(\varvec{A}\) and \(\varvec{B}\) can be calculated. Here, \(G_{xx}=\mathcal M_{\tau }^{*}\mathcal M_{\tau }\) and \(G_{xy}=\mathcal M_{\tau }^{*}\mathcal M_{+}\), where \(\mathcal M_{\tau }^{*}\) denotes the Hermitian transpose of \(\mathcal M_{\tau }\), \(\mathcal M_{\tau }:=[\phi _{\varvec{x}_0},\ldots ,\phi _{\varvec{x}_{\tau -1}}]\), and \(\mathcal M_{+}:=[\phi _{\varvec{x}_1},\ldots ,\phi _{\varvec{x}_{\tau }}]\), with \(\phi _{\varvec{x}_i}\) the feature map of \(\varvec{x}_i\) from the state space \(\mathcal M\) to the RKHS \(\mathcal H\).

Here, suppose that the eigenvalues and eigenvectors can be truncated based on eigenvalue magnitude, i.e., \(\bar{G}\approx \bar{B}\bar{S}\bar{B}^{*}\), where only the p \((\le \tau )\) largest eigenvalues (the diagonal of \(\bar{S}\)) and the corresponding eigenvectors (the columns of \(\bar{B}\)) are retained. Then, a principal orthogonal direction in the feature space is given by

$$\begin{aligned} {\nu }_j={\mathcal M}_{\tau }\mathbf {H}\bar{S}_{jj}^{-1/2}{\varvec{\beta }}_{j}, \end{aligned}$$
(8)

where \(\varvec{\beta }_j\) is the jth column of \(\bar{B}\). Let \(\mathcal U =[{\nu }_{1},\ldots , {\nu }_{p}] = \mathcal M_{\tau }\mathbf {H}\bar{B}\bar{S}^{-1/2}\). Since \({\mathcal M}_{+}={\mathcal K}_{\mathcal H}{\mathcal M}_{\tau }\), the projection of \({\mathcal K}_{\mathcal H}\) onto the space spanned by the \({\nu }_{j}\) is given as follows:

$$\begin{aligned} \hat{F}=\mathcal U^{*}\mathcal K_{\mathcal H}\mathcal U=\bar{S}^{-1/2}\bar{B}^{*}\mathbf {H}\left( \mathcal M_{\tau }^{*}{\mathcal M}_{+}\right) \mathbf {H}\bar{B}\bar{S}^{-1/2}. \end{aligned}$$
(9)

Note that \({G}_{xy}={\mathcal M}_{\tau }^{*}{\mathcal M}_{+}\). Then, if we let \({\hat{F}}={\hat{T}}^{-1}{\hat{\varLambda }}{\hat{T}}\) be the eigendecomposition of \({\hat{F}}\), we obtain the centered DMD mode \(\bar{\varphi }_{j}=\mathcal {U}{\varvec{b}}_{j}=\mathcal M_{\tau }\mathbf {H}\bar{B}\bar{S}^{-1/2}{\varvec{b}}_{j}\), where \(\varvec{b}_{j}\) is the jth column of \({\hat{T}}^{-1}\). The diagonal matrix \({\hat{\varLambda }}\) comprising the eigenvalues represents the temporal evolution of the modes.
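A minimal sketch of this kernel DMD computation, assuming a user-supplied positive definite kernel function `k` (a hypothetical argument) and a truncation level p:

```python
import numpy as np

def kernel_dmd(X, k, p=None):
    """Sketch of kernel DMD (Sect. 3.1) on a state sequence X = [x_0, ..., x_tau]
    (columns). k(x, y) is a positive definite kernel; p is the POD truncation."""
    tau = X.shape[1] - 1
    # Gram matrices G_xx = M_tau^* M_tau and G_xy = M_tau^* M_+
    Gxx = np.array([[k(X[:, a], X[:, b]) for b in range(tau)] for a in range(tau)])
    Gxy = np.array([[k(X[:, a], X[:, b + 1]) for b in range(tau)] for a in range(tau)])
    H = np.eye(tau) - np.full((tau, tau), 1.0 / tau)    # centering matrix
    s, Bbar = np.linalg.eigh(H @ Gxx @ H)               # eigendecomposition of G_bar
    order = np.argsort(s)[::-1][: (p or tau)]           # keep p largest eigenvalues
    s, Bbar = np.clip(s[order], 1e-12, None), Bbar[:, order]
    Shalf = np.diag(s ** -0.5)
    Fhat = Shalf @ Bbar.T @ H @ Gxy @ H @ Bbar @ Shalf  # Eq. (9)
    lam, Tinv = np.linalg.eig(Fhat)                     # Fhat = T^{-1} Lam T
    return lam, Tinv, Bbar, Shalf

# usage with a Gaussian kernel of width sigma = 1.0:
# lam, Tinv, _, _ = kernel_dmd(X, lambda x, y: np.exp(-np.sum((x - y)**2) / 2.0), p=10)
```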

3.2 Koopman Spectral Kernels

For calculating the similarity between the dynamical systems \({DS}_{i}\) and \({DS}_{j}\), we compute Koopman spectral kernels based on the idea of Binet-Cauchy kernels. The Binet-Cauchy kernels are basically calculated from the traces of compound matrices [30], defined as follows. Let M be a matrix in \({\mathbb {R}}^{m\times n}\). For \({q} \le \min (m, n)\), define \(I_q^n=\left\{ {\varvec{i}}=(i_{1},\ldots ,i_q): 1 \le i_1< \cdots < i_q \le n,\ i_k \in \mathbb {N}\right\} \), and likewise \(I_{q}^{m}\). We denote by \({C_{q}}(M)\) the qth compound matrix, that is, the \(\binom{m}{q} \times \binom{n}{q}\) matrix whose elements are the minors \(\det \left( (M_{k,l})_{k\in {\varvec{i}},\ l\in {\varvec{j}}}\right) \), where \({\varvec{i}} \in I_q^m\) and \({\varvec{j}} \in I_q^n\) are arranged in lexicographical order. In the unifying viewpoint of [30], Binet-Cauchy kernels form a general representation subsuming various kernels [6, 15, 17, 21], divided into two strategies; a sketch of the compound matrix follows this paragraph. The first is the trace kernel, obtained by setting \({q} = 1\) (i.e., \({ C_1}(M) = M\)), which directly reflects the temporal evolution of the dynamical systems and includes the diffusion kernel [17] and graph kernels [15]. The second is the determinant kernel, obtained by setting the order q equal to the order n of the dynamical system (i.e., \( C_{n}(M) = \det (M)\)), which extracts the coefficients of the dynamical systems and includes the Martin distance [21] and the distance based on subspace angles [6].
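For reference, a direct (unoptimized) sketch of the qth compound matrix, enumerating row and column index subsets in lexicographical order:

```python
import numpy as np
from itertools import combinations

def compound_matrix(M, q):
    """q-th compound matrix C_q(M): all q-by-q minors of M, with row/column
    index subsets arranged in lexicographical order."""
    m, n = M.shape
    rows, cols = list(combinations(range(m), q)), list(combinations(range(n), q))
    C = np.empty((len(rows), len(cols)))
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            C[a, b] = np.linalg.det(M[np.ix_(i, j)])
    return C

M = np.arange(1.0, 13.0).reshape(3, 4)
assert np.allclose(compound_matrix(M, 1), M)   # q = 1 recovers M (trace-kernel case)
assert np.isclose(compound_matrix(M[:, :3], 3)[0, 0],
                  np.linalg.det(M[:, :3]))     # q = n gives det (determinant-kernel case)
```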

We extend these kernels by applying Koopman spectral analysis, calling the results the Koopman trace kernel and the Koopman determinant kernel, respectively. Both kernels reflect the Koopman eigenvalues, eigenfunctions, and modes (i.e., the system trajectory including the initial condition). However, richer information about the system trajectory does not necessarily increase expressiveness, for example in classification with real-world data. Therefore, we also extend the kernel principal angle [33] via Koopman spectral analysis, yielding what we call the Koopman kernel of principal angle. The kernel principal angle is theoretically a simple case of the trace kernel [30], defined as the inner product of linear subspaces in the feature space. In this paper, for a simple comparison, we compute this kernel from the inner product of the Koopman modes only (i.e., not the trajectory, and independent of the initial condition).

Koopman Trace Kernel and Determinant Kernel. First, for the trace kernel, we generalize the kernel assuming the ARMA model [30] to nonlinear dynamical systems without specifying an underlying model. The trace kernel of \(DS_{i}\) and \(DS_{j}\) can be theoretically defined as follows:

$$\begin{aligned} k\left( {DS}_{i},{DS}_j\right) := \sum _{t=0}^{\infty }\left( e^{-\kappa t}{{\varvec{g}}_i\left( {\varvec{x}}_{i,t}\right) }^{\mathrm {T}}{\varvec{W}}{\varvec{g}}_j\left( {\varvec{x}}_{j,t}\right) \right) , \end{aligned}$$
(10)

where \({\varvec{g}}_i\) and \({\varvec{g}}_j\) are the observation functions and \({\varvec{W}}\) is an arbitrary positive semidefinite matrix (here, \({\varvec{W}}= \mathbf 1 \)). Moreover, to make the above sum converge, we introduce the exponential discount \(\mu (t) = e^{-{\kappa }t}\) \(({\kappa }> 0)\). In this paper, noise in the observations and in the latent dynamics is not considered. The Koopman trace kernel can be computed using the modal representation given by kernel DMD as follows:

$$\begin{aligned} k\left( {DS}_{i},{DS}_j\right) ={{\varvec{\varphi }}_i\left( {\varvec{x}}_{i,0}\right) }^{\mathrm {T}}\sum _{t=0}^{\infty }\left( e^{-\kappa t}\mathbf {\Lambda }_i^t\left( {\mathbf {\Psi }_i}^{\mathrm {T}}{\varvec{W}}\mathbf {\Psi }_j\right) \mathbf {\Lambda }_j^t\right) {\varvec{\varphi }}_j\left( {\varvec{x}}_{j,0}\right) , \end{aligned}$$
(11)

where \(\mathbf {\Lambda }_{i}\) is a diagonal matrix consisting of the Koopman eigenvalues, \(\mathbf {\Psi }_{i}\) contains the Koopman modes, and \({\varvec{\varphi }}_{i}\) is the Koopman eigenfunction (and likewise for j). Although the equation includes an infinite sum, we can efficiently compute the matrix \({\varvec{M}}:=\sum _{t=0}^{\infty }{(e^{-\kappa t}\mathbf {\Lambda }_{i}^{t}(\mathbf {\Psi }_{i}^{\mathrm {T}}{\varvec{W}}\mathbf {\Psi }_{j})\mathbf {\Lambda }_{j}^{t})}\) by solving the Sylvester equation \({\varvec{M}}=e^{-\kappa }\mathbf {\Lambda }_{i}^{\mathrm {T}}{\varvec{M}}\mathbf {\Lambda }_{j}+\mathbf {\Psi }_{i}^{\mathrm {T}}{\varvec{W}}\mathbf {\Psi }_{j}\), where the Koopman modes are \( {\varvec{\Psi }} =\mathcal U^{*}\mathbf {H}{\mathcal M}_{\tau }\mathbf {H}{\mathcal U}{\hat{T}}^{-1}\) for i and j. To create a trace kernel independent of the initial conditions [30], we take the expectation over \(\varvec{x}_{i,0}\) and \(\varvec{x}_{j,0}\) in the trace kernel, yielding

$$\begin{aligned} k\left( {DS}_{i},{DS}_j\right) =\mathrm {tr}\left( \mathbf {\Sigma }_{{\varvec{\varphi }}_i\left( \varvec{x}_{i,0}\right) ,{\varvec{\varphi }}_j\left( \varvec{x}_{j,0}\right) }{\varvec{M}}\right) , \end{aligned}$$
(12)

where the initial value of the Koopman eigenfunction is \({\varvec{\varphi }} (\varvec{x}_{0})={\varvec{a}}^{*}(\mathcal M_{\tau }\mathbf {H}{\mathcal U})^{*}{\mathcal M}_{\tau ,0}\) for i and j [16]. Here, \({\varvec{a}}\) is the left eigenvector of \({\hat{F}}\) and \(\mathcal M_{\tau ,0}\) is the first column of \(\mathcal M_{\tau }\). \(\Sigma _{{\varvec{\varphi }} _{i}(\varvec{x}_{i,0}),{\varvec{\varphi }} _{j}(\varvec{x}_{j,0})}\in {\mathbb {C}}^{p\times p}\) is the covariance of all initial values \({\varvec{\varphi }}_{n}\left( {\varvec{x}}_0\right) \in {\mathbb {C}}^{p\times n}\) of \(DS_{i}\) for each index \(1,\ldots ,p\) of the eigenvalues (p was fixed for all i). Similarly, the determinant kernel can be computed using the representation given by kernel DMD:

$$\begin{aligned} k\left( {DS}_{i},{DS}_j\right) =\det \left( \mathbf {\Psi }_i{\varvec{M}}{\mathbf {\Psi }_{j}}^{\mathrm {T}}\right) , \end{aligned}$$
(13)

where \({\varvec{M}}=e^{-\kappa }\mathbf {\Lambda }_{i}^{\mathrm {T}}{\varvec{M}}\mathbf {\Lambda }_{j}+{\varvec{\varphi }}_{i}(\varvec{x}_{i,0}){\varvec{\varphi }}_{j}(\varvec{x}_{j,0})^{\mathrm {T}}\). Determinant kernels independent of the initial condition can be computed only for single-output systems [30].
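Because \(\mathbf {\Lambda }_{i}\) and \(\mathbf {\Lambda }_{j}\) are diagonal, the Sylvester equations above can be solved elementwise in closed form, \(M_{kl}=Q_{kl}/(1-e^{-\kappa }\lambda _{i,k}\lambda _{j,l})\), provided \(|e^{-\kappa }\lambda _{i,k}\lambda _{j,l}|<1\). The sketch below uses this for both kernels; the plain transposes follow Eqs. (11) and (13), and all inputs are assumed to come from a kernel DMD routine.

```python
import numpy as np

def solve_discounted_sylvester(lam_i, lam_j, Q, kappa):
    """Solve M = e^{-kappa} Lam_i M Lam_j + Q for diagonal Lam_i, Lam_j
    (entries lam_i, lam_j), i.e., the infinite sum in Eq. (11), elementwise."""
    return Q / (1.0 - np.exp(-kappa) * np.outer(lam_i, lam_j))

def koopman_trace_kernel(phi_i0, phi_j0, Psi_i, Psi_j, lam_i, lam_j, kappa, W=None):
    """Koopman trace kernel, Eq. (11), for one pair of initial conditions."""
    W = np.eye(Psi_i.shape[0]) if W is None else W
    M = solve_discounted_sylvester(lam_i, lam_j, Psi_i.T @ W @ Psi_j, kappa)
    return phi_i0.T @ M @ phi_j0

def koopman_det_kernel(phi_i0, phi_j0, Psi_i, Psi_j, lam_i, lam_j, kappa):
    """Koopman determinant kernel, Eq. (13), with Q built from initial values."""
    M = solve_discounted_sylvester(lam_i, lam_j, np.outer(phi_i0, phi_j0), kappa)
    return np.linalg.det(Psi_i @ M @ Psi_j.T)
```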

Koopman Kernel of Principal Angle. The kernel of principal angle can be computed using the Koopman modes given by kernel DMD. For \(DS_{i}\), we define the kernel of principal angles as the inner product of the Koopman modes in the feature space: \(A^{*}A={\hat{T}}_{i}^{-1}{\mathcal U}_{i}^{*}\mathbf {H}{G}_{xxi}\mathbf {H}{\mathcal U}_{i}{\hat{T}}_{i} \). If the rank of \({\hat{F}}\) is \(r_{i}\), then \({A}^{*}{A}\) is an \(r_{i}\)-order square matrix. For \(DS_{j}\), we create a similar matrix \(B^{*}B\). Furthermore, we define the inner product of the linear subspaces between \(DS_{i}\) and \(DS_{j}\) as \( A^{*}B={\hat{T}}_{i}^{-1}{\mathcal U}_{i}^{*}\mathbf {H}{G}_{xxij}\mathbf {H}{\mathcal U}_{j}{\hat{T}}_{j}\), where \(G_{xxij}\) is an \(n_{i}\times n_{j}\) matrix obtained by taking the upper-right block of the centered Gram matrix formed by concatenating \(\varvec{A}_{i}\) and \(\varvec{A}_{j}\) in series (\(n_{i}\) and \(n_{j}\) are the lengths of the time series). Then, using these matrices, we solve the following generalized eigenvalue problem:

$$\begin{aligned} \begin{pmatrix}0&{\left( A^{*}B\right) }^{*}\\ A^{*}B&0\end{pmatrix}{\varvec{V}}= {\varvec{\lambda }}_{ij}\begin{pmatrix}B^{*}B&0\\ 0&A^{*}A\end{pmatrix}{\varvec{V}}, \end{aligned}$$
(14)

where \({\varvec{\lambda }}_{ij}\) is finally truncated to the \(r_{ij}\) = min(\(r_{i},r_{j}\)) largest values arranged in descending order, and \({\varvec{V}}\) is the matrix of generalized eigenvectors. The eigenvalues \({\varvec{\lambda }}_{ij}\) define the kernel of principal angle.
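A sketch of solving Eq. (14), assuming the inner-product matrices \(A^{*}A\), \(B^{*}B\), and \(A^{*}B\) have been precomputed as described above:

```python
import numpy as np
from scipy.linalg import eig

def koopman_principal_angles(AA, BB, AB):
    """Generalized eigenvalue problem of Eq. (14). AA = A*A (r_i x r_i),
    BB = B*B (r_j x r_j), AB = A*B (r_i x r_j). Returns the min(r_i, r_j)
    largest generalized eigenvalues, in descending order."""
    r_i, r_j = AA.shape[0], BB.shape[0]
    LHS = np.block([[np.zeros((r_j, r_j)), AB.conj().T],
                    [AB,                   np.zeros((r_i, r_i))]])
    RHS = np.block([[BB,                   np.zeros((r_j, r_i))],
                    [np.zeros((r_i, r_j)), AA]])
    w = eig(LHS, RHS, right=False)          # generalized eigenvalues only
    w = np.real(w[np.isfinite(w)])          # drop infinite eigenvalues, if any
    return np.sort(w)[::-1][: min(r_i, r_j)]
```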

4 Embedding and Classification of Dynamics

A direct but important application of this analysis is the embedding and classification of dynamics using the extracted features. A set of Koopman spectra estimated by the analysis can be used as the basis of a low-dimensional subspace representing the dynamics. Classification of the dynamics can then be performed using feature vectors determined by the Koopman spectral kernels. We used the Gaussian kernel, with the kernel width set to the median of the pairwise distances of the data matrix.
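A sketch of this kernel choice, reading "median of the distances" as the usual median heuristic over nonzero pairwise distances (our interpretation):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def gaussian_gram(X):
    """Gaussian Gram matrix over the rows of X, with kernel width sigma set
    to the median of the nonzero pairwise distances (median heuristic)."""
    D = squareform(pdist(X))            # pairwise Euclidean distances
    sigma = np.median(D[D > 0])         # median heuristic for the width
    return np.exp(-D ** 2 / (2.0 * sigma ** 2))
```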

Before applying our approach to multiagent sports data, we conducted an experiment using open-source real-world data: human locomotion sequences from the CMU Graphics Lab Motion Capture Database (available at http://mocap.cs.cmu.edu). To verify classification performance, we also computed the trace kernel of an auto-regressive (AR) model, representing a conventional linear dynamical model. For embedding with our kernels, components of the distance matrix between \(DS_{i}\) and \(DS_{j}\) in the feature space were obtained as \( dist({DS}_{i},{DS}_{j})=k(\varvec{A}_{i},\varvec{A}_{i})+ k(\varvec{A}_{j},\varvec{A}_{j})-2 k(\varvec{A}_{i},\varvec{A}_{j})\). Figure 1a–c shows the embedding of the sequences using multidimensional scaling (MDS) with the distance matrices computed from the Koopman kernel of principal angle, the Koopman determinant kernel, and the trace kernel of the AR model, respectively. Classification of the sequences into jumping, running, and walking was performed using the k-nearest neighbor algorithm. Error rates on the test data increased in this order: Koopman kernel of principal angle (0.261), Koopman determinant kernel (0.348), trace kernel of the AR model (0.522), and Koopman trace kernel (0.601). Two of the Koopman spectral kernels thus classified better than the kernel of the linear dynamical model.
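The sketch below assembles these steps: feature-space distances from a precomputed Gram matrix, MDS embedding, and k-nearest-neighbor classification (the names `K`, `D_train`, `y_train` and the choice `n_neighbors=3` are illustrative assumptions; the paper does not report the value of k used).

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.neighbors import KNeighborsClassifier

def kernel_to_distance(K):
    """Feature-space distance dist(i, j) = K_ii + K_jj - 2 K_ij (Sect. 4)."""
    d = np.diag(K)
    return np.clip(d[:, None] + d[None, :] - 2.0 * K, 0.0, None)

# D = kernel_to_distance(K)                       # K: precomputed Gram matrix
# emb = MDS(n_components=2, dissimilarity="precomputed").fit_transform(D)
# knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
# knn.fit(D_train, y_train)                       # rows/cols: training sequences
# y_pred = knn.predict(D_test_train)              # shape (n_test, n_train)
```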

Fig. 1.

MDS embedding of (a) Koopman kernel of principal angle, (b) Koopman determinant kernel, and (c) trace kernel of AR model. Blue, red, and green indicate jump, run, and walk, respectively (x and triangle show the movements with turn and stop, respectively). (Color figure online)

5 Application to Multiagent Sport Plays

We used player-tracking data from two international basketball games in 2015, collected by the STATS SportVU system. The total playing time was 80 min, and the total score of the two teams was 276. Positional data comprised the xy positions of every player and the ball on the court, recorded at 25 frames per second. We eliminated transitions in attack to automatically extract the time periods to be analyzed (called attack-segments). We defined an attack-segment as the period from the moment all attacking players had entered the opponent's half of the court to 1 s before a shot was taken. We analyzed a total of 192 attack-segments, 77 of which ended in a successful shot.

Next, we calculated effective attacker-defender distances to predict the success or failure of the shot (details are given in [8]); the distances were temporally and spatially corrected (Fig. 2a). Although the full set of distances has 25 dimensions (five attackers times five defenders), we previously reduced it to four important distances [8]: (1) the ball-mark distance, (2) the ball-help distance, (3) the pass-mark distance, and (4) the pass-help distance (Fig. 2b–c). These distances were used to create seven input vector series: (i) a one-dimensional series using distance (1); (ii) a two-dimensional series comprising (1) and (2); and (iii–iv) three- and four-dimensional series of the important distances (1–3 and 1–4), respectively. For verification, (v) all 25 distances and (vi) 25-dimensional Euclidean distances without spatiotemporal correction were also used. We also used (vii) the xy positions of all ten players (20 dimensions in total).

Fig. 2.

Diagrams and examples of attacker-defender distance. (a) Diagram of attacker-defender distance with spatiotemporal correction. (b) Examples of four important distances. Orange, black, pink and light blue indicate the ball-mark, ball-help, pass-mark, and pass-help distance, respectively. (c) Example of time series in the same four important attacker-defender distances. (Color figure online)

When predicting the outcome of a team-attack movement, it is preferable to compute the posterior probability rather than merely identifying the shot outcome itself. We used a naive Bayes classifier and a relevance vector machine (RVM) for classification. Figure 3a shows the result of applying the naive Bayes classifier. The horizontal axis shows the seven input vector series and the vertical axis shows the classification error. The Koopman kernel of principal angle with the four important distances as input achieved the minimum error of 35.9%. The result of applying the RVM is shown in Fig. 3b, using the same axes. The performance of the naive Bayes classifier was superior to that of the RVM. In both cases, the Koopman spectral kernels produced better classification than the kernel of the linear dynamical model.

Fig. 3.

Results from applying (a) the naive Bayes classifier and (b) the relevance vector machine. Kpa, Kdet, Ktr, and trAR denote the Koopman kernel of principal angle, the Koopman determinant kernel, the Koopman trace kernel, and the trace kernel with the AR model, respectively.

Figure 4a–c shows the MDS embedding with the distance matrix of the Koopman kernel of principal angle, distinguished by success and failure of the shot. For example, the best case, using the four important attacker-defender distances (Fig. 4a), showed expressiveness for scorability through its wide distribution across the plot. In contrast, the embeddings were less widely distributed when only a single distance (Fig. 4b) or the xy coordinates of all players (Fig. 4c) were used.

Fig. 4.

MDS embedding of the Koopman kernel of principal angle with three input vector series. The series consisted of (a) four important distances, (b) a single important distance, and (c) xy coordinates of all players. Red and blue indicate success and failure of the shot, respectively. (Color figure online)

6 Discussion and Conclusion

The results of the two empirical examples showed that the best performances of the Koopman spectral kernels (the Koopman determinant kernel and the Koopman kernel of principal angle) were superior to that of the AR-model kernel, which assumes a linear dynamical system. Our proposed kernels can be computed in closed form; in practice, however, the values of the Koopman determinant kernel were too large, and the performance of the Koopman trace kernel was no better than that of the others. In contrast, the Koopman kernel of principal angle showed effective expressiveness using only the Koopman modes.

When applied to multiagent sports data, the highest performance was provided by the classifier using the four important distances. This vector series reflects four characteristics: the scorability of a player in the current and future (i) shot, (ii) dribble, and (iii) pass, and (iv) the scorability of a dribbler after the pass. The proposed kernel reflected the time series of all interactions between players and was more effective for classification than a kernel based on information about the shot alone. Well-trained teams aim to create scoring opportunities by continuously selecting tactical passes and dribbles or by improvising when no shooting opportunity is available.

However, even the best classification accuracy was modest (64.1%) when applied to real multiagent sports data. Two factors may have been neglected by our framework. The first is the existence of local interactions between players, such as local competitive and cooperative play by attackers and defenders [8], which becomes visible at higher spatial resolution than was available in this study. The approach needs to reflect the hierarchical relationship between global and local dynamics. The second is the restriction of the input vector series to attacker-defender distances. To achieve more accurate classifiers, not only the most important factor (i.e., distance) but also further hand-crafted time-series inputs (e.g., Cartesian coordinates or specific movement parameters) should be used.

Overall, we developed Koopman spectral kernels that can be computed in closed form and used to compare multiple nonlinear dynamical systems. In competitive sports, coaches spend considerable amounts of time analyzing videos of their own team and the opposing team. Application of a system such as the one presented here may save time and create tactical plans that can currently be generated only by experienced coaches. More generally, the algorithm can be applied to the analysis of the complex dynamics of groups of living organisms or artificial agents, which currently elude formulation.