Independence test and canonical correlation analysis based on the alignment between kernel matrices for multivariate functional data
 150 Downloads
Abstract
In the case of vector data, Gretton et al. (Algorithmic learning theory. Springer, Berlin, pp 63–77, 2005) defined Hilbert–Schmidt independence criterion, and next Cortes et al. (J Mach Learn Res 13:795–828, 2012) introduced concept of the centered kernel target alignment (KTA). In this paper we generalize these measures of dependence to the case of multivariate functional data. In addition, based on these measures between two kernel matrices (we use the Gaussian kernel), we constructed independence test and nonlinear canonical variables for multivariate functional data. We show that it is enough to work only on the coefficients of a series expansion of the underlying processes. In order to provide a comprehensive comparison, we conducted a set of experiments, testing effectiveness on two real examples and artificial data. Our experiments show that using functional variants of the proposed measures, we obtain much better results in recognizing nonlinear dependence.
Keywords
Multivariate functional data Functional data analysis Correlation analysis Canonical correlation analysis1 Introduction
The theory and practice of statistical methods in situations where the available data are functions (instead of real numbers or vectors) is often referred to as Functional Data Analysis (FDA). The term Functional Data Analysis was already used by Ramsay and Dalzell (1991) two decades ago. This subject has become increasingly popular from the end of the 1990s and is now a major research field in statistics (Cuevas 2014). Good access to the large literature in this field comes from the books by Ramsay and Silverman (2002, 2005), Ferraty and Vieu (2006), and Horváth and Kokoszka (2012). Special issues devoted to FDA topics have been published by different journals, including Statistica Sinica 14(3) (2004), Computational Statistics 22(3) (2007), Computational Statistics and Data Analysis 51(10) (2007), Journal of Multivariate Analysis 101(2) (2010), Advances in Data Analysis and Classification 8(3) (2014).
The range of real world applications, where the objects can be thought of as functions, is as diverse as speech recognition, spectrometry, meteorology, medicine or clients segmentation, to cite just a few (Ferraty and Vieu 2003; James et al. 2009; MartinBaragan et al. 2014; Devijver 2017).
The uncentered kernel alignment originally was introduced by Cristianini et al. (2001). Gretton et al. (2005) defined Hilbert–Schmidt Independence Criterion (HSIC) and the empirical HSIC. Centered kernel target alignment (KTA) was introduced by Cortes et al. (2012). This measure is a normalized version of HSIC. Zhang et al. (2011) gave an interesting kernelbased independence test. This independence testing method is closely related to the one based on the Hilbert–Schmidt independence criterion (HSIC) proposed by Gretton et al. (2008). Gretton et al. (2005) described a permutationbased kernel independence test. There is a lot of work in the literature for kernel alignment and its applications (good overview can be found in Wang et al. 2015).
This work is devoted to a generalization of these measures of dependence to the case of multivariate functional data. In addition, based on these measures we constructed independence test and nonlinear canonical correlation variables for multivariate functional data. These results are based on the assumption that the applied kernel function is Gaussian. Functional HSIC and KTA canonical correlation analysis can be viewed as natural nonlinear extension of functional canonical correlation analysis (FCCA). So, we propose two nonlinear functional CCA extensions that capture nonlinear relationship. Moreover, both algorithms are capable of extracting also linear dependency. Additionally, we show that functional KTA approach is only a normalized variant of HSIC coefficient also for functional data. Finally, we propose some interpretation of module weighting functions for functional canonical correlations.
Section 2 provides an overview of centered measures alignment for random vectors. They are defined by such concepts as: kernel function alignment, kernel matrix alignment, and Hilbert–Schmidt Independence Criterion (HSIC) and associations between them have been shown. Functional data can be seen as values of random processes. In our paper, the multivariate random function \(\pmb X\) and \(\pmb Y\) have special representation (8) in finite dimensional subspaces of the spaces of square integrable functions on the given intervals. In Sect. 3 we present kernelbased independence test. Section 4 discusses the concept of alignment for multivariate functional data. The kernel function, the alignment between two kernels functions, the centered kernel alignment (KTA) between two kernel matrices and the empirical Hilbert–Schmidt Independence Criterion (HSIC) are defined. The HSIC was used as the basis for an independence test. In Sect. 5 we present kernelbased independence test for multivariate functional data. In Sect. 5, based on the concept of alignment between kernel matrices, nonlinear canonical variables were constructed. It is a generalization of the results of Chang et al. (2013) for random vectors. In Sect. 5 we present an one artificial and two real examples which confirm the usefulness of proposed coefficients in detection of nonlinear dependency for group of variables.
2 An overview of kernel alignment and its applications
We introduce the following notational convention. Throughout this section, \(\pmb {X}\) and \(\pmb {Y}\) are random vectors, with domains \(\mathbb {R}^p\) and \(\mathbb {R}^q\), respectively. Let \(P_{\pmb {X},\pmb {Y}}\) be a joint probability measure on (\(\mathbb {R}^p\times \mathbb {R}^q\), \(\Gamma \times \Lambda \)) (here \(\Gamma \) and \(\Lambda \) are the Borel \(\sigma \)algebras on \(\mathbb {R}^p\) and \(\mathbb {R}^q\), respectively), with associated marginal probability measures \(P_{\pmb {X}}\) and \(P_{\pmb {Y}}\).
Definition 1
A kernel function can be interpreted as a kind of similarity measure between the vectors \(\pmb x\) and \(\pmb x'\).
Definition 2
(Gram matrix, Mercer 1909; Riesz 1909; Aronszajn 1950) Given a kernel k and inputs \(\pmb x_1,\ldots , \pmb x_n\in \mathbb {R}^p\), the \(n\times n\) matrix \(\pmb K\) with entries \(K_{ij}=k(\pmb x_i, \pmb x_j)\) is called the Gram matrix (kernel matrix) of k with respect to \(\pmb x_1,\ldots , \pmb x_n\).
Definition 3
Definition 4
(Positive semidefinite kernel, Mercer 1909; Hofmann et al. 2008) A function \(k:\mathbb {R}^p\times \mathbb {R}^p\rightarrow \mathbb {R}\) which for all \(n\in \mathbb {N},\ \pmb x_i\in \mathbb {R}^p,\ i=1,\ldots ,n\) gives rise to a positive semidefinite Gram matrix is called a positive semidefinite kernel.
This raises an interesting question: given a function of two variables \(k(\pmb x, \pmb x')\), does there exist a function \(\pmb \varphi (\pmb x)\) such that \(k(\pmb x, \pmb x') = \langle \pmb \varphi (\pmb x), \pmb \varphi (\pmb x')\rangle _\mathcal {H}?\) The answer is provided by Mercer’s theorem (1909) which says, roughly, that if k is positive semidefinite then such a \(\varphi \) exists.
Often, we will not known \(\pmb {\phi }\), but a kernel function \(k:\mathbb {R}^p\times \mathbb {R}^p\rightarrow \mathbb {R}\) that encodes the inner product in \(\mathcal {H}\), instead.
Popular positive semidefinite kernel functions on \(\mathbb {R}^p\) include the polynomial kernel of degree \(d>0\), \(k(\pmb {x},\pmb {x}')= (1 + \pmb x^\top \pmb x')^d\), the Gaussian kernel \(k(\pmb {x},\pmb {x}')=\exp (\lambda \Vert \pmb {x}\pmb {x}'\Vert ^2)\), \(\lambda >0\), and the Laplace kernel \(k(\pmb {x},\pmb {x}')=\exp (\lambda \Vert \pmb {x}\pmb {x}'\Vert )\), \(\lambda >0\). In this paper we use, the Gaussian kernel.
We start with the definition of centering and the analysis of its relevant properties.
2.1 Centered kernel functions
2.2 Centered kernel matrices
The centered kernel matrix \(\widetilde{\pmb {K}}\) is a positive semidefinite matrix. Also, as with the kernel function \(\frac{1}{n^2}\sum _{i,j}^n\tilde{K}_{ij}=0\).
2.3 Centered kernel alignment
Definition 5
We can define similarly the alignment between two kernel matrices \(\pmb {K}\) and \(\pmb {L}\) based on the finite subset \(\{\pmb {x}_1,\ldots ,\pmb {x}_n\}\) and \(\{\pmb {y}_1,\ldots ,\pmb {y}_n\}\), respectively.
Definition 6
Here, by the Cauchy–Schwarz inequality, \(\hat{\rho }(\pmb {K},\pmb {L})\in [1,1]\) and in fact \(\hat{\rho }(\pmb {K},\pmb {L})\in [0,1]\) when \(\pmb {K}\) and \(\pmb {L}\) are the kernel matrices of the positive semidefinite kernel \(\tilde{k}\) and \(\tilde{l}\).
Gretton et al. (2005) defined Hilbert–Schmidt Independence Criterion (HSIC) as a test statistic to distinguish between null hypothesis \(H_0:P_{\pmb X, \pmb Y}=P_{\pmb X}P_{\pmb Y}\) (equivalently we may write \(\pmb X{\perp \!\!\!\perp }\pmb Y\)) and alternative hypothesis \(H_1:P_{\pmb X, \pmb Y} \ne P_{\pmb X}P_{\pmb Y}\).
Definition 7
(Reproducing kernel Hilbert space, Riesz 1909; Mercer 1909; Aronszajn 1950) Consider a Hilbert space \(\mathcal {H}\) of functions from \(\mathbb {R}^p\) to \(\mathbb {R}\). Then \(\mathcal {H}\) is a reproducing kernel Hilbert space (RKHS) if for each \(\pmb {x}\in \mathbb {R}^p\), the Dirac evaluation operator \(\delta _{\pmb {x}}:\mathcal {H}\rightarrow \mathbb {R}\), which maps \(f\in \mathcal {H}\) to \(f(\pmb {x})\in \mathbb {R}\), is a bounded linear functional.
Let \(\varphi :\mathbb {R}^p\rightarrow \mathcal {H}\) be a map such that for all \(\pmb x, \pmb x'\in \mathbb {R}^p\) we have \(\langle \pmb {\phi }(\pmb {x}),\pmb {\phi }(\pmb {x}')\rangle _{\mathcal {H}}=k(\pmb {x},\pmb {x}')\), where \(k:\mathbb {R}^p\times \mathbb {R}^p\rightarrow \mathbb {R}\) is a unique positive semidefinite kernel. We will require in particular that \(\mathcal {H}\) be separable (it must have a complete, countable orthonormal system). We likewise define a second separable RKHS \(\mathcal {G}\), with kernel \(l(\cdot , \cdot )\) and feature map \(\pmb {\psi }\), on the separable space \(\mathbb {R}^q\).
Definition 8
Definition 9
It follows from (5) that the Frobenius norm of \(\pmb {C}_{\pmb {X},\pmb {Y}}\) exists when the various expectations over the kernels are bounded, which is true as long as the kernels k and l are bounded.
Definition 10
Comparing (4) and (6) and using (3), we see that the centered kernel target alignment (KTA) is simply a normalized version of \({{\mathrm{HSIC}}}(S)\).
In two seminar papers on Székely et al. (2007) and Székely and Rizzo (2009) introduced the distance covariance (dCov) and distance correlation (dCor) as powerful measures of dependence.
Sejdinovic et al. (2013) demonstrated that distance covariance is an instance of the Hilbert–Schmidt Independence Criterion. Górecki et al. (2016, 2017) showed an extension of the distance covariance and distance correlation coefficients to the functional case.
2.4 Kernelbased independence test
Statistical tests of independence have been associated with a broad variety of dependence measures. Classical tests such as Spearman’s \(\rho \) and Kendall’s \(\tau \) are widely applied, however they are not guaranteed to detect all modes of dependence between the random variables. Contingency tablebased methods, and in particular the powerdivergence family of test statistics (Read and Cressie 1988) are the best known general purpose tests of independence, but are limited to relatively low dimensions, since they require a partitioning of the space in which random variable resides. Characteristic functionbased tests (Feuerverger 1993; Kankainen 1995) have also been proposed. They are more general than kernelbased tests, although to our knowledge they have been used only to compare univariate random variables.
Now, we describe how HSIC can be used as an independence measure, and as the basis for an independence test. We begin by demonstrating that the Hilbert–Schmidt norm can be used as a measure of independence, as long as the associated RKHSs are universal.
A continuous kernel k on a compact metric space is called universal if the corresponding RKHS \(\mathcal {H}\) is dense in the class of continuous functions of the space.
Denote by \(\mathcal {H}\), \(\mathcal {G}\) RKHSs with universal kernels k, l on the compact domains \(\mathcal {X}\) and \(\mathcal {Y}\) respectively. We assume without loss of generality that \(\Vert f\Vert _{\infty }\le 1\) and \(\Vert g\Vert _{\infty }\le 1\) for all \(f\in \mathcal {H}\) and \(g\in \mathcal {G}\). Then Gretton et al. (2005) proved that \(\Vert \pmb {C}_{\pmb {X},\pmb {Y}}\Vert _{HS}=0\) if and only if \(\pmb {X}\) and \(\pmb {Y}\) are independent. Examples of universal kernels are Gaussian kernel and Laplacian kernel, while the linear kernel \(k(\pmb {x},\pmb {x}')=\pmb {x}^\top \pmb {x}'\) is not universal—the corresponding HSIC tests only linear relationships, and a zero crosscovariance matrix characterizes independence only for multivariate Gaussian distributions. Working with the infinite dimensional operator with universal kernels, allows us to identify any general nonlinear dependence (in the limit) between any pair of vectors, not just Gaussians.
We recall that in this paper we use the Gaussian kernel. We now consider the asymptotic distribution of statistics (6).
We introduce the null hypothesis \(H_0:\pmb {X} {\perp \!\!\!\perp } \pmb {Y}\) (\(\pmb {X}\) is independent of \(\pmb {Y}\), i.e., \(P_{\pmb {X},\pmb {Y}}=P_{\pmb {X}}P_{\pmb {Y}}\)). Suppose that we are given the i.i.d. samples \(S_{\pmb {x}}=\{\pmb {x}_1,\ldots ,\pmb {x}_n\}\) and \(S_{\pmb {y}}=\{\pmb {y}_1,\ldots ,\pmb {y}_n\}\) for \(\pmb {X}\) and \(\pmb {Y}\), respectively. Let \(\widetilde{\pmb {K}}\) and \(\widetilde{\pmb {L}}\) be the centered kernel matrices associated to the kernel function k and the sets \(S_{\pmb {x}}\) and \(S_{\pmb {y}}\), respectively. Let \(\lambda _1\ge \lambda _2\ge \cdots \ge \lambda _n\ge 0\) be the eigenvalues of the matrix \(\widetilde{\pmb {K}}\) and let \(\pmb {v}_1,\ldots ,\pmb {v}_n\) be a set of orthonormal eigenvectors corresponding to these eigenvalues. Let \(\lambda _1'\ge \lambda _2'\ge \cdots \ge \lambda _n'\ge 0\) be the eigenvalues of the matrix \(\widetilde{\pmb {L}}\) and let \(\pmb {v}_1',\ldots ,\pmb {v}_n'\) be a set of orthonormal eigenvectors corresponding to these eigenvalues. Let \(\Lambda ={{\mathrm{diag}}}(\lambda _1,\ldots ,\lambda _n)\), \(\Lambda '={{\mathrm{diag}}}(\lambda _1',\ldots ,\lambda _n')\), \(\pmb {V}=(\pmb {v}_1,\ldots ,\pmb {v}_n)\) and \(\pmb {V}'=(\pmb {v}_1',\ldots ,\pmb {v}_n')\). Suppose further that we have the eigenvalue decomposition (EVD) of the centered kernel matrices \(\widetilde{\pmb {K}}\) and \(\widetilde{\pmb {L}}\), i.e., \(\widetilde{\pmb {K}}=\pmb {V}\pmb {\Lambda }\pmb {V}^\top \) and \(\widetilde{\pmb {L}}=\pmb {V}'\pmb {\Lambda }'(\pmb {V}')^\top \).
Let \(\pmb {\Psi }=(\pmb {\Psi }_1,\ldots ,\pmb {\Psi }_n)=\pmb {V}\pmb {\Lambda }^{1/2}\) and \(\pmb {\Psi }'=(\pmb {\Psi }_1',\ldots ,\pmb {\Psi }_n')=\pmb {V}'(\pmb {\Lambda }')^{1/2}\), i.e., \(\pmb {\Psi }_i=\sqrt{\lambda _i}\pmb {v}_i\), \(\pmb {\Psi }_i'=\sqrt{\lambda _i'}\pmb {v}_i'\), \(i=1,\ldots ,n\).
Note that the databased test statistic HSIC (or its probabilistic counterpart) is sensible to dependence/independence and therefore can be used as a test statistic. Also important is the knowledge of its asymptotic distribution. These facts inspire the following dependence/independence testing procedure. Given the sample \(S_{\pmb {x}}\) and \(S_{\pmb {y}}\), one first calculates the centered kernel matrices \(\widetilde{\pmb {K}}\) and \(\widetilde{\pmb {L}}\) and their eigenvalues \(\lambda _i\) and \(\lambda _i'\), and then evaluates the statistic \({{\mathrm{HSIC}}}(S)\) according to (6). Next, the empirical null distribution of Z under the null hypothesis can be simulated in the following way: one draws i.i.d. random samples from the \(\chi ^2_1\)distributed variables \(Z_{ij}^2\), and then generates samples for Z according to (7). Finally the p value can be found by locating \({{\mathrm{HSIC}}}(S)\) in the simulated null distribution.
A permutationbased test is described in Gretton et al. (2005). In the first step they propose to calculate the test statistic T (HSIC or KTA) for the given data. Next, keeping the order of the first sample we randomly permute the second sample a large number of times, and recompute the selected statistic each time. This destroy any dependence between samples simulating a draw from the product of marginals, making the empirical distribution of the permuted statistics behave like the null distribution of the test statistic. For a specified significance level \(\alpha \), we calculate threshold \(t_\alpha \) in the right tail of the null distribution. We reject \(H_0\) if \(T>t_\alpha \). This test was proved to be consistent against any fixed alternative. It means that for any fixed significance level \(\alpha \), the power goes to 1 as the sample size tends to infinity.
2.5 Functional data
 1.
They easily cope with the problem of missing observations, an inevitable problem in many areas of research. Unfortunately, most data analysis methods require complete time series. One solution is to delete a time series that has missing values from the data, but this can lead to , and generally leads to, loss of information. Another option is to use one of many statistical methods to predict the missing values, but then the results will depend on the interpolation method. In contrast to this type of solutions, in the case of functional data, the problem of missing observations is solved by expressing time series in the form of a set of continuous functions.
 2.
The functional data naturally preserve the structure of observations, i.e. they maintain the time dependence of the observations and take into account the information about each measurement.
 3.
The moments of observations do not have to be evenly spaced in individual time series.
 4.
Functional data avoids the curse of dimensionality. When the number of time points is greater than the number of time series considered, most statistical methods will not give satisfactory results due to overparametrization. In the case of functional data, this problem can be avoided because the time series are replaced with a set of continuous functions independent of the number of time points in which observations are measured.
This means that the values of random processes \(\pmb {X}\) and \(\pmb {Y}\) are in finite dimensional subspaces of \(L_2^p(I_1)\) and \(L_2^q(I_2)\), respectively. We will denote these subspaces by \(\mathcal {L}_2^p(I_1)\) and \(\mathcal {L}_2^q(I_2)\).
Typically data are recorded at discrete moments in time. The process of transformation of discrete data to functional data is performed for each realization and each variable separately. Let \(x_{gj}\) denote an observed value of the feature \(X_g\), \(g=1,2,\ldots p\) at the jth time point \(s_j\), where \(j=1,2,...,J\). Similarly, let \(y_{hj}\) denote an observed value of feature \(Y_h\), \(h=1,2,\ldots q\) at the jth time point \(t_j\), where \(j=1,2,...,J\). Then our data consist of pJ pairs of \((s_{j},x_{gj})\) and of qJ pairs of \((t_{j},y_{hj})\). Let \(\pmb X_1,\ldots ,\pmb X_n\) and \(\pmb Y_1,\ldots ,\pmb Y_n\) be independent trajectories of random processes \(\pmb X\) and \(\pmb Y\) having the representation (8).
The coefficients \(\pmb {\alpha }_i\) and \(\pmb {\beta }_i\) are estimated by the least squares method. Let us denote these estimates by \(\pmb {a}_i\) and \(\pmb {b}_i\), \(i=1,2,\ldots ,n\).
Górecki and Smaga (2017) described a multivariate analysis of variance (MANOVA) for functional data. In the paper by Górecki et al. (2018), three basic methods of dimension reduction for multidimensional functional data are given: principal component analysis, canonical correlation analysis, and discriminant coordinates.
3 Alignment for multivariate functional data
3.1 The alignment between two kernel functions and two kernel matrices for multivariate functional data
Let \(\pmb {x}(s)\in \mathcal {L}_2^p(I_1)\), \(s\in I_1\), where \(\mathcal {L}_2^p(I_1)\) is a finitedimensional space of continuous squareintegrable vector functions over interval \(I_1\).
For a given subset \(\{ \pmb {x}_1(s),\ldots ,\pmb {x}_n(s) \}\) of \(\mathcal {L}_2^p(I_1)\) and the given kernel function \(k^{\star }\) on \(\mathcal {L}_2^p(I_1)\times \mathcal {L}_2^p(I_1)\), the matrix \(\pmb {K}^{\star }\) of size \(n\times n\), which has its (i, j)th element \(K_{ij}^{\star }(s)\), given by \(K_{ij}^{\star }(s)=k^{\star }(\pmb {x}_i(s),\pmb {x}_j(s))\), \(s\in I_1\), is called the kernel matrix of the kernel function \(k^{\star }\) with respect to the set \(\{ \pmb {x}_1(s),\ldots ,\pmb {x}_n(s) \}\), \(s\in I_1\).
Definition 11
We can define similarly the alignment between two kernel matrices \(\widetilde{\pmb {K}}^{\star }\) and \(\widetilde{\pmb {L}}^{\star }\) based on the subset \(\{ \pmb {x}_1(s),\ldots ,\pmb {x}_n(s) \}\), \(s\in I_1\), and \(\{ \pmb {y}_1(t),\ldots ,\pmb {y}_n(t) \}\), \(t\in I_2\), of \(\mathcal {L}_2^p(I_1)\) and \(\mathcal {L}_2^q(I_2)\), respectively.
Definition 12
3.2 Kernelbased independence test for multivariate functional data
Definition 13
Note also that the null hypothesis \(H_0:\pmb {X} \bot \pmb {Y}\) of independence of the random processes \(\pmb {X}\) and \(\pmb {Y}\) is equivalent to the null hypothesis \(H_0:\pmb {\alpha } \bot \pmb {\beta }\) of independence of random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\) occurring in the representation (8) random processes \(\pmb {X}\) and \(\pmb {Y}\). We can therefore use the tests described in Section 2.4, replacing \(\pmb {x}\) and \(\pmb {y}\) by \(\pmb {a}\) and \(\pmb {b}\).
3.3 Canonical correlation analysis based on the alignment between kernel matrices for multivariate functional data
In order to maximize the coefficient of (18) we can use the result of Chang et al. (2013). Authors used a gradient descent algorithm, with modified gradient to ensure the unit length constraint is satisfied at each step (Edelman et al. 1998). Optimal stepsizes were found numerically using the NelderMead method. This article employs the Gaussian kernel exclusively while other kernels are available. The bandwidth parameter \(\lambda \) of the Gaussian kernel was chosen using the “median trick” (Song et al. 2010), i.e. the median Euclidean distance between all pairs of points.
As we mentioned earlier KTA is a normalized variant of HSIC. Hence, we can repeat the above reasoning for HSIC criterion. However, we should remember that both approaches are not equivalent and we can obtain different results.
4 Experiments

KTA—centered kernel target alignment,

HSIC—Hilbert–Schmidt Independence Criterion,

FCCA—classical functional canonical correlation analysis (Ramsay and Silverman 2005; Horváth and Kokoszka 2012),

HSIC.FCCA—functional canonical correlation analysis based on HSIC,

HSIC.KTA—functional canonical correlation analysis based on KTA.
4.1 Simulation
Average raw and functional HSIC and KTA coefficients for artificial time series (number in brackets means standard deviation)
\((X_t,Y_t)\)  \((X_t,Z_t)\)  \((Y_t,Z_t)\)  

Raw  
HSIC  0.795 (0.015)  0.672 (0.027)  0.825 (0.014) 
KTA  0.758 (0.019)  0.601 (0.028)  0.789 (0.019) 
Functional  
HSIC  0.986 (0.000)  0.984 (0.001)  0.988 (0.000) 
KTA  0.999 (0.000)  0.999 (0.000)  0.999 (0.000) 
Average p values from permutationbased tests for raw and functional variants of HSIC and KTA coefficients (number in brackets means standard deviation)
\((X_t,Y_t)\)  \((X_t,Z_t)\)  \((Y_t,Z_t)\)  

Raw  
HSIC  0.000 (0.000)  0.445 (0.290)  0.458 (0.282) 
KTA  0.000 (0.000)  0.445 (0.290)  0.458 (0.282) 
Functional  
HSIC  0.000 (0.000)  0.077 (0.129)  0.125 (0.169) 
KTA  0.000 (0.000)  0.077 (0.129)  0.077 (0.129) 
4.2 Univariate example
From the plots we can observe that the level of smoothness seems big enough. Additionally, we can observe some relationship between average temperature and precipitation. Namely, for weather stations with large average temperature, we observe relatively bigger average precipitation while for Arctic stations with lowest average temperatures we observe the smallest average precipitation. So we can expect some relationship between average temperature and average precipitation for Canadian weather stations.
In the next step, we calculated the values of described earlier coefficients, the values of which are presented in Fig. 8. We observe quite big values of HSIC and KTA, but it is impossible to infer dependency from these values. We see that the values of HSIC and KTA coefficients are stable (both do not depend on basis size).
4.3 Multivariate example
Countries used in analysis, 2008–2015
1  Albania (AL)  14  Greece (GR)  27  Poland (PL) 
2  Austria (AT)  15  Hungary (HU)  28  Portugal (PT) 
3  Belgium (BE)  16  Iceland (IS)  29  Romania (RO) 
4  Bosnia and Herzegovina (BA)  17  Ireland (IE)  30  Russian Federation (RU) 
5  Bulgaria (BG)  18  Italy (IT)  31  Serbia (XS) 
6  Croatia (HR)  19  Latvia (LV)  32  Slovak Republic (SK) 
7  Cyprus (CY)  20  Lithuania(LT)  33  Slovenia (SI) 
8  Czech Republic (CZ)  21  Luxembourg (LU)  34  Spain (ES) 
9  Denmark (DK)  22  Macedonia FYR (MK)  35  Sweden (SE) 
10  Estonia (EE)  23  Malta (MT)  36  Switzerland (CH) 
11  Finland (FI)  24  Montenegro (ME)  37  Ukraine (UA) 
12  France (FR)  25  Netherlands (NL)  38  United Kingdom (GB) 
13  Germany (DE)  26  Norway (NO) 
Pillars used in analysis, 2008–2015
Pillar  Number of variables  

G1  Institutions  16 
G2  Infrastructure  6 
G3  Macroeconomic environment  2 
G4  Health and primary education  7 
G5  Higher education and training  6 
G6  Goods market efficiency  10 
G7  Labor market efficiency  6 
G8  Financial market development  5 
G9  Technological readiness  4 
G10  Market size  4 
G11  Business sophistication  9 
G12  Innovation  5 
Functional HSIC coefficients
1  2  3  4  5  6  7  8  9  10  11  

2  0.9736  
3  0.9736  0.9737  
4  0.9736  0.9737  0.9737  
5  0.9708  0.9706  0.9706  0.9706  
6  0.9728  0.9727  0.9727  0.9727  0.9753  
7  0.9687  0.9683  0.9683  0.9683  0.9799  0.9780  
8  0.9730  0.9730  0.9730  0.9730  0.9725  0.9740  0.9721  
9  0.9736  0.9737  0.9737  0.9737  0.9706  0.9727  0.9683  0.9730  
10  0.9736  0.9737  0.9737  0.9737  0.9706  0.9727  0.9683  0.9730  0.9737  
11  0.9714  0.9711  0.9711  0.9711  0.9785  0.9755  0.9828  0.9726  0.9711  0.9711  
12  0.9688  0.9683  0.9683  0.9683  0.9778  0.9741  0.9897  0.9715  0.9783  0.9683  0.9830 
Functional KTA coefficients
1  2  3  4  5  6  7  8  9  10  11  

2  1.0000  
3  1.0000  1.0000  
4  1.0000  1.0000  1.0000  
5  0.9918  0.9916  0.9916  0.9916  
6  0.9980  0.9978  0.9978  0.9978  0.9951  
7  0.9741  0.9736  0.9736  0.9936  0.9801  0.9821  
8  0.9991  0.9990  0.9990  0.9990  0.9933  0.9989  0.9772  
9  1.0000  1.0000  1.0000  1.0000  0.9916  0.9978  0.9736  0.9990  
10  1.0000  1.0000  1.0000  1.0000  0.9916  0.9978  0.9736  0.9990  1.0000  
11  0.9927  0.9924  0.9924  0.9924  0.9947  0.9957  0.9833  0.9936  0.9924  0.9924  
12  0.9793  0.9788  0.9788  0.9788  0.9831  0.9834  0.9794  0.9917  0.9788  0.9788  0.9887 
We performed permutationbased tests for the HSIC and KTA coefficients discussed above. For most of tests, p values were close to zero, on the basis of which it can be inferred that there is some significant relationship between the groups (pillars) of variables. Table 7 contains the p values obtained for each test. We have exactly the same p values for both methods. Now, we can observe that some groups are independent (\(\alpha =0.05\)): G1 & G3, G3 & G6, G3 & G8, G3 & G11, G3 & G12, G4 & G9.
Functional HSIC & KTA p values permutationbased tests (only nonzero)
1  2  3  4  5  6  7  8  9  10  11  

2  0.0142  
3  0.0714  0.0332  
4  0.0042  0.0343  
5  0.0001  0.0268  
6  0.0157  0.0772  
7  0.0009  0.0061  
8  0.0294  0.0636  
9  0.0030  0.0055  0.0198  0.0640  0.0002  0.0003  0.0009  0.0040  
10  0.0059  0.0294  0.0021  0.0055  
11  0.0039  0.1034  0.0008  
12  0.0008  0.0563  0.0044 
Sorted areas under module weighting functions
No.  Area  Proportion (in %) 

First functional canonical variable (G5)  
1  5.008  51.74 
2  1.724  17.81 
3  1.567  16.19 
4  0.713  7.36 
5  0.351  3.63 
6  0.317  3.27 
First functional canonical variable (G6)  
1  5.187  44.77 
2  3.194  27.56 
3  1.287  11.11 
4  0.580  5.00 
5  0.511  4.41 
6  0.323  2.79 
7  0.206  1.77 
8  0.152  1.31 
9  0.091  0.78 
10  0.057  0.49 
During the numerical calculation process we used R software (R Core Team 2018) and packages fda (Ramsay et al. 2018) and hsicCCA (Chang 2013).
5 Conclusions
We proposed an extension of two dependency measures for two sets of variables for multivariate functional data. We proposed to use tests to examine the significance of results because the values of proposed coefficients are rather hard to interpret. Additionally, we presented the methods of constructing nonlinear canonical variables for multivariate functional data using HSIC and KTA coefficients. Tested on two real examples, the proposed method has proven useful in investigating the dependency between two sets of variables. Examples confirm usefulness of our approach in revealing the hidden structure of codependence between groups of variables.
During the study of proposed coefficients we discovered that the size of basis (smoothing parameter) is rather unimportant, the values (and p values for tests) do not depend on the basis size.
Of course, the performance of the methods needs to be further evaluated on additional real and artificial data sets. Moreover, we can examine the behavior of coefficients (and tests) for different bases like Bsplines or wavelets (when data are not periodic, the Fourier basis could fail). This constitutes the direction of our future research.
Notes
Acknowledgements
The authors are grateful to editor and two anonymous reviewers for giving many insightful and constructive comments and suggestions which led to the improvement of the earlier manuscript.
References
 Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68:337–404MathSciNetCrossRefGoogle Scholar
 Chang B (2013) hsicCCA: Canonical Correlation Analysis based on Kernel Independence Measures. R package version 1.0. https://CRAN.Rproject.org/package=hsicCCA
 Chang B, Kruger U, Kustra R, Zhang J (2013) Canonical correlation analysis based on hilbertschmidt independence criterion and centered kernel target alignment. In: Proceedings of the 30th international conference on machine learning, Atlanta, Georgia. JMLR: W and CP 28(2), 316–324Google Scholar
 Cortes C, Mohri M, Rostamizadeh A (2012) Algorithms for learning kernels based on centered alignment. J Mach Learn Res 13:795–828MathSciNetzbMATHGoogle Scholar
 Cristianini N, ShaweTaylor J, Elisseeff A, Kandola JS (2001) On kerneltarget alignment. In: NIPS2001, 367–373Google Scholar
 Cuevas A (2014) A partial overview of the theory of statistics with functional data. J Stat Plan Inference 147:1–23MathSciNetCrossRefGoogle Scholar
 Devijver E (2017) Modelbased regression clustering for highdimensional data: application to functional data. Adv Data Anal Classif 11(2):243–279MathSciNetCrossRefGoogle Scholar
 Edelman A, Arias TA, Smith S (1998) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20(2):303–353MathSciNetCrossRefGoogle Scholar
 Ferraty F, Vieu P (2003) Curves discrimination: a nonparametric functional approach. Comput Stat Data Anal 44(1–2):161–173MathSciNetCrossRefGoogle Scholar
 Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, BerlinzbMATHGoogle Scholar
 Feuerverger A (1993) A consistent test for bivariate dependence. Int Stat Rev 61(3):419–433MathSciNetCrossRefGoogle Scholar
 Górecki T, Krzyśko M, Ratajczak W, Wołyński W (2016) An extension of the classical distance correlation coefficient for multivariate functional data with applications. Stat Transit 17(3):449–9466Google Scholar
 Górecki T, Krzyśko M, Wołyński W (2017) Correlation analysis for multivariate functional data. In: Palumbo F, Montanari A, Montanari M (eds) Data science. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 243–258Google Scholar
 Górecki T, Krzyśko M, Waszak Ł, Wołyński W (2018) Selected statistical methods of data analysis fir multivariate functional data. Stat Papers 59:153–182CrossRefGoogle Scholar
 Górecki T, Smaga Ł (2017) Multivariate analysis of variance for functional data. J Appl Stat 44:2172–2189MathSciNetCrossRefGoogle Scholar
 Gretton A., Bousquet O., Smola A., and Schölkopf B., (2005): Measuring statistical dependence with Hilbert–Schmidt norms. In: Jain S, Simon HU, Tomita E (eds) Algorithmic learning theory. Lecture notes in computer science 3734, 63–77. SpringerGoogle Scholar
 Gretton A, Fukumizu K, Teo CH, Song L, Schölkopf B, Smola AJ (2008) A kernel statistical test of independence. In: Platt JC, Koller D, Singer Y, Roweis S (eds) Advances in neural information processing systems. Curran, Red Hook, pp 585–592Google Scholar
 Hofmann T, Schölkopf B, Smola AJ (2008) Kernel methods in machine learning. Ann Stat 36(3):1171–1220MathSciNetCrossRefGoogle Scholar
 Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, BerlinCrossRefGoogle Scholar
 Hotelling H (1936) Relations between two sets of variates. Biometrika 28:321–377CrossRefGoogle Scholar
 Hsing T, Eubank R (2015) Theoretical foundations of functional data analysis, with an introduction to linear operators. Wiley, HobokenCrossRefGoogle Scholar
 James GM, Wang JW, Zhu J (2009) Functional linear regression that’s interpretable. Ann Stat 37(5):2083–2108MathSciNetCrossRefGoogle Scholar
 Kankainen A (1995) Consistent testing of total independence based on the empirical charecteristic function, Ph.D. thesis, University of JyväskyläGoogle Scholar
 MartinBaragan B, Lillo R, Romo J (2014) Interpretable support vector machines for functional data. Eur J Oper Res 232:146–155CrossRefGoogle Scholar
 Mercer J (1909) Functions of positive and negative type and their connection with the theory of integral equations. Philos Trans R Soc Lond Ser A 209:415–446CrossRefGoogle Scholar
 R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.Rproject.org/
 Ramsay JO, Dalzell CJ (1991) Some tools for functional data analysis (with discission). J R Stat Soc Ser B 53(3):539–572zbMATHGoogle Scholar
 Ramsay JO, Silverman BW (2002) Applied functional data analysis. Springer, New YorkzbMATHGoogle Scholar
 Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, BerlinzbMATHGoogle Scholar
 Ramsay JO, Wickham H, Graves S, Hooker G (2018) fda: Functional data analysis. R package version 2.4.8. https://CRAN.Rproject.org/package=fda
 Read T, Cressie N (1988) Goodnessoffit statistics for discrete multivariate analysis. Springer, BerlinCrossRefGoogle Scholar
 Riesz F (1909) Sur les opérations functionnelles linéaires. Comptes rendus hebdomadaires des séances de l’Académie des sciences 149:974–977zbMATHGoogle Scholar
 Sejdinovic D, Sriperumbudur B, Gretton A, Fukumizu K (2013) Equivalence of distancebased and RKHSbased statistics in hypothesis testing. Ann Stat 41(5):2263–2291MathSciNetCrossRefGoogle Scholar
 Schölkopf B, Smola AJ, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10:1299–1319CrossRefGoogle Scholar
 ShaweTaylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, CambridgeCrossRefGoogle Scholar
 Song L, Boots B, Siddiqi S, Gordon G, Somla A (2010) Hilbert space embeddings of hidden Markov models. In: Proceedings of the 26th international conference on machine learning (ICML2010)Google Scholar
 Székely GJ, Rizzo ML, Bakirov NK (2007) Measuring and testing dependence by correlation of distances. Ann Stat 35(6):2769–2794MathSciNetCrossRefGoogle Scholar
 Székely GJ, Rizzo ML (2009) Brownian distance covariance. Ann Appl Stat 3(4):1236–1265MathSciNetCrossRefGoogle Scholar
 Wang T, Zhao D, Tian S (2015) An overview of kernel alignment and its applications. Artif Intell Rev 43(2):179–192CrossRefGoogle Scholar
 Zhang K, Peters J, Janzing D, Schölkopf B (2011) Kernelbased conditional independence test and application in causal discovery. In: Cozman FG, Pfeffer A (eds) Proceedings of the 27th conference on uncertainty in artificial intelligence, AUAI Press, Corvallis, OR, USA, 804–813Google Scholar
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.