Sparse and smooth canonical correlation analysis through rank-1 matrix approximation
Abstract
Canonical correlation analysis (CCA) is a well-known technique used to characterize the relationship between two sets of multidimensional variables by finding linear combinations of variables with maximal correlation. Sparse CCA and smooth or regularized CCA are two widely used variants of CCA because of the improved interpretability of the former and the better performance of the latter. So far, the cross-matrix product of the two sets of multidimensional variables has been widely used for the derivation of these variants. In this paper, two new algorithms for sparse CCA and smooth CCA are proposed. These algorithms differ from the existing ones in their derivation, which is based on penalized rank-1 matrix approximation and the orthogonal projectors onto the space spanned by the two sets of multidimensional variables instead of the simple cross-matrix product. The performance and effectiveness of the proposed algorithms are tested on simulated experiments. The results show that they outperform the state-of-the-art sparse CCA algorithms.
Keywords
Canonical correlation analysis, Sparse representation, Rank-1 matrix approximation
1 Introduction
Canonical correlation analysis (CCA) [1] is a multivariate analysis method, the aim of which is to identify and quantify the association between two sets of variables. The two sets of variables can be associated with a pair of linear transforms (projectors) such that the correlation between the projections of the variables in a lower dimensional space through these linear transforms is mutually maximized. The pair of canonical projectors is easily obtained by solving a simple generalized eigenvalue decomposition problem, which only involves the covariance and cross-covariance matrices of the considered random vectors. CCA has been widely applied in many important fields, for instance, facial expression recognition [2, 3], detection of neural activity in functional magnetic resonance imaging (fMRI) [4, 5], machine learning [6, 7] and blind source separation [8, 9].
In the context of high-dimensional data, there is usually a large portion of features that is not informative in data analysis. When the canonical variables involve all features in the original space, the canonical projectors are, in general, not sparse. Therefore, it is not easy to interpret canonical variables in such high-dimensional data analysis. These problems may be tackled by selecting sparse subsets of variables, i.e. obtaining sparse canonical projectors in the linear combinations of variables of each data set [7, 10, 11, 12]. For example, in [11], the authors propose a new criterion for sparse CCA and apply a penalized matrix decomposition approach to solve the sparse CCA problem, and in [10], the presented sparse CCA approach computes the canonical projectors from primal and dual representations.
In this paper, we adopt an alternative formulation of the CCA problem which is based on rank-1 matrix approximation of the orthogonal projectors of data sets [13]. Based on this formulation, we develop a new sparse CCA based on penalized rank-1 matrix approximation, which aims to overcome the drawback of CCA in the context of high-dimensional data and to improve interpretability. The proposed sparse CCA iteratively seeks a sparse pair of canonical projectors by solving a penalized rank-1 matrix approximation via a sparse coding method. We also present a smoothed version of the CCA problem based on rank-1 matrix approximation, where we impose some smoothness on the projections of the variables in order to avoid abrupt or sudden variations. The proposed algorithms differ from the existing ones in their derivation, which is based on penalized rank-1 matrix approximation and the orthogonal projectors onto the space spanned by the two sets of multidimensional variables instead of the simple cross-matrix product [7, 10, 11, 12].
The rest of the paper is organized as follows: In Section 2, we give a brief review of the CCA problem. In Section 3, we present a formulation of CCA using a rank-1 matrix approximation of the orthogonal projectors of data sets and derive the smoothed solution. In Section 4, we introduce our new sparse CCA algorithm. In Section 5, we present some simulation results to demonstrate the effectiveness of the proposed method compared to state-of-the-art CCA algorithms. Finally, Section 6 concludes the paper.
Henceforth, bold lower cases denote real-valued vectors and bold upper cases denote real-valued matrices. The transpose of a given matrix A is denoted by A ^{ T }. All vectors will be column vectors unless transposed. Throughout the paper, I _{ n } stands for the n×n identity matrix, 0 stands for the null vector and 1 _{ n } is the (column) vector of \(\mathbb {R}^{n}\) whose entries are all equal to one. For a vector x, the notation x _{ i } will stand for the i-th component of x. As usual, for any integer m, ⟦1,m ⟧ stands for {1,2,…,m}.
2 Canonical correlation analysis
It has been shown in [6] that we can choose the eigenvectors associated with the top eigenvalues of the generalized eigenvalue problem in (9) and then use (8) to find the corresponding w _{ y }. A number of existing methods for sparse and smooth CCA have used the description of CCA provided above and focused on the use of the cross matrix C _{ xy } for the derivation of new CCA variant algorithms [7, 10, 11, 12]. For the derivation of the proposed CCA variants, we adopt an alternative description of CCA which is based on the orthogonal projectors onto the space spanned by the two sets of multidimensional variables [13].
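As an illustration of the classical route described above, the following minimal sketch solves CCA through a generalized eigenvalue problem. It assumes the standard form \(\boldsymbol{C}_{xy}\boldsymbol{C}_{yy}^{-1}\boldsymbol{C}_{yx}\boldsymbol{w}_{x}=\rho^{2}\boldsymbol{C}_{xx}\boldsymbol{w}_{x}\) with \(\boldsymbol{w}_{y}\propto \boldsymbol{C}_{yy}^{-1}\boldsymbol{C}_{yx}\boldsymbol{w}_{x}\), which may differ in scaling from (8) and (9). The function name `cca_eig`, the small ridge `reg` and the data layout (variables in rows, observations in columns) are assumptions made for this example only.

```python
import numpy as np

def cca_eig(X, Y, reg=1e-8):
    """Classical CCA sketch. X: (d_x, N), Y: (d_y, N), columns are observations."""
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    # Unnormalized covariance and cross-covariance matrices (tiny ridge for stability).
    Cxx = X @ X.T + reg * np.eye(X.shape[0])
    Cyy = Y @ Y.T + reg * np.eye(Y.shape[0])
    Cxy = X @ Y.T
    # Generalized problem M w_x = rho^2 Cxx w_x, with M = Cxy Cyy^{-1} Cyx.
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    # Reduce to a standard symmetric problem with the Cholesky factor Cxx = L L^T.
    L = np.linalg.cholesky(Cxx)
    A = np.linalg.solve(L, np.linalg.solve(L, M).T)   # L^{-1} M L^{-T}
    evals, Z = np.linalg.eigh((A + A.T) / 2)
    r = min(X.shape[0], Y.shape[0])
    order = np.argsort(evals)[::-1][:r]               # keep the top r eigenvalues
    rho = np.sqrt(np.clip(evals[order], 0.0, 1.0))    # canonical correlations
    Wx = np.linalg.solve(L.T, Z[:, order])            # w_x = L^{-T} z
    # Companion projectors, scaled so that w_y^T Cyy w_y = 1.
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx) / np.maximum(rho, 1e-12)
    return rho, Wx, Wy
```

With this scaling, the sample correlation between the first pair of projections equals `rho[0]` up to numerical error.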
3 Canonical correlation analysis based on rank-1 matrix approximation
where u _{ i } and v _{ i } are the i ^{th} column vectors of the matrices U and V, respectively, and D=diag(d _{1},…,d _{ N }) such that d _{1}≥d _{2}≥…≥d _{ N } represent the singular values of K _{ xy } and K _{ yx }. We can deduce from Eqs. (16), (17) and (15) that the left singular vectors of K _{ yx } correspond to the right singular vectors of K _{ xy }.
Hence, for multiple projected data, the solution consists of the singular vectors associated with the top singular values of the matrix K _{ xy }.
From (18), we can observe that the optimization problem (10), which involves the two constraints \(\Vert \boldsymbol {w}^{T}_{x}\boldsymbol {X}\Vert _{2}=1\) and \(\Vert \boldsymbol {w}^{T}_{y}\boldsymbol {Y}\Vert _{2}=1\), has now been transformed into a constraint-free rank-1 matrix approximation problem that can be solved with an SVD. With this approach, the proposed algorithm avoids the need for these constraints and hence also avoids their relaxation as proposed in [11].
One disadvantage of the above approach is the restriction that X X ^{ T } and Y Y ^{ T } must be nonsingular. In order to prevent overfitting and avoid the singularity of X X ^{ T } and Y Y ^{ T } [6], two regularization terms, \(\gamma _{x}\boldsymbol {I}_{d_{x}}\) and \(\gamma _{y}\boldsymbol {I}_{d_{y}}\), with γ _{ x }>0, γ _{ y }>0 are added in (10). Therefore, the regularized version solves the generalized eigenvalue problem with \(\boldsymbol {P}_{x}=\boldsymbol {X}^{T}(\boldsymbol {X}\boldsymbol {X}^{T}+\gamma _{x}\boldsymbol {I}_{d_{x}})^{-1}\boldsymbol {X}\) and \(\boldsymbol {P}_{y}=\boldsymbol {Y}^{T}(\boldsymbol {Y}\boldsymbol {Y}^{T}+\gamma _{y}\boldsymbol {I}_{d_{y}})^{-1}\boldsymbol {Y}\). The method for solving the entire rank-1 matrix approximation CCA is summarized in Algorithm 1.
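The regularized rank-1 approach can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 1: it assumes that K _{ xy } is the product of the two regularized projectors, that data are stored with variables in rows and observations in columns, and that the projectors w _{ x } and w _{ y } are recovered from the leading singular vectors by a regularized least-squares fit. The function names are hypothetical.

```python
import numpy as np

def regularized_projector(X, gamma):
    """P = X^T (X X^T + gamma I)^{-1} X for X of shape (d, N)."""
    d = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + gamma * np.eye(d), X)

def rank1_cca(X, Y, gamma_x=1e-6, gamma_y=1e-6):
    """First canonical pair from the best rank-1 approximation of K_xy."""
    Px = regularized_projector(X, gamma_x)
    Py = regularized_projector(Y, gamma_y)
    K = Px @ Py                      # assumed form of K_xy (product of projectors)
    U, d, Vt = np.linalg.svd(K)      # best rank-1 approximation is d[0] * u1 v1^T
    u1, v1 = U[:, 0], Vt[0]
    # u1 plays the role of X^T w_x and v1 of Y^T w_y; solve for the projectors.
    wx = np.linalg.solve(X @ X.T + gamma_x * np.eye(X.shape[0]), X @ u1)
    wy = np.linalg.solve(Y @ Y.T + gamma_y * np.eye(Y.shape[0]), Y @ v1)
    return d[0], wx, wy
```

For centered data and small γ _{ x }, γ _{ y }, the returned singular value should be close to the first canonical correlation, since the singular values of a product of two orthogonal projectors are the cosines of the principal angles between the two subspaces.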
3.1 Smoothed rank-1 matrix approximation CCA algorithm
As presented in Section 3, the singular vectors u _{1} and v _{1} represent the projected data X ^{ T } w _{ x } and Y ^{ T } w _{ y }, respectively. Then, by using the unitary property of matrices U and V, we can compute the singular value associated with the singular vectors u _{1} and v _{1} by \(d_{1} = \boldsymbol {u}_{1}^{T}\boldsymbol {K}_{xy}\boldsymbol {v}_{1}\). Therefore, we propose to use a deflation procedure where the second pair of canonical projectors is defined by using the corresponding residual matrix \(\boldsymbol {K}_{xy}-\left (\boldsymbol {w}_{x}^{T}\boldsymbol {X} \boldsymbol {K}_{xy}\boldsymbol {Y}^{T}\boldsymbol {w}_{y}\right) \boldsymbol {X}^{T}\boldsymbol {w}_{x} \boldsymbol {w}_{y}^{T}\boldsymbol {Y}\), i.e. K _{ xy } minus its current rank-1 term. The remaining pairs of projectors can then be defined in the same way. The method for solving the smoothed rank-1 matrix approximation CCA is summarized in Algorithm 2.
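The deflation step can be illustrated with the following sketch. It is not Algorithm 2: the smoothing penalty is omitted, and each pair is taken from a plain SVD of the residual, so the sketch only shows how the residual matrix is updated. The function name `leading_pairs_by_deflation` is hypothetical.

```python
import numpy as np

def leading_pairs_by_deflation(K, r):
    """Extract r (d, u, v) triplets by repeatedly removing the current rank-1 term."""
    K_res = K.copy()
    pairs = []
    for _ in range(r):
        U, _, Vt = np.linalg.svd(K_res)
        u, v = U[:, 0], Vt[0]
        d = u @ K_res @ v                    # d = u^T K v, as in the text
        pairs.append((d, u, v))
        K_res = K_res - d * np.outer(u, v)   # residual matrix used for the next pair
    return pairs
```

Without any penalty, the values `d` returned by this loop coincide with the leading singular values of `K`. In the smoothed or sparse variants, the pair (u, v) would instead come from the penalized solver.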
We can observe from Figs. 1, 2 and 3 that the proposed smoothed CCA algorithm has recovered both the temporal signal and the spatial maps with better accuracy than CCA for the three presented cases S1, S2 and S3. This demonstrates the effectiveness of the proposed smoothed CCA approach as a regularization when the estimated signals are believed to be continuous and smooth.
4 Sparse CCA algorithm based on rank-1 matrix approximation
In this section, we propose a sparse CCA method based on rank-1 matrix approximation by penalizing the optimization problem (18). Then, we propose an efficient iterative algorithm to compute the sparse solution of the proposed criterion.
where \(\mathcal {F}_{x}(\cdot)\) and \(\mathcal {F}_{y}(\cdot)\) are penalty functions, which can take on a variety of forms. Useful examples are the ℓ _{0} quasi-norm \(\mathcal {F}(\boldsymbol {z}) = \Vert \boldsymbol {z} \Vert _{0}\), which counts the nonzero entries of a vector, the Lasso penalty with the ℓ _{1}-norm \(\mathcal {F}(\boldsymbol {z}) = \Vert \boldsymbol {z} \Vert _{1}\), and so on.
The optimization problem (22) can be solved by alternating optimization over w _{ x } and w _{ y }. Specifically, we first fix w _{ y } and solve for w _{ x } by minimizing (22). Then, we fix w _{ x } and minimize (22) to obtain w _{ y }. These two steps are repeated until convergence.
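The alternating scheme can be sketched as follows. This is a hedged illustration, not Algorithm 3: it assumes the penalized cost has the form \(\Vert \boldsymbol{K}_{xy}-\boldsymbol{X}^{T}\boldsymbol{w}_{x}\boldsymbol{w}_{y}^{T}\boldsymbol{Y}\Vert_{F}^{2}\) plus ℓ _{0} penalties, and it handles the penalties by fixing the number of nonzero entries (`kx`, `ky`) in a basic orthogonal matching pursuit. For fixed \(\boldsymbol{v}=\boldsymbol{Y}^{T}\boldsymbol{w}_{y}\), minimizing the Frobenius term over w _{ x } amounts to sparse coding of \(\boldsymbol{K}_{xy}\boldsymbol{v}/\boldsymbol{v}^{T}\boldsymbol{v}\) in the dictionary X ^{ T }. All function and parameter names are hypothetical.

```python
import numpy as np

def omp(D, y, k):
    """Basic orthogonal matching pursuit: approximate y with at most k columns of D."""
    residual = y.copy()
    support = []
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0
    for _ in range(k):
        scores = np.abs(D.T @ residual) / norms
        if support:
            scores[support] = -np.inf        # never reselect an atom
        support.append(int(np.argmax(scores)))
        # Least-squares refit on the current support.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

def sparse_rank1_cca(X, Y, K, kx, ky, n_iter=20):
    """Alternate sparse coding of w_x (w_y fixed) and w_y (w_x fixed)."""
    _, _, Vt = np.linalg.svd(K)
    v = Vt[0]                                # initial guess for Y^T w_y
    for _ in range(n_iter):
        wx = omp(X.T, K @ v / (v @ v), kx)   # fix w_y, solve for w_x
        u = X.T @ wx
        wy = omp(Y.T, K.T @ u / (u @ u), ky) # fix w_x, solve for w_y
        v = Y.T @ wy
    return wx, wy
```

The sketch runs a fixed number of iterations for simplicity; a convergence test on the change in w _{ x } and w _{ y } would be a natural replacement.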
In standard CCA, the entries of the projected vector are uncorrelated because the canonical components are orthogonal. This orthogonality, a nice property enjoyed by standard CCA, is lost when penalties are added to the cost (18). Several other CCA procedures lose this property as well; it is the price to pay for imposing additional constraints (sparsity or smoothness).
The method for solving the entire sparse rank-1 matrix approximation CCA is summarized in Algorithm 3.
The proposed sparse CCA approach differs from the method proposed in [11] as follows: the method in [11] uses a penalized matrix decomposition on the cross-product matrix X Y ^{ T }, whereas our proposed approach is based on a rank-1 matrix approximation of K _{ xy } as defined in (18). Furthermore, the method proposed in [11] makes the assumption that X X ^{ T } and Y Y ^{ T } are identities to replace the constraints \(\boldsymbol {w}^{T}_{x}\boldsymbol {X}\boldsymbol {X}^{T}\boldsymbol {w}_{x}\leq 1\) and \(\boldsymbol {w}^{T}_{y}\boldsymbol {Y}\boldsymbol {Y}^{T}\boldsymbol {w}_{y}\leq 1\) by \(\Vert \boldsymbol {w}_{x}\Vert _{2}^{2}\leq 1\) and \(\Vert \boldsymbol {w}_{y}\Vert _{2}^{2}\leq 1\) (Eqs. (4.2) and (4.3) of [11]). This assumption is relaxed in the proposed sparse CCA algorithm presented in Section 4. This is obtained by directly including the constraints \(\boldsymbol {w}^{T}_{x}\boldsymbol {X}\boldsymbol {X}^{T}\boldsymbol {w}_{x}= 1\) and \(\boldsymbol {w}^{T}_{y}\boldsymbol {Y}\boldsymbol {Y}^{T}\boldsymbol {w}_{y}= 1\) in the derivation of the matrix K _{ xy } used in the penalized rank-1 matrix approximation via Eq. (3).
The same argument is valid for [7] and [12], as both papers are based on the cross-product matrix X Y ^{ T }; furthermore, the regularization approach they use is similar to the one described in Algorithm 1 and therefore differs from the regularization adopted in this paper and given in Algorithm 2.
5 Experiments

The sparse CCA presented in [11], which relies on a penalized matrix decomposition and is denoted PMD. An R package implementing this algorithm, called PMA, is available at http://cran.rproject.org/web/packages/PMA/index.html. Sparsity parameters are selected using the permutation approach presented in [18], the code of which is provided in the PMA package.

The sparse CCA presented in [7], where CCA is reformulated as a least-squares problem, denoted LS CCA. A Matlab package implementing this algorithm is available at http://www.public.asu.edu/~jye02/Software/CCA/.

The sparse CCA presented in [12], where the sparse canonical projectors are computed by solving two ℓ _{1}-minimization problems using the linearized Bregman iterative method [19]. This algorithm is denoted CCA LB (Linearized Bregman). We reimplemented the sparse CCA algorithm proposed in [12] using Matlab.
For the proposed sparse CCA algorithm, we have used \(\mathcal {F}_{x}(\boldsymbol {z})=\mathcal {F}_{y}(\boldsymbol {z})=\Vert \boldsymbol {z}\Vert _{0}\) as penalty functions. We solve the sparse coding problem using the orthogonal matching pursuit (OMP) algorithm [20, 21]. For the proposed smoothed CCA algorithm, we chose Ω _{ x }=Ω _{ y }, as given by Eq. (20).
5.1 Synthetic data
Simulation settings
Parameters  d _{ x }  d _{ y }  r  N  C _{ xx }  C _{ yy }  C _{ xy } 

Scenario 1  4  4  3  {50, 100, 200}  I _{4}  I _{4}  \(\left [\begin {array}{llll} \frac {9}{10} & 0 & 0 & 0 \\ 0 & \frac {1}{2} & 0 & 0 \\ 0 & 0 & \frac {1}{3} & 0 \\ 0 & 0 & 0 & 0 \end {array}\right ]\) 
Scenario 2  4  6  2  {50, 100, 200}  I _{4}  I _{6}  \(\left [\begin {array}{llllllll} \frac {3}{5} & 0 & 0 & 0 & 0 & 0 \\ 0 & \frac {1}{2} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end {array}\right ]\) 
Scenario 3  4  6  2  {50, 100, 200}  I _{4}  I _{6}  \(\left [\begin {array}{llllllllll} \frac {2}{5} & \frac {4}{25} & 0 & 0 & 0 & 0 \\ \frac {4}{25} & \frac {2}{5} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end {array}\right ] \) 
Scenario 4  6  10  2  {50, 100, 200}  I _{6}  \(\left [\begin {array}{lllllllll} \boldsymbol {M} & \boldsymbol {0} \\ \boldsymbol {0} & \boldsymbol {I}_{7} \end {array}\right ]\)  \(\frac {1}{2}\left [\begin {array}{llllll} \boldsymbol {I}_{2} & {\boldsymbol 0} \\ \boldsymbol {0} & \boldsymbol {0} \end {array}\right ]\) 
with M(i,j)=0.3^{i−j}  
Scenario 5  20  20  10  {50, 100, 200}  I _{20}  I _{20}  \(\frac {7}{10}\left [\begin {array}{llllll} \boldsymbol {I}_{10} & \boldsymbol {0} \\ \boldsymbol {0} & \boldsymbol {0} \end {array}\right ]\) 
Scenario 6  20  20  10  {50, 100, 200}  I _{20}  I _{20}  \(\left [\begin {array}{lllllll} \boldsymbol {S}_{10} & \boldsymbol {0} \\ \boldsymbol {0} & \boldsymbol {0} \end {array}\right ]\) 
with S _{10}(i,j)=0.4^{i−j+1} 
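To make the simulation settings concrete, the following sketch draws a pair of data sets with prescribed C _{ xx }, C _{ yy } and C _{ xy } for Scenario 1. It assumes zero-mean jointly Gaussian variables, which is an assumption of this example and is not stated in the table; the function name `sample_pair` and the seed are likewise hypothetical.

```python
import numpy as np

def sample_pair(Cxx, Cyy, Cxy, N, seed=0):
    """Draw X (d_x, N) and Y (d_y, N) from a zero-mean Gaussian with the given blocks."""
    dx = Cxx.shape[0]
    dy = Cyy.shape[0]
    # Joint covariance of the stacked vector [x; y].
    C = np.block([[Cxx, Cxy], [Cxy.T, Cyy]])
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(dx + dy), C, size=N).T
    return Z[:dx], Z[dx:]

# Scenario 1: d_x = d_y = 4, identity covariances, diagonal cross-covariance.
Cxy = np.diag([0.9, 0.5, 1.0 / 3.0, 0.0])
X, Y = sample_pair(np.eye(4), np.eye(4), Cxy, N=100)
```

With identity C _{ xx } and C _{ yy }, the diagonal entries of C _{ xy } are the population canonical correlations, so Scenario 1 has three nonzero canonical correlations, matching r=3 in the table.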
Simulation results part 1
Method  N=50: θ _{ x } (rad)  N=50: θ _{ y } (rad)  N=100: θ _{ x } (rad)  N=100: θ _{ y } (rad)  N=200: θ _{ x } (rad)  N=200: θ _{ y } (rad)

Scenario 1:  CCA  0.5395  0.5033  0.3468  0.3475  0.2273  0.2388 
LS CCA  0.4161  0.3697  0.2649  0.2650  0.1784  0.1872  
CCA LB  0.5172  0.5151  0.3310  0.3341  0.2250  0.2228  
PMD  0.2203  0.2420  0.0908  0.0506  0.0207  0.0175  
Algorithm 2  0.5074  0.5189  0.3123  0.3140  0.2225  0.2202  
Algorithm 3  0.2011  0.2191  0.0491  0.0273  0.0044  0.0057  
Scenario 2:  CCA  0.5091  0.6682  0.3108  0.4123  0.2089  0.2771 
LS CCA  0.3481  0.5083  0.2285  0.3247  0.1605  0.2182  
CCA LB  0.3000  0.3761  0.0227  0.0228  0.0008  0.0009  
PMD  0.2061  0.3068  0.0230  0.0706  0.0043  0.0443  
Algorithm 2  0.5064  0.6462  0.3062  0.4111  0.2061  0.2792  
Algorithm 3  0.1162  0.1508  0.0012  0.0015  0.0001  0.0001  
Scenario 3:  CCA  0.8699  1.0281  0.6800  0.8254  0.4823  0.6009 
LS CCA  0.6398  0.8314  0.4608  0.6139  0.3116  0.4348  
CCA LB  0.8681  1.0285  0.6575  0.8122  0.3859  0.4938  
PMD  0.7690  0.9080  0.5382  0.6736  0.2736  0.4811  
Algorithm 2  0.8465  0.9876  0.6654  0.8078  0.4345  0.5839  
Algorithm 3  0.3424  0.4571  0.0393  0.0628  0.0001  0.0016 
Simulation results part 2
Method  N=50: θ _{ x } (rad)  N=50: θ _{ y } (rad)  N=100: θ _{ x } (rad)  N=100: θ _{ y } (rad)  N=200: θ _{ x } (rad)  N=200: θ _{ y } (rad)

Scenario 4:  CCA  0.8125  0.9956  0.5603  0.6678  0.3390  0.4484 
LS CCA  0.5275  0.7305  0.3553  0.4711  0.2412  0.3449  
CCA LB  0.7603  0.9209  0.2785  0.5163  0.0149  0.3152  
PMD  0.6111  0.8273  0.2031  0.4616  0.0397  0.3373  
Algorithm 2  0.8829  0.9938  0.5288  0.6735  0.3295  0.4447  
Algorithm 3  0.3990  0.6856  0.0173  0.3237  0.0001  0.3035  
Scenario 5:  CCA  1.3798  1.3764  0.8879  0.8744  0.4700  0.4722 
LS CCA  0.8538  0.8298  0.5231  0.5187  0.3373  0.3378  
CCA LB  1.3681  1.3659  0.7264  0.7347  0.0478  0.0417  
PMD  1.3972  1.3542  1.1316  1.0342  0.4082  0.3820  
Algorithm 2  1.3627  1.3655  0.7413  0.8096  0.4407  0.4605  
Algorithm 3  1.1185  1.0986  0.0275  0.0271  0.0001  0.0001  
Scenario 6:  CCA  1.4853  1.4854  1.4624  1.4633  1.4249  1.4199 
LS CCA  1.4589  1.4578  1.3797  1.3838  1.1954  1.1951  
CCA LB  1.4862  1.4851  1.4684  1.4740  1.4830  1.4793  
PMD  1.5244  1.5130  1.4985  1.4954  1.4553  1.4551  
Algorithm 2  1.4794  1.4791  1.4512  1.4509  1.3869  1.3790  
Algorithm 3  1.4633  1.4628  0.7775  0.7885  0.0220  0.0221 
We can observe that the proposed sparse CCA method (Algorithm 3) yields the smallest angles in most settings; the exceptions in the tables above are Scenarios 5 and 6 at N=50, where LS CCA attains smaller angles. Even with a low number of observations, the proposed sparse CCA method performs well, and its performance gain grows as the number of observations increases. This demonstrates the robustness of our sparse CCA method with respect to the number of available observations and the benefit of using it in the context of a relatively low number of observations.
5.2 Blind channel identification for SIMO systems
Blind channel identification is a fundamental signal processing technology aimed at retrieving a system’s unknown information from its outputs only. Estimation of sparse long channels (i.e. channels with a small number of nonzero coefficients but a large span of delays) is considered in this simulation. Such sparse channels are encountered in many communication applications: high-definition television (HDTV) [23], underwater acoustic communications [24] and wireless communications [25, 26]. The problem addressed in this section is to determine the sparse impulse response of a SIMO system in a blind way, i.e. only the observed system outputs are available and used, without assuming knowledge of the specific input signal.
where ∗ denotes linear convolution, η(t)=[η _{1}(t),η _{2}(t)]^{ T } is an additive spatially white Gaussian noise, i.e. \(\mathbb {E}[\boldsymbol {\eta }(t)\boldsymbol {\eta }(t)^{T}]=\sigma ^{2} \boldsymbol {I}_{2}\), and \(\boldsymbol {h} = [\boldsymbol {h}_{1}^{T} \boldsymbol {h}_{2}^{T}]^{T}\) with h _{ i }=[h _{ i }(0),…,h _{ i }(L)]^{ T } (i=1,2) denotes the impulse response vector of the i-th channel. Given a finite set of observations of length T, the objective in this experiment is to estimate the channel coefficient vector h. The identification method presented by Xu et al. in [27], which is closely related to linear prediction, exploits the commutativity of the convolution. Based on this approach and inspired by [28], we present in the following an experiment to assess the performance of blind channel identification methods based on CCA.
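The commutativity property exploited in [27] can be checked numerically. The sketch below assumes a noise-free two-channel model in which each output is the linear convolution of a common input with one channel; the channel taps, the input and the function name `cross_relation_residual` are illustrative choices, not values from the experiment.

```python
import numpy as np

def cross_relation_residual(h1, h2, s):
    """Largest absolute difference between h2 * x1 and h1 * x2 for x_i = h_i * s."""
    x1 = np.convolve(h1, s)      # output of channel 1
    x2 = np.convolve(h2, s)      # output of channel 2
    lhs = np.convolve(h2, x1)    # h2 * (h1 * s)
    rhs = np.convolve(h1, x2)    # h1 * (h2 * s)
    return float(np.max(np.abs(lhs - rhs)))

rng = np.random.default_rng(0)
s = rng.standard_normal(64)                # unknown input, never used by a blind method
h1 = np.array([1.0, 0.0, 0.0, -0.5])       # sparse illustrative channels
h2 = np.array([0.0, 0.8, 0.0, 0.3])
residual = cross_relation_residual(h1, h2, s)
```

In the noise-free case the residual vanishes up to rounding, so the two filtered outputs are perfectly correlated; this is the relation that a CCA-based identification method can exploit when only the outputs are observed.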
3GPP extended typical urban channel model [29]
Excess tap delay (ns)  0  50  120  200  230  500  1600  2300  5000 

Relative power (dB)  −1.0  −1.0  −1.0  0.0  0.0  0.0  −3.0  −5.0  −7.0 
5.3 Blind source separation for fMRI signals
Performance comparison in terms of correlation with the reference Fig. 8 (a)
Algorithms  CCA  LS CCA  CCA LB  PMD  Algo 2 (f)  Algo 2 (g)  Algo 3 (h)  Algo 3 (i) 

Correlation  0.9438  0.9054  0.9764  0.9235  0.9822  0.9852  0.9953  0.9995 
To use CCA, either a second data set obtained from a different subject is used or the second data set is obtained from the original data Y by time delay [31]. This last option is used in this application example. Instead of taking N as the total number of voxels, only the cortical, subcortical and cerebellum regions in the brain obtained by parcellating the whole brain into 116 ROIs using automated anatomical labelling [32] were considered. For each considered region, the average time series was generated and used.
The single subject (id 100307) rs-fMRI dataset used in this section was obtained from the Human Connectome Project Q1 release [33]. The acquisition parameters of the rs-fMRI data are 90 × 104 matrix, 220 mm FOV, 72 slices, TR = 0.72 s, TE = 33.1 ms, flip angle = 52°, BW = 2290 Hz/Px, in-plane FOV = 208 × 180 mm with 2.0 mm isotropic voxels. The obtained data had already been preprocessed with a pipeline consisting of motion correction, temporal prewhitening, slice time correction and global drift removal, and the scans were spatially normalized to a standard MNI152 template and resampled to 2 mm × 2 mm × 2 mm voxels. The reader is referred to [33, 34] for more details regarding data acquisition and preprocessing.
The second data set, obtained by a single sample delay, was used for CCA. The different CCA algorithms were applied to Y and Y _{ t−1} of dimension n×N to generate canonical correlation components representing maximally correlated temporal profiles. The neural dynamics of interest can be obtained by correlating the modulation profile of the canonical correlation components with the time series representing average neural dynamics for regions of interest (ROIs). For functional connectivity analysis of the default mode network (DMN), the modulation profile that was most correlated with the posterior cingulate cortex (PCC) representative time series is used. Using the neural dynamics of interest, the sparsely distributed and clustered origins of the dynamics are obtained by converting the associated coefficient rows to z-scores.
Using the different CCA variant algorithms, the connected regions obtained for the DMN are mostly the PCC, the medial prefrontal cortex (MFC) and the right inferior parietal lobe (IPL). As there is no gold standard reference available for DMN connectivity, we relied on the similarity of the temporal dynamics of the DMN-based modulation profile with the PCC representative time series. The similarity measure used was correlation, which was estimated as >0.9 for all the algorithms.
6 Conclusions
In this paper, we have developed two new variants of CCA; more specifically, we have introduced new algorithms for sparse and smooth CCA. The proposed algorithms are based on penalized rank-1 matrix approximation and differ from the existing ones in the matrices they use for their derivation. Indeed, instead of focusing on the cross-matrix product of the two sets of multidimensional variables, we have used the product of the orthogonal projectors onto the space spanned by the columns of the two sets of multidimensional variables. Using this approach, the proposed sparse and smooth CCA algorithms differ only in the penalty used in the penalized rank-1 matrix approximation. Simulation results illustrating the effectiveness of the proposed CCA variant algorithms are provided, and they show that the proposed sparse CCA outperforms state-of-the-art methods. As a continuation of the presented work, and in order to fix the tuning parameters of the proposed approaches, the main idea of the permutation method presented in [18] will be studied and adapted.
7 Endnotes
^{1} Let A and B be two matrices. In order to compute the angle θ between the subspaces spanned by the columns of A and B, we first compute orthonormal bases A _{⊥} and B _{⊥} for the ranges of A and B, respectively. θ is then computed by \(\theta =\arccos (\min (\boldsymbol {A}_{\perp }^{T}\boldsymbol {B}_{\perp }))\).
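A possible numerical reading of this endnote is sketched below. It assumes that the minimum is taken over the singular values of \(\boldsymbol{A}_{\perp}^{T}\boldsymbol{B}_{\perp}\), which gives the largest principal angle between the two subspaces; this interpretation, the use of a QR factorization to obtain the orthonormal bases (valid for full column rank inputs) and the function name `subspace_angle` are assumptions of this example.

```python
import numpy as np

def subspace_angle(A, B):
    """Largest principal angle (rad) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)      # orthonormal basis of range(A), full column rank assumed
    Qb, _ = np.linalg.qr(B)      # orthonormal basis of range(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip guards against values slightly outside [-1, 1] due to rounding.
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))

# Angle between an estimated and a reference direction in the plane.
t = 0.3
w_true = np.array([[1.0], [0.0]])
w_est = np.array([[np.cos(t)], [np.sin(t)]])
theta = subspace_angle(w_true, w_est)
```

For single-column inputs such as a true and an estimated canonical projector, the result is the angle between the two directions and is insensitive to their sign and scale.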
Notes
Funding
No funding was received or used to prepare this manuscript.
Authors’ contributions
All authors contributed equally to this work. All authors discussed the results and implications and commented on the manuscript at all stages. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
References
1. H Hotelling, Relations between two sets of variables. Biometrika. 28(3–4), 321–377 (1936).
2. W Zheng, X Zhou, C Zou, L Zhao, Facial expression recognition using kernel canonical correlation analysis (KCCA). IEEE Trans. Neural Netw. 17(1), 233–238 (2006).
3. XY Jing, S Li, C Lan, D Zhang, J Yang, Q Liu, Color image canonical correlation analysis for face feature extraction and recognition. Signal Process. 91(8), 2132–2140 (2011).
4. O Friman, J Carlsson, P Lundberg, M Borga, H Knutsson, Detection of neural activity in functional MRI using canonical correlation analysis. Magn. Reson. Med. 45(2), 323–330 (2001).
5. DR Hardoon, J Mourao-Miranda, M Brammer, J Shawe-Taylor, Unsupervised analysis of fMRI data using kernel canonical correlation. NeuroImage. 37(4), 1250–1259 (2007).
6. DR Hardoon, S Szedmak, J Shawe-Taylor, Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16(12), 2639–2664 (2004).
7. L Sun, S Ji, J Ye, Canonical correlation analysis for multilabel classification: a least-squares formulation, extensions, and analysis. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 194–200 (2011).
8. W Liu, DP Mandic, A Cichocki, Analysis and online realization of the CCA approach for blind source separation. IEEE Trans. Neural Netw. 18(5), 1505–1510 (2007).
9. YO Li, T Adali, W Wang, VD Calhoun, Joint blind source separation by multiset canonical correlation analysis. IEEE Trans. Signal Process. 57(10), 3918–3929 (2009).
10. DR Hardoon, J Shawe-Taylor, Sparse canonical correlation analysis. Mach. Learn. 83(3), 331–353 (2011). doi:10.1007/s1099401052227.
11. DM Witten, R Tibshirani, T Hastie, A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 10(3), 515–534 (2009). doi:10.1093/biostatistics/kxp008.
12. D Chu, LZ Liao, MK Ng, X Zhang, Sparse canonical correlation analysis: new formulation and algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 3050–3065 (2013).
13. KV Mardia, JT Kent, JM Bibby, Multivariate Analysis. Probability and mathematical statistics, 1st edn. (Academic Press, University of Leeds, Leeds, 1979).
14. AN Tikhonov, On the stability of inverse problems. Doklady Akademii nauk SSSR. 39(5), 195–198 (1943).
15. JO Ramsay, BW Silverman, Functional Data Analysis, 2nd edn. (Springer-Verlag, New York, 2005).
16. K Lee, SK Tak, JC Yee, A data driven sparse GLM for fMRI analysis using sparse dictionary learning and MDL criterion. IEEE Trans. Med. Imaging. 30, 1176–1089 (2011).
17. A Aïssa-El-Bey, AK Seghouane, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Sparse canonical correlation analysis based on rank-1 matrix approximation and its application for FMRI signals, (2016), pp. 4678–4682. doi:10.1109/ICASSP.2016.7472564.
18. S Gross, B Narasimhan, R Tibshirani, D Witten, Correlate: sparse canonical correlation analysis for the integrative analysis of genomic data. Technical Report User guide and technical document, Stanford University (2011).
19. JF Cai, S Osher, Z Shen, Convergence of the linearized Bregman iteration for ℓ _{1}-norm minimization. Technical Report CAM Report 08–52, University of California Los Angeles (2008).
20. YC Pati, R Rezaiifar, PS Krishnaprasad, 1. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers (Pacific Grove, 1993), pp. 40–44. doi:10.1109/ACSSC.1993.342465.
21. G Davis, S Mallat, M Avellaneda, Adaptive greedy approximations. Constr. Approximation. 13(1), 57–98 (1997). doi:10.1007/BF02678430.
22. JA Branco, C Croux, P Filzmoser, MR Oliveira, Robust canonical correlations: a comparative study. Comput. Stat. 20(2), 203–229 (2005). doi:10.1007/BF02789700.
23. W Schreiber, Advanced television systems for terrestrial broadcasting: some problems and some proposed solutions. Proc. IEEE. 83(6), 958–981 (1995).
24. M Kocic, D Brady, M Stojanovic, in Proc. OCEANS, 3. Sparse equalization for real-time digital underwater acoustic communications (San Diego, 1995), pp. 1417–1422.
25. L Perros-Meilhac, E Moulines, K Abed-Meraim, P Chevalier, P Duhamel, Blind identification of multipath channels: a parametric subspace approach. IEEE Trans. Signal Process. 49(7), 1468–1480 (2001).
26. S Ariyavisitakul, N Sollenberger, L Greenstein, Tap-selectable decision-feedback equalization. IEEE Trans. Commun. 45(12), 1497–1500 (1997).
27. G Xu, H Liu, L Tong, T Kailath, A least-squares approach to blind channel identification. IEEE Trans. Signal Process. 43(12), 2982–2993 (1995).
28. S Van Vaerenbergh, J Via, I Santamaria, Blind identification of SIMO Wiener systems based on kernel canonical correlation analysis. IEEE Trans. Signal Process. 61(9), 2219–2230 (2013).
29. 3GPP TS 36.104, Evolved Universal Terrestrial Radio Access (E-UTRA); Base Station (BS) Radio Transmission and Reception (2015). 3GPP TS 36.104. www.3gpp.org/dynareport/36104.htm.
30. NA Lazar, Statistics for Biology and Health. The Statistical Analysis of Functional MRI Data, 1st edn. (Springer, New York, 2008).
31. MU Khaled, AK Seghouane. Improving functional connectivity detection in FMRI by combining sparse dictionary learning and canonical correlation analysis, 10th IEEE International Symposium on Biomedical Imaging (San Francisco, 2013), pp. 286–289. doi:10.1109/ISBI.2013.6556468.
32. N Tzourio-Mazoyer, B Landeau, D Papathanassiou, F Crivello, O Etard, N Delcroix, B Mazoyer, M Joliot, Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage. 15, 273–289 (2002).
33. DM Barch, GC Burgess, MP Harms, SE Petersen, BL Schlaggar, M Corbetta, MF Glasser, S Curtiss, S Dixit, C Feldt, D Nolan, E Bryant, T Hartley, O Footer, JM Bjork, R Poldrack, S Smith, H Johansen-Berg, AZ Snyder, DCV Essen, Function in the human connectome: task-fMRI and individual differences in behavior. NeuroImage. 80, 169–189 (2013).
34. MF Glasser, SN Sotiropoulos, JA Wilson, TS Coalson, B Fischl, JL Andersson, J Xu, S Jbabdi, M Webster, JR Polimeni, DCV Essen, M Jenkinson, The minimal preprocessing pipelines for the human connectome project. NeuroImage. 80, 105–124 (2013).
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.