
1 Introduction

Several tasks in computer vision and neighboring fields require labeled datasets in order to build effective statistical learning models. It is widely agreed that the accuracy of these models relies substantially on the availability of large labeled training sets. These sets require a tremendous human annotation effort and are therefore very expensive for many large-scale classification problems including image/video-to-text (a.k.a. captioning) [1,2,3,4], multi-modal information retrieval [5], multi-temporal change detection [6, 7], object recognition and segmentation [8, 9], etc. The current trend in machine learning, mainly with the data-hungry deep models [1, 2, 10,11,12], is to bypass supervision, by making the training of these models totally unsupervised [13], or at least weakly supervised using: fine-tuning [14], self-supervision [15], data augmentation and game-based models [16]. However, the difficulty of collecting annotated datasets stems not only from assigning accurate labels to these data, but also from aligning them; for instance, in the neighboring field of machine translation, successful training models require accurately aligned bi-texts (parallel bilingual training sets), while in satellite image change detection, these models require accurately georeferenced and registered satellite images. This level of requirement, both on the accuracy of labels and their alignments, is clearly hard to reach; alternative models, which skip this demanding alignment requirement, should be preferred.

Canonical correlation analysis (CCA) [17,18,19,20] is one of the statistical learning models that require accurately aligned (paired) multi-view data; CCA finds – for each view – a transformation matrix that maps data from that view to a view-independent (latent) representation such that aligned data obtain highly correlated latent representations. Several extensions of CCA have been introduced in the literature including nonlinear (kernel) CCA [21], sparse CCA [22,23,24], multiple CCA [25], locality preserving and instance-specific CCA [26, 27], time-dependent CCA [28] and other unified variants (see for instance [29, 30]); these methods have been applied to several pattern analysis tasks such as image-to-text [31], pose estimation [21, 26] and object recognition [32], multi-camera activity correlation [33, 34] and motion alignment [35, 36] as well as heterogeneous sensor data classification [37].

The success of all the aforementioned CCA approaches is highly dependent on the accuracy of alignments between multi-view data. In practice, data are subject to misalignments (such as registration errors in satellite imagery) and sometimes completely unaligned (as in multilingual documents), and this skews the learning of CCA. Except for a few attempts – to handle temporal deformations in monotonic sequence datasets [38] using canonical time warping [36] (and its deep extension [39]) – none of the existing CCA variants address alignment errors for non-monotonic datasets. Besides CCA, the issue of data alignment has been approached, in general, using manifold alignment [40,41,42], Procrustes analysis [43] and source-target domain adaptation [44], but none of these methods consider resilience to misalignments as a part of CCA design (which is the main purpose of our contribution in this paper). Furthermore, these data alignment solutions rely on a strong “apples-to-apples” comparison hypothesis (that data taken from different views have similar structures) which does not always hold, especially when handling datasets with heterogeneous views (such as text/image data and multi-temporal or multi-sensor satellite images). Moreover, even when data are globally well (re)aligned, some residual alignment errors are difficult to handle (such as parallax in multi-temporal satellite imagery) and harm CCA (as shown in our experiments).

In this paper, we introduce a novel CCA approach that handles misaligned data; i.e., it does not require any preliminary step of accurate data alignment. This is very useful for different applications where aligning data is very time-consuming or where data are taken from multiple sources (sensors, modalities, etc.) which are intrinsically misaligned. The benefit of our approach is twofold; on the one hand, it models the uncertainty of alignments using a new data correlation term, and on the other hand, modeling alignment uncertainty allows us to use not only decently aligned data (if available) when learning CCA, but also the unaligned ones. In sum, this approach can be seen as an extension of CCA to unaligned sets, in contrast to standard CCA (and its variants) which operate only on accurately aligned data. Furthermore, the proposed method is as efficient as standard CCA, and its computational complexity grows with the dimensionality (and not the cardinality) of the data, which makes it very suitable for large datasets.

Our CCA formulation is based on the optimization of a constrained objective function that combines two terms: a correlation criterion and a context-based regularizer. The former maximizes a weighted correlation between data with a high cross-view similarity, while the latter makes this weighted correlation high for data whose neighbors have high correlations too (and vice-versa). We will show that optimizing this constrained maximization problem is equivalent to solving an iterative generalized eigenvalue/eigenvector decomposition; we will also show that the solution of this iterative process converges to a fixed point. Finally, we will illustrate the validity of our CCA formulation on different challenging problems, including change detection on both residually and strongly misaligned multi-temporal satellite images; indeed, these images are subject to alignment errors due to the hardness of image registration under challenging conditions, such as occlusion and parallax.

The rest of this paper is organized as follows; Sect. 2 briefly reviews the preliminaries of canonical correlation analysis, followed by our main contribution: a novel alignment-agnostic CCA, as well as some theoretical results about the convergence of the learned CCA transformation to a fixed point (under some constraints on the parameter that weights our regularization term). Section 3 shows the validity of our method both on synthetic toy data and on real-world problems, namely satellite image change detection. Finally, we conclude the paper in Sect. 4 and provide possible extensions for future work.

2 Canonical Correlation Analysis

Consider the input spaces \(\mathcal{X}_r\) and \(\mathcal{X}_t\) as two sets of images taken from two modalities; in satellite imagery, these modalities could be two different sensors, or the same sensor at two different instants, etc. Denote \(\mathcal{I}_r=\{\mathbf {u}_i\}_i\), \(\mathcal{I}_t=\{\mathbf {v}_j\}_j\) as two subsets of \(\mathcal{X}_r\) and \(\mathcal{X}_t\) respectively; our goal is to learn a transformation between \(\mathcal{X}_r\) and \(\mathcal{X}_t\) that assigns, to a given \(\mathbf {u}\in \mathcal{X}_r\), a sample \(\mathbf {v}\in \mathcal{X}_t\). The learning of this transformation usually requires accurately paired data in \(\mathcal{X}_r \times \mathcal{X}_t\), as in CCA.

2.1 Standard CCA

Assuming centered data in \(\mathcal{I}_r\), \(\mathcal{I}_t\), standard CCA (see for instance [19]) finds two projection matrices that map aligned data in \(\mathcal{I}_r \times \mathcal{I}_t\) into a latent space while maximizing their correlation. Let \(\mathbf{P _r}\), \(\mathbf{P _t}\) denote these projection matrices which respectively correspond to reference and test images. CCA finds these matrices as \((\mathbf{P _r},\mathbf{P _t})=\arg \max _\mathbf{A ,\mathbf B } {\text {tr}}(\mathbf A '\mathbf C _{rt} \mathbf B )\), subject to \(\mathbf A '\mathbf C _{rr}\mathbf A =\mathbf I _u\), \(\mathbf B '\mathbf C _{tt} \mathbf B =\mathbf I _{v}\); here \(\mathbf I _{u}\) (resp. \(\mathbf I _{v}\)) is the \(d_u\times d_u\) (resp. \(d_v\times d_v\)) identity matrix, \(d_u\) (resp. \(d_v\)) is the dimensionality of data in \(\mathcal{X}_r\) (resp. \(\mathcal{X}_t\)), \(\mathbf A '\) stands for the transpose of \(\mathbf A \), \({\text {tr}}\) is the trace, \(\mathbf C _{rt}\) (resp. \(\mathbf C _{rr}\), \(\mathbf C _{tt}\)) correspond to inter-class (resp. intra-class) covariance matrices of data in \(\mathcal{I}_r\), \(\mathcal{I}_t\), and the equality constraints control the effect of scaling on the solution. One can show that the problem above is equivalent to solving the eigenproblem \(\mathbf C _{rt}\mathbf C _{tt}^{-1}\mathbf C _{tr}\mathbf{P _r}=\gamma ^2 \mathbf C _{rr}\mathbf{P _r}\) with \(\mathbf{P _t}=\frac{1}{\gamma } \mathbf C _{tt}^{-1}\mathbf C _{tr} \mathbf{P _r}\). In practice, learning these two transformations requires “paired” data in \(\mathcal{I}_r \times \mathcal{I}_t\), i.e., aligned data. However, as will be shown throughout this paper, accurately paired data are not always available (and are expensive to obtain); furthermore, the cardinalities of \(\mathcal{I}_r\) and \(\mathcal{I}_t\) can also differ, so one should adapt CCA in order to learn transformations between data in \(\mathcal{I}_r\) and \(\mathcal{I}_t\), as shown subsequently.
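For concreteness, the following minimal numpy sketch implements this eigenproblem formulation of standard CCA; the function name, the sample normalization of the covariances and the small regularizer `eps` are our own additions, the latter for numerical stability:

```python
import numpy as np
from scipy.linalg import eigh

def standard_cca(U, V, dim, eps=1e-6):
    """Standard CCA on paired, centered data.

    U: (d_u, n) reference samples as columns; V: (d_v, n) test samples.
    Solves C_rt C_tt^{-1} C_tr P_r = gamma^2 C_rr P_r, then recovers
    P_t = (1/gamma) C_tt^{-1} C_tr P_r (cf. Sect. 2.1).
    """
    n = U.shape[1]
    C_rr = U @ U.T / n + eps * np.eye(U.shape[0])   # intra-class covariances,
    C_tt = V @ V.T / n + eps * np.eye(V.shape[0])   # slightly regularized
    C_tr = V @ U.T / n                              # inter-class covariance
    M = C_tr.T @ np.linalg.solve(C_tt, C_tr)        # C_rt C_tt^{-1} C_tr
    gamma2, P_r = eigh(0.5 * (M + M.T), C_rr)       # generalized eigenproblem
    order = np.argsort(gamma2)[::-1][:dim]          # keep leading correlations
    P_r = P_r[:, order]
    gamma = np.sqrt(np.maximum(gamma2[order], eps))
    P_t = np.linalg.solve(C_tt, C_tr @ P_r) / gamma
    return P_r, P_t
```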

2.2 Alignment Agnostic CCA

We introduce our main contribution: a novel alignment agnostic CCA approach. Considering \(\{(\mathbf {u}_i,\mathbf {v}_j)\}_{ij}\) as a subset of \(\mathcal{I}_r \times \mathcal{I}_t\) (cardinalities of \(\mathcal{I}_r\), \(\mathcal{I}_t\) are not necessarily equal), we propose to find the transformation matrices \(\mathbf{P _r}\), \(\mathbf{P _t}\) as

$$\begin{aligned} \begin{array}{ll} \displaystyle \max _{\mathbf{P _r},\mathbf{P _t}} & {\text {tr}}(\mathbf U ' \mathbf{P _r}\mathbf{P '_t}\mathbf V \mathbf{D }) \\ \text {s.t.} & \mathbf{P '_r}\mathbf C _{rr} \mathbf{P _r}= \mathbf I _u \ \ \text {and} \ \ \mathbf{P '_t}\mathbf C _{tt} \mathbf{P _t}= \mathbf I _{v}, \end{array} \end{aligned}$$
(1)

the non-matrix form of this objective function is given subsequently. In this constrained maximization problem, \(\mathbf U \), \(\mathbf V \) are two matrices of data in \(\mathcal{I}_r\), \(\mathcal{I}_t\) respectively, and \(\mathbf{D }\) is an (application-dependent) matrix whose entry \(\mathbf{D }_{ij}\) is set to the cross-affinity or the likelihood that a given data point \(\mathbf {u}_i \in \mathcal{I}_r\) aligns with \(\mathbf {v}_j \in \mathcal{I}_t\) (see Sect. 3.2 about different settings of this matrix). This definition of \(\mathbf{D }\), together with the objective function (1), makes CCA alignment agnostic; indeed, this objective function (equivalent to \(\sum _{i,j} \langle \mathbf{P '_r}\mathbf {u}_i,\mathbf{P '_t}\mathbf {v}_j \rangle \mathbf{D }_{ij}\)) aims to maximize the correlation between pairs with a high cross-affinity of alignment while it also minimizes the correlation between pairs with a small cross-affinity. The following proposition covers a special case corresponding to a particular setting of \(\mathbf{D }\).

Proposition 1

Provided that \(|\mathcal{I}_r|=|\mathcal{I}_t|\) and \(\forall \mathbf {u}_i \in \mathcal{I}_r\), \(\exists ! \mathbf {v}_j \in \mathcal{I}_t\) such that \(\mathbf{D }_{ij}= 1\); the constrained maximization problem (1) implements standard CCA.

Proof

Considering the non-matrix form of (1), we obtain

$$\begin{aligned} {\text {tr}}(\mathbf U ' \mathbf{P _r}\mathbf{P '_t}\mathbf V \mathbf{D }) = \displaystyle \sum _{i,j} \langle \mathbf{P '_r}\mathbf {u}_i,\mathbf{P '_t}\mathbf {v}_j \rangle \mathbf{D }_{ij}, \end{aligned}$$
(2)

considering a particular order of \(\mathcal{I}_t\) such that each sample \(\mathbf {u}_i\) in \(\mathcal{I}_r\) aligns with a unique \(\mathbf {v}_i\) in \(\mathcal{I}_t\), we obtain

$$\begin{aligned} \sum _{i,j} \langle \mathbf{P '_r}\mathbf {u}_i,\mathbf{P '_t}\mathbf {v}_j \rangle \, \mathbb {1}_{\{i=j\}} = \sum _{i} \langle \mathbf{P '_r}\mathbf {u}_i,\mathbf{P '_t}\mathbf {v}_i \rangle = {\text {tr}}(\mathbf{P '_r}\mathbf C _{rt}\,\mathbf{P _t}), \end{aligned}$$

(3)

with \(\mathbf C _{rt}\) being the inter-class covariance matrix (up to a constant normalization factor) and \(\mathbb {1}_{\{.\}}\) the indicator function. Since the equality constraints (shown in Sect. 2.1) remain unchanged, the constrained maximization problem (1) is strictly equivalent to standard CCA for this particular \(\mathbf{D }\). \(\Box \)

This particular setting of \(\mathbf{D }\) is relevant only when data are accurately paired and when \(\mathcal{I}_r\), \(\mathcal{I}_t\) have the same cardinality. In practice, many problems involve unpaired/mispaired datasets with different cardinalities; this is why \(\mathbf{D }\) should be relaxed using affinities between multiple pairs (as discussed earlier in this section) instead of strict alignments. With this new CCA setting, the learned transformations \(\mathbf{P _t}\), \(\mathbf{P _r}\) generate latent data representations \(\phi _t(\mathbf {v}_i)=\mathbf{P '_t}\mathbf {v}_i\), \(\phi _r(\mathbf {u}_j)=\mathbf{P '_r}\mathbf {u}_j\) which align according to \(\mathbf{D }\) (i.e., \(\Vert \phi _t(\mathbf {v}_i)-\phi _r(\mathbf {u}_j)\Vert _2\) decreases if \(\mathbf{D }_{ij}\) is high and vice-versa). However, when multiple entries \(\{{\mathbf{D }}_{ij}\}_j\) are high for a given i, this may produce noisy correlations between the learned latent representations and may impact their discrimination power (see also experiments). In order to mitigate this effect, we also consider context regularization.
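As an illustration of how \(\mathbf{D }\) may be relaxed, the sketch below fills it with an RBF cross-affinity between view descriptors; the RBF choice and the quantile-based scale are assumptions here (the settings actually used are described in Sect. 3.2), and both views are assumed to share a comparable descriptor space (e.g., colors in the toy example of Sect. 3.1):

```python
import numpy as np

def cross_affinity(U, V, scale=None):
    """Cross-affinity matrix D with D[i, j] = RBF similarity between
    u_i (column i of U) and v_j (column j of V); shape (n_r, n_t)."""
    d2 = (np.sum(U**2, axis=0)[:, None] + np.sum(V**2, axis=0)[None, :]
          - 2.0 * U.T @ V)                    # squared cross distances
    if scale is None:
        scale = np.quantile(d2, 0.1)          # a quantile-based bandwidth
    return np.exp(-d2 / max(scale, 1e-12))    # entries in (0, 1]
```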

2.3 Context-Based Regularization

For each data point \(\mathbf {u}_i \in \mathcal{I}_r\), we define a (typed) neighborhood system \(\{\mathcal{N}_c(i)\}_{c=1}^C\) which corresponds to the typed neighbors of \(\mathbf {u}_i\) (see Sect. 3.2 for an example). Using \(\{\mathcal{N}_c(.)\}_{c=1}^C\), we consider for each c an intrinsic adjacency matrix \(\mathbf{W}^c_u\) whose \((i,k)^{\text {th}}\) entry is set to 1 iff \(\mathbf {u}_k \in \mathcal{N}_c(i)\) (and to 0 otherwise). Similarly, we define the matrices \(\{\mathbf{W}^c_v\}_c\) for data \(\{\mathbf {v}_j\}_j\in \mathcal{I}_t\); extra details about the setting of these matrices are again given in the experiments.

Using the above definition of \(\{\mathbf{W}^c_u\}_c\), \(\{\mathbf{W}^c_v\}_c\), we add an extra term to the objective function (1) as

$$\begin{aligned} \begin{array}{l} \displaystyle \max _{\mathbf{P _r},\mathbf{P _t}} \ \text {tr}(\mathbf U ' \mathbf{P _r}\mathbf{P '_t}\mathbf V \mathbf{D }) + \beta \displaystyle \sum _{c=1}^C\displaystyle \text {tr}\big ( \mathbf U ' \mathbf{P _r}\mathbf{P '_t}\mathbf V \mathbf{W}^c_{v} \mathbf V ' \mathbf{P _t}\mathbf{P '_r}\mathbf U \mathbf{W}^{c'}_{u}\big ) \\ {\text {s.t.}} \ \ \ \ \ \ \mathbf{P '_r}\mathbf C _{rr} \mathbf{P _r} = \mathbf{I}_{u} \ \ \ \ \ \text {and} \ \ \ \ \ \mathbf{P '_t}\mathbf C _{tt} \mathbf{P _t} = \mathbf{I}_{v}. \end{array} \end{aligned}$$
(4)

The right-hand side (regularization) term above is equivalent to

$$\small \beta \sum _c \sum _{i,j} \langle \mathbf{P '_r}\mathbf {u}_i, \mathbf{P '_t}\mathbf {v}_j\rangle \sum _{k,\ell } \langle \mathbf{P '_r}\mathbf {u}_k, \mathbf{P '_t}\mathbf {v}_\ell \rangle \mathbf{W}^c_{u,i,k} \mathbf{W}^c_{v,j,\ell }$$

the latter corresponds to a neighborhood (or context) criterion which considers that a high value of the correlation \(\langle \mathbf{P '_r}\mathbf {u}_i, \mathbf{P '_t}\mathbf {v}_j\rangle \), in the learned latent space, should imply high correlation values in the neighborhoods \(\{\mathcal{N}_c(i) \times \mathcal{N}_c(j)\}_c\). This term (via \(\beta \)) controls the sharpness of the correlations (and also the discrimination power) of the learned latent representations (see example in Fig. 2). Put differently, if a given \((\mathbf {u}_i,\mathbf {v}_j)\) is surrounded by highly correlated pairs, then the correlation between \((\mathbf {u}_i,\mathbf {v}_j)\) should be maximized and vice-versa [45, 46].
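The following direct transcription of this criterion (for a single neighborhood type c) may help in checking the matrix form in Eq. (4) against its non-matrix form above; the function name is ours:

```python
import numpy as np

def context_term(U, V, P_r, P_t, W_u, W_v, beta):
    """Context regularizer of Eq. (4) for one neighborhood type c.

    S[i, j] = <P'_r u_i, P'_t v_j> are the latent correlations; the term
    couples each S[i, j] with the correlations of its typed neighbors,
    i.e., beta * sum_ij S[i, j] * (W_u S W_v')[i, j].
    """
    S = (P_r.T @ U).T @ (P_t.T @ V)       # (n_r, n_t) correlation matrix
    return beta * np.sum(S * (W_u @ S @ W_v.T))
```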

2.4 Optimization

Considering Lagrange multipliers for the equality constraints in Eq. (4), one may show that optimality conditions (related to the gradient of Eq. (4) w.r.t \(\mathbf{P _r}\), \(\mathbf{P _t}\) and the Lagrange multipliers) lead to the following generalized eigenproblem

$$\begin{aligned} \mathbf{K}_{rt}\, \mathbf C _{tt}^{-1}\, \mathbf{K}_{tr}\, \mathbf{P _r}&= \gamma ^2\, \mathbf C _{rr}\, \mathbf{P _r}\nonumber \\ \mathbf{P _t}&= \frac{1}{\gamma } \ \mathbf C _{tt}^{-1}\, \mathbf{K}_{tr}\, \mathbf{P _r}, \end{aligned}$$

(5)

here \(\mathbf{K}_{tr}=\mathbf{K}_{rt}'\) and

$$\begin{aligned} \mathbf{K}_{tr} = \mathbf V \mathbf{D }\mathbf U '&+ \beta \sum \nolimits _c \mathbf V \mathbf{W}^c_v \mathbf V ' \mathbf{P _t}\mathbf{P '_r}\mathbf U \mathbf{W}^{c'}_u \mathbf U ' \nonumber \\&+ \beta \sum \nolimits _c \mathbf V \mathbf{W}^{c'}_v \mathbf V ' \mathbf{P _t}\mathbf{P '_r}\mathbf U \mathbf{W}^c_u \mathbf U '. \end{aligned}$$
(6)

In practice, we solve the above eigenproblem iteratively. At each iteration \(\tau \), we fix \(\mathbf{P _r}^{(\tau )}\), \(\mathbf{P _t}^{(\tau )}\) (in \(\mathbf{K}_{tr}\), \(\mathbf{K}_{rt}\)) and we find the subsequent projection matrices \(\mathbf{P _r}^{(\tau +1)}\), \(\mathbf{P _t}^{(\tau +1)}\) by solving Eq. (5); initially, \(\mathbf{P _r}^{(0)}\), \(\mathbf{P _t}^{(0)}\) are set using the projection matrices of standard CCA. This process continues until a fixed point is reached. In practice, convergence to a fixed point is observed in fewer than five iterations.
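A compact numpy sketch of this iterative scheme is given below; note that, for simplicity, the first pass uses \(\beta =0\) as an initialization surrogate (the paper initializes with the projection matrices of standard CCA on paired data), and the symmetrization step is ours, added for numerical robustness:

```python
import numpy as np
from scipy.linalg import eigh

def aa_cca(U, V, D, Ws_u, Ws_v, beta, dim, n_iter=5, eps=1e-6):
    """Alignment-agnostic CCA: iterate the generalized eigenproblem
    of Eqs. (5)-(6) until a fixed point is (approximately) reached.

    U: (d_u, n_r), V: (d_v, n_t), D: (n_r, n_t) cross-affinity matrix,
    Ws_u / Ws_v: lists of typed adjacency matrices (one pair per type c).
    """
    C_rr = U @ U.T / U.shape[1] + eps * np.eye(U.shape[0])
    C_tt = V @ V.T / V.shape[1] + eps * np.eye(V.shape[0])
    P_r = P_t = None
    for it in range(n_iter):
        K_tr = V @ D.T @ U.T                     # the V D U' term of Eq. (6)
        if it > 0:                               # context terms are evaluated
            for W_u, W_v in zip(Ws_u, Ws_v):     # at the previous iterate
                K_tr += beta * (V @ W_v @ V.T @ P_t @ P_r.T @ U @ W_u.T @ U.T)
                K_tr += beta * (V @ W_v.T @ V.T @ P_t @ P_r.T @ U @ W_u @ U.T)
        # Eq. (5): K_rt C_tt^{-1} K_tr P_r = gamma^2 C_rr P_r
        M = K_tr.T @ np.linalg.solve(C_tt, K_tr)
        gamma2, P_r = eigh(0.5 * (M + M.T), C_rr)
        order = np.argsort(gamma2)[::-1][:dim]
        P_r = P_r[:, order]
        gamma = np.sqrt(np.maximum(gamma2[order], eps))
        P_t = np.linalg.solve(C_tt, K_tr @ P_r) / gamma
    return P_r, P_t
```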

Proposition 2

Let \(\Vert .\Vert _1\) denote the entry-wise \(L_1\)-norm and \(\mathbf{1}_{\tiny vu}\) a \(d_v\times d_u\) matrix of ones. Provided that the following inequality holds

$$\begin{aligned} \beta < {\gamma _{\min }} \times \bigg ( \sum _c \big \Vert \mathbf{E}_c \ \mathbf{1}_{\tiny vu} \ \mathbf{F}_c' \big \Vert _1 + \sum _c \big \Vert \mathbf{G}_c \ \mathbf{1}_{\tiny vu} \ \mathbf{H}_c' \big \Vert _1\bigg )^{-1} \end{aligned}$$
(7)

with \(\gamma _{\min }\) being a lower bound of the positive eigenvalues of (5), \(\mathbf{E}_c=\mathbf V \mathbf{W}^c_v \mathbf V ' \mathbf C _{tt}^{-1}\), \( \mathbf{F}_c=\mathbf U \mathbf{W}^{c}_u \mathbf U ' \mathbf C _{rr}^{-1}\), \(\mathbf{G}_c=\mathbf V \mathbf{W}^{c'}_v \mathbf V ' \mathbf C _{tt}^{-1}\) and \(\mathbf{H}_c=\mathbf U \mathbf{W}^{c'}_u \mathbf U ' \mathbf C _{rr}^{-1}\); the problem in (5), (6) admits a unique solution \(\tilde{\mathbf{P}}_r\), \(\tilde{\mathbf{P}}_t\) as the eigenvectors of

$$\begin{aligned} \tilde{\mathbf{K}}_{rt} \mathbf C _{tt}^{-1} \tilde{\mathbf{K}}_{tr} \mathbf{P _r}&= \gamma ^2 \mathbf C _{rr} \mathbf{P _r}\nonumber \\ \mathbf{P _t}&= \frac{1}{\gamma } \ \mathbf C _{tt}^{-1} \tilde{\mathbf{K}}_{tr} \mathbf{P _r}, \end{aligned}$$
(8)

with \(\tilde{\mathbf{K}}_{tr}\) being the limit of

$$\begin{aligned} \mathbf{K}_{tr}^{(\tau +1)} = \varPsi \big (\mathbf{K}_{tr}^{(\tau )}\big ), \end{aligned}$$
(9)

and \(\varPsi : \mathbb {R}^{d_v\times d_u} \rightarrow \mathbb {R}^{d_v\times d_u}\) is given as

$$\begin{aligned} \varPsi (\mathbf{K}_{tr}) = \displaystyle \mathbf V \mathbf{D }\mathbf U '&+ \beta \sum \nolimits _c \mathbf V \mathbf{W}^c_v \mathbf V ' \mathbf{P _t}\mathbf{P '_r}\mathbf U \mathbf{W}^{c'}_u \mathbf U ' \nonumber \\&+ \beta \sum \nolimits _c \mathbf V \mathbf{W}^{c'}_v \mathbf V ' \mathbf{P _t}\mathbf{P '_r}\mathbf U \mathbf{W}^c_u \mathbf U ', \end{aligned}$$
(10)

with \(\mathbf{P _t}\), \(\mathbf{P _r}\), in (10), being functions of \(\mathbf{K}_{tr}\) using (5). Furthermore, the matrices \(\mathbf{K}_{tr}^{(\tau +1)}\) in (9) satisfy the convergence property

$$\begin{aligned} \big \Vert \mathbf{K}_{tr}^{(\tau +1)} - \tilde{\mathbf{K}}_{tr}\big \Vert _1 \le L^\tau \big \Vert \mathbf{K}_{tr}^{(\tau +1)} - \mathbf{K}_{tr}^{(0)}\big \Vert _1, \end{aligned}$$
(11)

with \(L=\frac{\beta }{\gamma _{\min }} \big (\sum _c\big \Vert \mathbf{E}_c \ \mathbf{1}_{\tiny vu} \ \mathbf{F}_c' \big \Vert _1 + \sum _c \big \Vert \mathbf{G}_c \ \mathbf{1}_{\tiny vu} \ \mathbf{H}_c' \big \Vert _1\big )\).

Proof

See appendix

Note that, owing to the extreme sparsity of the typed adjacency matrices \(\{\mathbf{W}^c_u\}_c\), \(\{\mathbf{W}^c_v\}_c\), the upper bound on \(\beta \) (given by the sufficient condition in Eq. (7)) is loose and easy to satisfy; in practice, we observed convergence for all the values of \(\beta \) tried in our experiments (see the x-axis of Fig. 2).
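For completeness, the bound of Eq. (7) can be evaluated numerically from the matrices defined in Proposition 2; a dense sketch (with \(\gamma _{\min }\) assumed to be supplied by the caller) follows:

```python
import numpy as np

def beta_upper_bound(U, V, Ws_u, Ws_v, C_rr, C_tt, gamma_min):
    """Upper bound on beta from the sufficient condition of Eq. (7)."""
    ones = np.ones((V.shape[0], U.shape[0]))          # the 1_{vu} matrix
    iC_tt, iC_rr = np.linalg.inv(C_tt), np.linalg.inv(C_rr)
    total = 0.0
    for W_u, W_v in zip(Ws_u, Ws_v):
        E = V @ W_v @ V.T @ iC_tt                     # E_c
        F = U @ W_u @ U.T @ iC_rr                     # F_c
        G = V @ W_v.T @ V.T @ iC_tt                   # G_c
        H = U @ W_u.T @ U.T @ iC_rr                   # H_c
        total += np.abs(E @ ones @ F.T).sum()         # entry-wise L1 norms
        total += np.abs(G @ ones @ H.T).sum()
    return gamma_min / total
```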

3 Experiments

In this section, we show the performance of our method both on synthetic and real datasets. The goal is to show the extra gain obtained when using our alignment-agnostic (AA) CCA approach compared to standard CCA and other variants.

3.1 Synthetic Toy Example

In order to show the strength of our AA CCA method, we first illustrate its performance on a 2D toy example. We consider 2D data sampled from an “arc” as shown in Fig. 1(a); each sample is endowed with an RGB color feature vector which depends on its curvilinear coordinate in that “arc”. We duplicate this dataset using a 2D rotation (with an angle of 180\(^\circ \)) and we add a random perturbation field (noise) both to the color features and to the 2D coordinates (see Fig. 1). Note that accurate ground-truth pairing is available but, of course, not used in our experiments.
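A possible generation script for such a toy set is sketched below (all numeric choices, e.g., the noise magnitude and the color mapping, are ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
t = np.linspace(0.0, np.pi, n)                   # curvilinear coordinate
arc1 = np.stack([np.cos(t), np.sin(t)])          # 2D samples on an "arc"
# RGB features as a smooth function of the curvilinear coordinate
col1 = np.stack([t / np.pi, 1.0 - t / np.pi, 0.5 * np.ones(n)])
theta = np.pi                                    # 180-degree rotation
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
arc2 = R @ arc1 + 0.05 * rng.standard_normal((2, n))  # rotated + perturbed
col2 = col1 + 0.05 * rng.standard_normal((3, n))      # noisy color features
perm = rng.permutation(n)                        # discard ground-truth pairing
arc2, col2 = arc2[:, perm], col2[:, perm]
```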

We apply our AA CCA (as well as standard CCA) to these data, and we show alignment results; this 2D toy example is very similar to the subsequent real-data task, as the goal is to find, for each sample in the original set, its correlations and its realignment with the second set. From Fig. 1, it is clear that standard CCA fails to produce accurate results when the data are contaminated with random perturbations and alignment errors, while our AA CCA approach successfully realigns the two sets (see again details in Fig. 1).

Fig. 1.

This figure shows the realignment results of CCA; (a) we consider 100 examples sampled from an “arc”, each sample being endowed with an RGB feature vector. We duplicate this dataset using a 2D rotation (with an angle of 180\(^\circ \)) and we add a random perturbation field both to the color features and to the 2D coordinates. (b) Realignment results obtained using standard CCA; note that the original data are not aligned, so in order to apply standard CCA, each sample in the first arc-set is paired with its nearest (color descriptor) neighbor in the second arc-set. (c) Realignment results obtained using our AA CCA approach; again, data are not paired, so we consider a fully dense matrix \(\mathbf{D }\) that measures the cross-similarity (using an RBF kernel) between the colors of the first and the second arc-sets. In these toy experiments, \(\beta \) (the weight of the context regularizer) is set to 0.01 and we use an isotropic neighborhood system in order to fill the context matrices \(\{\mathbf{W}^c_u\}_{c=1}^{C}\), \(\{\mathbf{W}^c_v\}_{c=1}^{C}\) (with \(C=1\)); a given entry \(\mathbf{W}^c_{u,i,k}\) is set to 1 iff \(\mathbf {u}_k\) is among the 10 spatial neighbors of \(\mathbf {u}_i\), and similarly for the entries of \(\{\mathbf{W}^c_v\}_{c=1}^{C}\). For a better visualization of these results, please zoom into the PDF version of this paper.

3.2 Satellite Image Change Detection

We also evaluate and compare the performance of our proposed AA CCA method on the challenging task of satellite image change detection (see for instance [6, 47,48,49,50]). The goal is to find instances of relevant changes in a given scene acquired at instant \(t_1\) with respect to the same scene taken at instant \(t_0< t_1\); these acquisitions (at instants \(t_0\), \(t_1\)) are referred to as reference and test images respectively. This task is known to be very challenging due to the difficulty of distinguishing relevant changes (appearance or disappearance of objects) from irrelevant ones such as the presence of cars, clouds, as well as registration errors. This task is also practically important; indeed, in the particularly important scenario of damage assessment after natural hazards (such as tornadoes, earthquakes, etc.), it is crucial to achieve automatic change detection accurately in order to organize and prioritize rescue operations.

JOPLIN-TORNADOES11 Dataset. This dataset includes 680928 non-overlapping image patches (of 30 \(\times \) 30 pixels in RGB) taken from six pairs of (reference and test) GeoEye-1 satellite images (of 9850 \(\times \) 10400 pixels each). This dataset is randomly split into two subsets: a labeled one used for training (denoted \(\mathcal{L}_r \subset \mathcal{I}_r\), \(\mathcal{L}_t \subset \mathcal{I}_t\)) and an unlabeled one used for testing (denoted \(\mathcal{U}_r=\mathcal{I}_r \backslash \mathcal{L}_r\) and \(\mathcal{U}_t = \mathcal{I}_t \backslash \mathcal{L}_t\)) with \(|\mathcal{L}_r|=|\mathcal{L}_t|=3000\) and \(|\mathcal{U}_r|=|\mathcal{U}_t|=680928-3000\). All patches in \(\mathcal{I}_r\) (or in \(\mathcal{I}_t\)), stitched together, cover a very large area – of about 20 \(\times \) 20 km\(^2\) – around Joplin (Missouri) and show many changes after the tornadoes of May 2011 (building destruction, etc.) and no-changes (including irrelevant ones such as car appearance/disappearance, etc.). Each patch in \(\mathcal{I}_r\), \(\mathcal{I}_t\) is rescaled and encoded with 4096 coefficients corresponding to the output of an inner layer of the pretrained VGG-net [51]. A given test patch is declared as a “change” or “no-change” depending on the score of SVMs trained on top of the learned CCA latent representations.
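As an indication, such a 4096-dimensional encoding could be obtained with torchvision's pretrained VGG nets; the paper does not specify the exact VGG variant or layer, so VGG-16 and its fc7 output are assumptions in the following sketch:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pretrained VGG-16, truncated after fc7 + ReLU (a 4096-D inner layer).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
encoder = torch.nn.Sequential(
    vgg.features, vgg.avgpool, torch.nn.Flatten(),
    *list(vgg.classifier.children())[:5])
prep = T.Compose([
    T.ToTensor(), T.Resize((224, 224)),          # rescale the 30x30 patch
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

def encode(patch):
    """patch: 30x30 RGB image (PIL or HxWx3 uint8 array) -> (4096,) vector."""
    with torch.no_grad():
        return encoder(prep(patch).unsqueeze(0)).squeeze(0).numpy()
```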

In order to evaluate the performances of change detection, we report the equal error rate (EER). The latter is a balanced generalization error that equally weights errors in “change” and “no-change” classes. Smaller EER implies better performance.
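One common way to estimate the EER from classifier scores is to sweep the decision threshold until the miss rate on the “change” class and the false-alarm rate on the “no-change” class coincide; a minimal sketch (binary labels, higher scores meaning “change”):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: error at the threshold where both class error rates are equal."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    thresholds = np.sort(scores)
    miss = np.array([(pos < th).mean() for th in thresholds])
    false_alarm = np.array([(neg >= th).mean() for th in thresholds])
    i = np.argmin(np.abs(miss - false_alarm))    # closest crossing point
    return 0.5 * (miss[i] + false_alarm[i])
```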

Data Pairing and Context Regularization. In order to study the impact of AA CCA on the performances of change detection – both with residual and relatively stronger misalignments – we consider the following settings for comparison (see also Table 1).

  • Standard CCA: patches are strictly paired by assigning each patch in the reference image to a unique patch in the test image (at the same location), so this setting assumes that the satellite images are correctly registered. CCA learning is supervised (only labeled patches are used for training) and no context regularization is used (i.e., \(\beta =0\)). In order to implement this setting, we consider \(\mathbf{D }\) as a diagonal matrix with \(\mathbf{D }_{ii}=\pm 1\) depending on whether \(\mathbf {v}_i \in \mathcal{L}_t\) is labeled as “no-change” (or “change”) in the ground-truth, and \(\mathbf{D }_{ii}=0\) otherwise.

  • Sup+CA CCA: this is similar to “standard CCA” with the only difference being \(\beta \) which is set to its “optimal” value (0.01) on the validation set (see Fig. 2).

  • SemiSup CCA: this setting is similar to “standard CCA” with the only difference being the unlabeled patches which are now added when learning the CCA transformations, and \(\mathbf{D }_{ii}\) (on the unlabeled patches) is set to \(2 \kappa (\mathbf {v}_i,\mathbf {u}_i)-1\) (score between \(-1\) and \(+1\)); here \(\kappa (.,.) \in [0,1]\) is the RBF similarity whose scale is set to the 0.1 quantile of pairwise distances in \(\mathcal{L}_t \times \mathcal{L}_r\).

  • SemiSup+CA CCA: this setting is similar to “SemiSup CCA” but context regularization is used (with again \(\beta \) set to 0.01).

  • Res CCA: this is similar to “standard CCA”, but strict data pairing is relaxed, i.e., each patch in the reference image is assigned to multiple patches in the test image; hence, \(\mathbf{D }\) is no longer diagonal, and is set as \(\mathbf{D }_{ij} = \kappa (\mathbf {v}_i,\mathbf {u}_j) \in [0,1]\) iff \((\mathbf {v}_i,\mathbf {u}_j) \in \mathcal{L}_t \times \mathcal{L}_r\) is labeled as “no-change” in the ground-truth, \(\mathbf{D }_{ij} =\kappa (\mathbf {v}_i,\mathbf {u}_j)-1 \in [-1,0]\) iff \((\mathbf {v}_i,\mathbf {u}_j) \in \mathcal{L}_t \times \mathcal{L}_r\) is labeled as “change”, and \(\mathbf{D }_{ij} =0\) otherwise (a sketch of this construction is given after this list).

  • Res+Sup+CA CCA: this is similar to “Res CCA” with the only difference being \(\beta \) which is again set to 0.01.

  • Res+SemiSup CCA: this setting is similar to “Res CCA” with the only difference being the unlabeled patches which are now added when learning the CCA transformations; on these unlabeled patches \(\mathbf{D }_{ij} = 2\kappa (\mathbf {v}_i,\mathbf {u}_j)-1\).

  • Res+SemiSup+CA CCA: this setting is similar to “Res+SemiSup CCA” but context regularization is used (i.e., \(\beta =0.01\)).
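As announced above, the sketch below builds the relaxed pairing matrix of the “Res CCA” setting; the label encoding (+1 for “no-change”, -1 for “change”, 0 for unlabeled pairs) is our convention, and \(\mathbf{D }\) is indexed (test, reference) as in the text:

```python
import numpy as np

def rbf(a, b, scale):
    return np.exp(-np.sum((a - b) ** 2) / scale)

def build_D_res(U_lab, V_lab, pair_labels, scale):
    """Relaxed D of "Res CCA": D[i, j] = k(v_i, u_j) for "no-change" pairs,
    k(v_i, u_j) - 1 for "change" pairs, and 0 for unlabeled pairs.
    pair_labels[i, j] in {+1, -1, 0}; returns an (n_t, n_r) matrix."""
    n_t, n_r = V_lab.shape[1], U_lab.shape[1]
    D = np.zeros((n_t, n_r))
    for i in range(n_t):
        for j in range(n_r):
            if pair_labels[i, j] == 1:
                D[i, j] = rbf(V_lab[:, i], U_lab[:, j], scale)
            elif pair_labels[i, j] == -1:
                D[i, j] = rbf(V_lab[:, i], U_lab[:, j], scale) - 1.0
    return D
```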

Fig. 2.

This figure shows the evolution of change detection performances w.r.t \(\beta \) on labeled training/dev data as well as the unlabeled data. These results correspond to the baseline Sup+CA CCA (under the regime of strong misalignments); we observe from these curves that \(\beta =0.01\) is the best setting which is kept in all our experiments.

Context setting: in order to build the adjacency matrices of the context (see Sect. 2.3), we define for each patch \(\mathbf {u}_i \in \mathcal{I}_r\) (in the reference image) an anisotropic (typed) neighborhood system \(\{\mathcal{N}_c(i)\}_{c=1}^C\) (with \(C=8\)) which corresponds to the eight spatial neighbors of \(\mathbf {u}_i\) in a regular grid [52]; for instance when \(c=1\), \(\mathcal{N}_1(i)\) corresponds to the top-left neighbor of \(\mathbf {u}_i\). Using \(\{\mathcal{N}_c(.)\}_{c=1}^8\), we build for each c an intrinsic adjacency matrix \(\mathbf{W}^c_u\) whose \((i,k)^{\text {th}}\) entry is set to \(\mathbb {1}_{\{\mathbf {u}_k \in \mathcal{N}_c(i)\}}\); here \(\mathbb {1}_{\{.\}}\) is the indicator function equal to 1 iff (i) the patch \(\mathbf {u}_k\) is a neighbor of \(\mathbf {u}_i\) and (ii) its relative position is typed as c (\(c=1\) for top-left, \(c=2\) for left, etc., following an anticlockwise rotation), and 0 otherwise. Similarly, we define the matrices \(\{\mathbf{W}^c_v\}_c\) for data \(\{\mathbf {v}_j\}_j\in \mathcal{I}_t\).
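A minimal construction of these eight typed adjacency matrices on an h \(\times \) w grid of patches could be as follows (the exact anticlockwise ordering of the offsets is our reading of the text; in practice these matrices should be stored in sparse form, e.g., with scipy.sparse):

```python
import numpy as np

# (dx, dy) offsets per type c = 1..8 (y pointing down): top-left, left,
# bottom-left, bottom, bottom-right, right, top-right, top.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def typed_adjacency(h, w):
    """W^c[i, k] = 1 iff patch k is the c-typed spatial neighbor of patch i,
    for patches indexed row by row on an h x w grid."""
    n = h * w
    Ws = [np.zeros((n, n)) for _ in OFFSETS]
    for y in range(h):
        for x in range(w):
            for c, (dx, dy) in enumerate(OFFSETS):
                xx, yy = x + dx, y + dy
                if 0 <= xx < w and 0 <= yy < h:
                    Ws[c][y * w + x, yy * w + xx] = 1.0
    return Ws
```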

Table 1. This table shows different configurations of CCA resulting from different instances of our model. In this table, “Sup” stands for supervised, “SemiSup” for semi-supervised, “CA” for context aware and “Res” for resilient.

Impact of AA CCA and Comparison. Table 2 shows a comparison of different versions of AA CCA against other CCA variants under the regime of small residual alignment errors. In this regime, reference and test images are first registered using RANSAC [53]; an exhaustive visual inspection of the overlapping (reference and test) images (after RANSAC registration) shows sharp boundaries in most of the areas covered by these images, but some areas still include residual misalignments due to the presence of changes, occlusions (clouds, etc.) as well as parallax. Note that, in spite of the relative success of RANSAC in registering these images, our AA CCA versions (rows #5–8) provide better performances (see Table 2) compared to the other CCAs (rows #1–4); this clearly corroborates the fact that residual alignment errors remain after RANSAC (re)alignment (as also observed during visual inspection of RANSAC registration). Put differently, our AA CCA method is not a competitor to RANSAC but complementary to it.

Table 2. This table shows change detection EER (in %) on labeled (training and validation) and unlabeled sets under the residual error regime. When context regularization (referred to as CA in this table) is used, \(\beta \) is set to \(10^{-2}\).

These results also show that when reference and test images are globally well aligned (with some residual errors; see Table 2), the gain in performance is dominated by the positive impact of alignment resilience; indeed, the impact of the unlabeled data is not always consistent (#5,6 vs. #7,8 resp.) in spite of being positive (in #1,2 vs. #3,4 resp.) while the impact of context regularization is globally positive (#1,3,5,7 vs. #2,4,6,8 resp.). This clearly shows that, under the regime of small residual errors, the use of labeled data is already enough in order to enhance the performance of change detection; the gain comes essentially from alignment resilience with a marginal (but clear) positive impact of context regularization.

In order to study the impact of AA CCA w.r.t stronger alignment errors (i.e., in a more challenging setting), we apply a relatively strong motion field to all the pixels in the reference image; precisely, each pixel is shifted along a direction whose x–y coordinates are randomly set to values between 15 and 30 pixels. These shifts are sufficient to make the quality of the alignments used for CCA very weak, so the different versions of CCA mentioned earlier become more sensitive to alignment errors (EERs increase by more than 100% in Table 3 compared to the EERs with residual alignment errors in Table 2). With this setting, AA CCA is clearly more resilient and shows a substantial relative gain compared to the other CCA versions.

Table 3. This table shows change detection EER (in %) on labeled (training and validation) and unlabeled sets under the strong error regime. When context regularization (referred to as CA in this table) is used, \(\beta \) is set to \(10^{-2}\).

3.3 Discussion

Invariance: as a result of its misalignment resilience, our AA CCA is de facto robust to local deformations, as these deformations are strictly equivalent to local misalignments. Our AA CCA may also achieve invariance to similarity transformations; indeed, the matrices used to define the spatial context are translation invariant, and can also be made rotation and scale invariant by measuring a “characteristic” scale and orientation of patches in a given satellite image. For that purpose, dense SIFT can be used to recover (or at least approximate) the field of orientations and scales, and hence adapt the spatial support (extent and orientation) of the context, in order to make it invariant to similarity transformations.

Computational Complexity: provided that VGG features are extracted (offline) on all the patches of the reference/test images, that the adjacency matrices of the context are precomputed, and since the adjacency matrices \(\{\mathbf{W}^c_u\}_c\), \(\{\mathbf{W}^c_v\}_c\) are very sparse, the computational complexity of evaluating Eq. (6) and of solving the generalized eigenproblem in Eq. (5) both reduce to \(O(\min (d_u^2 d_v,d_v^2 d_u))\), where \(d_u\), \(d_v\) are again the dimensions of data in \(\mathbf U \), \(\mathbf V \) respectively; hence, this complexity is essentially equivalent to that of standard CCA, which also requires solving a generalized eigenproblem. Therefore, the gain in accuracy of our AA CCA is obtained without any overhead in computational complexity, which remains dependent on the dimensionality of the data (in practice smaller than the cardinality of our datasets) (see also Fig. 3).

Fig. 3.

These examples show the evolution of detections (in red) for four different settings of CCA; as we go from top-right to bottom-right, change detection results get better. CCA acronyms shown below pictures are already defined in Table 1. (Color figure online)

4 Conclusion

We introduced in this paper a new canonical correlation analysis method that learns projection matrices which map data from input spaces to a latent common space where unaligned data become strongly or weakly correlated depending on their cross-view similarity and their context. This is achieved by optimizing a criterion that mixes two terms: the first one aims at maximizing the correlations between data which are likely to be paired while the second term acts as a regularizer and makes correlations spatially smooth and provides us with robust context-aware latent representations. Our method considers both labeled and unlabeled data when learning the CCA projections while being resilient to alignment errors. Extensive experiments show the substantial gain of our CCA method under the regimes of residual and strong alignment errors.

As future work, our CCA method can be extended to many other tasks where alignments are error-prone and where context can be exploited in order to recover from these alignment errors. These tasks include “text-to-text” alignment in multilingual machine translation, as well as “image-to-image” matching in multi-view object tracking.