1 Introduction

In many computer vision tasks, visual elements are represented by vectors in high-dimensional spaces. This is the case for image retrieval [3, 14], object recognition [17, 23], object detection [9], action recognition [20], semantic segmentation [16] and many more. Visual entities can be whole images or videos, or regions of images corresponding to potential object parts. The high-dimensional vectors are used to train a classifier [19] or to directly perform a similarity search in high-dimensional spaces [14].

Vector representations are often post-processed by mapping to a different representation space, which can be higher or lower dimensional. Such mappings or embeddings can be either non-linear [2, 5] or linear [4, 6]. In the non-linear case, methods that directly evaluate [2] or efficiently approximate [5] non-linear kernels are known to perform better. Typical applications range from image classification [5] and retrieval [4] to semantic segmentation [8]. Embeddings of the linear kind are typically used for dimensionality reduction, in which the dimensions carrying the most meaningful information are kept. Dimensionality reduction with Principal Component Analysis (PCA) is very popular in numerous tasks [4, 6, 15]. In the same vein as PCA is data whitening, which is the focus of this work.

A whitening transformation is a linear transformation that performs correlation removal or suppression by mapping the data to a different space such that the covariance matrix of the data in the transformed space is the identity. It is commonly learned in an unsupervised way from a small sample of training vectors and has been shown to be quite effective in retrieval tasks with global image representations, for example, when an image is represented by a vector constructed through the aggregation of local descriptors [13] or by a vector of Convolutional Neural Network (CNN) activations [11, 22]. In particular, PCA whitening significantly boosts the performance of compact CNN image vectors, i.e. 256 to 512 dimensions, due to the handling of inherent co-occurrence phenomena [4]. The principal components found are ordered by decreasing variance, allowing for dimensionality reduction at the same time [12]. Dimensionality reduction may also be performed in a discriminative, supervised fashion. This is the case in the work by Cai et al. [6], where the covariance matrices are constructed using information from pairs of similar and non-similar elements. In this fashion, the injected supervision achieves a better separation between matching and non-matching vectors and is more likely to avoid outliers in the estimation. It has been shown [10] that an unsupervised approach based on least squares minimization is likely to be affected by outliers: even a single outlier of high magnitude can significantly bias the solution.

In this work, we propose an unsupervised way to learn the whitening transformation such that the estimation is robust to outliers. Inspired by the Iteratively Re-weighted Least Squares of Aftab and Hartley [1], we employ robust M-estimators and minimize robust cost functions such as \(\ell _1\) or Cauchy. Our approach iteratively alternates between two minimizations, one performing the centering of the data and one performing the whitening. In each step a weighted least squares problem is solved, which is shown to minimize the sum of the \(\ell _2\) norms of the training vectors. We demonstrate the effectiveness of this approach on synthetic 2D data and on real data of CNN-based representations for image search. The method is additionally extended to handle supervised cases, as in the work of Cai et al. [6], where we show further improvements. Finally, our methodology is not limited to data whitening; we provide a discussion on applying it to robust patch rectification of MSER features [18].

The rest of the paper is organized as follows: In Sect. 2 we briefly review conventional data whitening and give our motivation, while in Sect. 3 we describe the proposed iterative whitening approach. Finally, in Sects. 4 and 5 we compare our method to the conventional approach on synthetic and real data, respectively.

2 Data Whitening

In this section, we first briefly review the background of data whitening and then give a geometric interpretation, which forms our motivation for the proposed approach.

2.1 Background on Whitening

A whitening transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a new vector of variables whose covariance matrix is the identity. The transformation is called “whitening” because it changes the input vector into a white noise vector.

We consider the case where this transformation is applied to a set of zero-centered vectors \(\mathcal {X} = \lbrace \mathbf {x} _1, \ldots , \mathbf {x} _i, \ldots , \mathbf {x} _N \rbrace \), with \(\mathbf {x} _i \in \mathbb {R} ^d\), and denote the covariance matrix by \(\varSigma = \sum _i \mathbf {x} _i \mathbf {x} _i^{{\!\top }}\). The whitening transformation P is given by

$$\begin{aligned} P^{\!\top }P = \varSigma ^{-1}. \end{aligned}$$
(1)

In Fig. 1 we show a toy example of 2D points and their whitened counterpart.

Fig. 1. Left: Points in 2D and their covariance shown with an ellipse. Right: The corresponding whitened 2D point set.
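For concreteness, a minimal NumPy sketch of the conventional whitening defined by (1), under the stated assumptions (zero-centered points, full-rank covariance); the function and variable names are our own illustration, not code from the paper:

```python
import numpy as np

def whitening_transform(X):
    """Whitening of zero-centered points X (rows are the x_i).

    Returns P with P.T @ P = inv(Sigma), where Sigma = sum_i x_i x_i^T,
    so that the whitened points y_i = P x_i have identity covariance.
    """
    Sigma = X.T @ X                      # (unnormalized) covariance, d x d
    L = np.linalg.cholesky(Sigma)        # Sigma = L L^T
    P = np.linalg.inv(L)                 # P^T P = (L L^T)^{-1} = Sigma^{-1}
    return P

# toy example: 2D Gaussian points, as in Fig. 1
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=500)
X -= X.mean(axis=0)                      # zero-center
P = whitening_transform(X)
Y = X @ P.T                              # whitened points
print(Y.T @ Y)                           # numerically the identity matrix
```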

Assumption. In the following text, we assume that the points of \(\mathcal {X} \) do not lie in a linear subspace of dimensionality \(d' < d\). If this is the case, a solution is to first identify the \(d'\)-dimensional subspace and perform the proposed algorithms on this subspace. The direct consequence of the assumption is that the sample covariance matrix \(\varSigma \) is full rank, in particular \(\det {(\varSigma )} > 0\).

It is clear from (1) that the whitening transformation is given up to an arbitrary rotation \(R \in \mathbb {R} ^{d \times d}\), with \(R^{{\!\top }}R = I\). The transformation matrix P of the whitening is thus given by

$$\begin{aligned} P = R\, \varSigma ^{-\frac{1}{2}}. \end{aligned}$$
(2)

2.2 Geometric Interpretation

We provide a geometric interpretation of data whitening, which also serves as our motivation for the proposed method in this work.

Observation. Assuming zero-mean points, the whitening transform P in (2) minimizes the sum of squared \(\ell _2\) norms among all linear transforms T with \(\det (T) = \det (P)\).

Proof.

$$\begin{aligned} \begin{aligned} C_{\ell _2}(P)&= \sum _i ||P\mathbf {x} _i||^2 \\&= \sum _i tr\left( \mathbf {x} _i^{\!\top }P^{\!\top }P \mathbf {x} _i \right) \\&= \sum _i tr\left( \left( \mathbf {x} _i \mathbf {x} _i^{\!\top }\right) P^{\!\top }P \right) \\&= tr\left( \left( \sum _i \mathbf {x} _i \mathbf {x} _i^{\!\top }\right) P^{\!\top }P \right) \\&= tr\left( \varSigma P^{\!\top }P \right) \\&= \sum _{j=1}^d \lambda _j, \end{aligned} \end{aligned}$$
(3)

where \(\lambda _i\) are the eigenvalues of \( \varSigma P^{\!\top }P\) and \(||\cdot ||\) denotes the \(\ell _2\) norm. Upon imposing the condition \(\det (P) = \det (\varSigma ^{-\frac{1}{2}})\), we get that \(\det ( \varSigma P^{\!\top }P ) = \prod _{j=1}^d \lambda _j\) is constant with respect to P. It follows from the arithmetic and geometric mean inequality that the sum in (3) is minimized when \(\lambda _i=\lambda _j, \forall \, i, j\). Equality of all eigenvalues allows us to show that

$$\begin{aligned} \varSigma P^{\!\top }P = I \quad \Leftrightarrow \quad P^{\!\top }P = \varSigma ^{-1}, \end{aligned}$$
(4)

which is exactly the solution in (2) that also minimizes (3). The need for the existence of \(\varSigma ^{-1}\) justifies the stated full rank assumption.

We have just shown that learning a whitening transformation reduces to a least squares problem.
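As a quick numerical sanity check of the observation (an illustrative snippet of ours, not from the paper): random transforms rescaled to the determinant of P never attain a lower cost (3), and the minimum equals d.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(3), np.diag([5.0, 2.0, 0.5]), size=1000)
X -= X.mean(axis=0)
Sigma = X.T @ X
P = np.linalg.inv(np.linalg.cholesky(Sigma))     # whitening transform of (2)
d = X.shape[1]

def cost(T):
    # sum_i ||T x_i||^2 = tr(Sigma T^T T), as in (3)
    return np.trace(Sigma @ T.T @ T)

for _ in range(1000):
    A = rng.normal(size=(d, d))
    if np.linalg.det(A) < 0:                     # keep a positive determinant
        A[0] *= -1
    T = A * (np.linalg.det(P) / np.linalg.det(A)) ** (1.0 / d)   # det(T) = det(P)
    assert cost(T) >= cost(P) - 1e-9
print(cost(P), "== d ==", d)                     # tr(Sigma Sigma^{-1}) = d
```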

3 Robust Whitening

In this section we initially review the necessary background on the iteratively re-weighted least squares (IRLS) method recently proposed by Aftab and Hartley [1], which is the starting point for our method. Then, we present the robust whitening and centering procedures, which are posed as weighted least squares problems and performed iteratively. Finally, the extension to the supervised case is described.

3.1 Background on IRLS

In the context of distance minimization the IRLS method minimizes the cost function

$$\begin{aligned} C_h(\mathbf {\theta }) = \sum _{i=1}^N h \circ f(\mathbf {\theta }, \mathbf {x} _i), \end{aligned}$$
(5)

where f is a distance function that is defined on some domain, h is a function that makes the cost less sensitive to outliers, and \(\mathbf {x} _i \in \mathcal {X} \). Some examples of robust h functions are \(\ell _1\), Huber, pseudo-Huber, etc. as described in [1]. For instance, assume the case of the geometric median of the points in \(\mathcal {X} \). Setting \(f(\varvec{\mu }, \mathbf {x} _i) = ||\varvec{\mu }-\mathbf {x} _i||\) and \(h(z)=z\), we get the cost (5) as the sum of \(\ell _2\) norms. The minimum of this cost is attained when \(\varvec{\mu } \) is equal to the geometric median.

It is shown [1] that a solution for \({{\mathrm{argmin}}}_{\mathbf {\theta }} C_h(\mathbf {\theta })\) may be found by solving a sequence of weighted least squares problems. Given some initial estimate \(\mathbf {\theta } ^0\), the parameters \(\mathbf {\theta } \) are iteratively estimated

$$\begin{aligned} \mathbf {\theta } ^{t+1} = \mathop {\text {argmin}}\limits _{\mathbf {\theta }} \sum _{i=1}^N w(\mathbf {\theta } ^t,\mathbf {x} _i) f(\mathbf {\theta },\mathbf {x} _i)^2, \end{aligned}$$
(6)

where for brevity \(w(\mathbf {\theta } ^t,\mathbf {x} _i)\) is denoted \(w_i^t\) in the following. Provided \(h(\sqrt{z})\) is differentiable at all points and concave, this solution minimizes \(C_h(\mathbf {\theta })\) for certain values of \(w_i^t\) and suitable conditions on f. In some cases, it may even be possible to find a simple and analytic solution.

Given that the iterative procedure indeed converges to a minimum of the cost (5), we get the following condition on the weights:

$$\begin{aligned} \begin{aligned} \nabla _{\mathbf {\theta }} (h \circ f({\mathbf {\theta },\mathbf {x} _i}))&= 0, \\ \nabla _{\mathbf {\theta }} (w_i^t f({\mathbf {\theta },\mathbf {x} _i})^2)&= 0. \\ \end{aligned} \end{aligned}$$
(7)

This results in the following weights

$$\begin{aligned} w_i^t = \frac{h'(f({\mathbf {\theta } ^t,\mathbf {x} _i}))}{2f({\mathbf {\theta } ^t,\mathbf {x} _i})}. \end{aligned}$$
(8)

Geometric median. The geometric median \(\varvec{\mu } \) of a set of points \(\lbrace \mathbf {x} _i \rbrace \) is the point that minimizes the sum of \(\ell _2\) distances to the points. As shown in one of the cases in the work by Aftab and Hartley [1], the problem of finding the geometric median can be cast in an IRLS setting for a certain choice of weights. Setting \(f(\varvec{\mu }, \mathbf {x} _i) = ||\varvec{\mu }-\mathbf {x} _i||\) and \(h(z)=z\), the IRLS algorithm minimizes the sum of distances at each iteration, thus converging to the geometric median.
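As a concrete instance of the IRLS recipe, a short sketch computing the geometric median with the weights of (8) for \(h(z)=z\); the small constant guarding against division by zero and all names are our additions:

```python
import numpy as np

def geometric_median(X, n_iter=100, eps=1e-12):
    """IRLS (Weiszfeld-style) geometric median of the rows of X.

    With f(mu, x_i) = ||mu - x_i|| and h(z) = z, the weights of (8) are
    w_i = 1 / (2 ||mu - x_i||); the constant factor cancels in the update.
    """
    mu = X.mean(axis=0)                           # initial estimate
    for _ in range(n_iter):
        dist = np.linalg.norm(X - mu, axis=1)
        w = 1.0 / (2.0 * np.maximum(dist, eps))   # weights of (8)
        # each step solves argmin_mu sum_i w_i ||mu - x_i||^2 (weighted mean)
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
    return mu

X = np.vstack([np.random.default_rng(2).normal(size=(100, 2)),
               [[50.0, 50.0]]])                   # 100 inliers and one far outlier
print("mean:            ", X.mean(axis=0))
print("geometric median:", geometric_median(X))   # barely affected by the outlier
```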

3.2 Method

From the observation in Sect. 2.2, we know that there is a closed-form solution to the problem of finding a linear transformation P so that \(\sum _i ||P\mathbf {x} _i||^2 \) is minimized subject to a fixed determinant \(\det (P)\). The idea of the robust whitening is to use this least squares minimizer in a framework similar to iteratively re-weighted least squares in order to minimize a robust cost.

Robust transformation estimation. In contrast to the conventional whitening and the minimization of (3), we now propose the estimation of a whitening transform (transformation matrix P) in a way that is robust to outliers. We assume zero-mean points and seek the whitening transformation that minimizes the robust cost function of (5). We set \(f(P, \mathbf {x} _i) = ||P\mathbf {x} _i||\) and use the \(\ell _1\) cost function \(h(z) = z\). Other robust cost functions can be used, too.

We seek to minimize the sum of \(\ell _2\) norms in the whitened space

$$\begin{aligned} C_{\ell _1}(P) = \sum _{i=1}^N f(P, \mathbf {x} _i) = \sum _{i=1}^N ||P\mathbf {x} _i||. \end{aligned}$$
(9)

The corresponding iteratively re-weighted least squares solution is given by

$$\begin{aligned} P^{t+1} = \mathop {\text {argmin}}\limits _{P} \sum _{i=1}^N w_i^t ||P \mathbf {y} _i^t||^2, \end{aligned}$$
(10)

where \(\mathbf {y} _i^t = P^t \mathbf {y} _i^{t-1}\) and \(\mathbf {y} _i^0=\mathbf {x} _i\). This means that at each iteration the transformation \(P^{t}\) is estimated and applied to whiten the data points. In the following iteration, the estimation is performed on the data points in the whitened space. The effective transformation at iteration t with respect to the initial points \(\mathbf {x} _i\) is given by

$$\begin{aligned} \hat{P}^t = \prod _{i=1}^{t} P^i. \end{aligned}$$
(11)

Along the lines of the proof in (3), we find a closed-form solution that minimizes (9) as

$$\begin{aligned} \begin{aligned}&\sum _i w_i^t ||P\mathbf {y} _i^t||^2 \\&= tr\left( \left( \sum _i w_i^t \mathbf {y} _i^t {\mathbf {y} _i^t}^{\!\top }\right) P^{\!\top }P \right) \\&= tr\left( \tilde{\varSigma } P^{\!\top }P \right) \\ \end{aligned} \end{aligned}$$
(12)

where \(\tilde{\varSigma } = \sum _i w_i^t \mathbf {y} _i^t {\mathbf {y} _i^t}^{\!\top }\) is a weighted covariance. Therefore, P is given, up to a rotation, as

$$\begin{aligned} P = R\, \tilde{\varSigma }^{-\frac{1}{2}}. \end{aligned}$$
(13)

Joint centering and transformation matrix estimation. In this section we describe the proposed approach for data whitening. We propose to jointly estimate a robust mean \(\varvec{\mu } \) and a robust transformation matrix P by alternating between the two previously described procedures: estimating the geometric median and estimating the robust transformation. In other words, in each iteration, we first find \(\varvec{\mu } \) keeping P fixed and then find P keeping \(\varvec{\mu } \) fixed. In this way the assumption of centered points when estimating P is satisfied. Given that each iteration of the method outlined above reduces the cost, and that the cost is non-negative, we are assured convergence to a local minimum.

We propose to minimize the cost

$$\begin{aligned} C_{\ell _1}(P,\varvec{\mu }) = \sum _{i=1}^N ||P(\mathbf {x} _i-\varvec{\mu })||. \end{aligned}$$
(14)

In order to reformulate this as an IRLS problem, we use \(h(z) = z\) and \(f(P,\varvec{\mu },\mathbf {x} _i)= ||P(\mathbf {x} _i-\varvec{\mu })||\). Now, at iteration t the minimization is performed on the points \(\mathbf {y} _i^t = \hat{P}^t(\mathbf {x} _i-\hat{\varvec{\mu }}^t)\) and the conditions for convergence with respect to \(\varvec{\mu } \) (dropping t and the notation for effective parameters for brevity) are

$$\begin{aligned} \begin{aligned} \nabla _{\varvec{\mu }} (h \circ f)&= \nabla _{\varvec{\mu }} ||P(\mathbf {y} _i - \varvec{\mu })|| \\&= \nabla _{\varvec{\mu }} \sqrt{(\mathbf {y} _i - \varvec{\mu })^{{\!\top }}P^{{\!\top }}P(\mathbf {y} _i - \varvec{\mu })} \\&= \frac{1}{2||P(\mathbf {y} _i - \varvec{\mu })||} \cdot \nabla _{\varvec{\mu }} M, \\ \nabla _{\varvec{\mu }} (w_i \cdot f^2)&= w_i \cdot \nabla _{\varvec{\mu }} M, \end{aligned} \end{aligned}$$
(15)

where we have \(M= (\mathbf {y} _i - \varvec{\mu })^{{\!\top }}P^{{\!\top }}P(\mathbf {y} _i - \varvec{\mu })\). This gives the expression for the weight

$$\begin{aligned} w_i^t = \frac{1}{2||\hat{P}^t(\mathbf {x} _i - \hat{\varvec{\mu }}^t)||}. \end{aligned}$$
(16)

A similar derivation gives us the weights for the iteration step of P. Therefore in each iteration, we find the solutions to the following weighted least squares problems,

$$\begin{aligned} \varvec{\mu } ^{t+1} = \mathop {\text {argmin}}\limits _{\varvec{\mu }} \sum _{i=1}^N w_i(P^t,\varvec{\mu } ^t) ||P^t(\mathbf {y} _i^t - \varvec{\mu })||^2, \end{aligned}$$
(17)
$$\begin{aligned} P^{t+1} = \mathop {\text {argmin}}\limits _{P} \sum _{i=1}^N w_i(P^t,\varvec{\mu } ^{t+1}) ||P(\mathbf {y} _i^t - \varvec{\mu } ^{t+1})||^2. \end{aligned}$$
(18)

The effective centering and transformation matrix at iteration t are given by

$$\begin{aligned} \hat{\varvec{\mu }}^t = \sum _{i=1}^t \left( \prod _{j=1}^{i-1} (P^j)^{-1}\right) \varvec{\mu } ^i \quad , \quad \hat{P}^t = \prod _{i=1}^{t} P^i. \end{aligned}$$
(19)

The whole procedure is summarized in Algorithm 1, where chol is used to denote the Cholesky decomposition.

Algorithm 1.
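The pseudo-code figure is not reproduced here; the following NumPy sketch is our own illustrative reading of the joint iteration (16)–(19), assuming the \(\ell _1\) cost \(h(z)=z\) and using the Cholesky factor for the per-iteration update (13). The function names, the fixed iteration count and the scale normalization of the weighted covariance are our choices, not the paper's.

```python
import numpy as np

def robust_whitening(X, n_iter=50, eps=1e-12):
    """Joint robust centering and whitening (ell_1 cost, IRLS iterations).

    Returns (mu_hat, P_hat) so that P_hat @ (x - mu_hat) is the robustly
    whitened representation of a descriptor x.
    """
    N, d = X.shape
    Y = X.copy()                                    # points in the current space
    mu_hat = np.zeros(d)                            # effective centering, eq. (19)
    P_hat = np.eye(d)                               # effective transformation, eq. (19)

    for _ in range(n_iter):
        # weights of (16): small weight for points far away in the current space
        w = 1.0 / (2.0 * np.maximum(np.linalg.norm(Y, axis=1), eps))

        # centering step (17): weighted mean in the current space
        mu = (w[:, None] * Y).sum(axis=0) / w.sum()

        # recompute weights with the new center, then transformation step (18)
        Yc = Y - mu
        w = 1.0 / (2.0 * np.maximum(np.linalg.norm(Yc, axis=1), eps))
        Sigma_w = (w[:, None] * Yc).T @ Yc / w.sum()    # weighted covariance
        P = np.linalg.inv(np.linalg.cholesky(Sigma_w))  # eq. (13), via chol

        # accumulate the effective parameters, eq. (19), and move to the new space
        mu_hat = mu_hat + np.linalg.solve(P_hat, mu)    # P_hat^{-1} mu
        P_hat = P @ P_hat
        Y = Yc @ P.T
    return mu_hat, P_hat

# usage: learn on training descriptors X (N x d), then transform a query q
# mu_hat, P_hat = robust_whitening(X)
# q_white = P_hat @ (q - mu_hat)
```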

3.3 Extension with Supervision

We first review the work of Cai et al. [6], who perform supervised descriptor whitening, and then present our extension for robust supervised whitening.

Background on linear discriminant projections [6]. The linear discriminant projections (LDP) are learned via supervision of pairs of similar and dissimilar descriptors. A pair (i, j) is similar if \((i,j) \in \mathcal {S} \), and dissimilar if \((i,j) \in \mathcal {D} \). The projections are learned in two parts. Firstly, the whitening part is obtained as the inverse square-root \(C_{\mathcal {S}}^{-\frac{1}{2}}\) of the intra-class covariance matrix, where

$$\begin{aligned} C_{\mathcal {S}} = \sum _{(i,j\in \mathcal {S})} (x_i - x_j)(x_i - x_j)^\top . \end{aligned}$$
(20)

Then, the rotation part is given by the PCA of the inter-class covariance matrix, which is computed in the space of the whitened descriptors. It is computed as \(\mathrm {eig}\!\left( C_{\mathcal {S}}^{-\frac{1}{2}} C_{\mathcal {D}}\, C_{\mathcal {S}}^{-\frac{1}{2}} \right) \), where

$$\begin{aligned} C_{\mathcal {D}} = \sum _{(i,j\in \mathcal {D})} (x_i - x_j)(x_i - x_j)^\top . \end{aligned}$$
(21)

The final whitening is performed by \(P_{\mathcal {S} \mathcal {D}}^\top (x-m)\), where m is the mean descriptor and \(P_{\mathcal {S} \mathcal {D}} = C_{\mathcal {S}}^{-\frac{1}{2}}\, \mathrm {eig}\!\left( C_{\mathcal {S}}^{-\frac{1}{2}} C_{\mathcal {D}}\, C_{\mathcal {S}}^{-\frac{1}{2}} \right) \). It is noted [6] that, if the number of descriptors is large compared to the number of classes (two in this case), then \( C_{\mathcal {D}} \approx C_{\mathcal {S} \cup \mathcal {D}}\) since \(|\mathcal {S} | \ll |\mathcal {D} |\). This is the approach we follow.

Robust linear discriminant projections. The proposed method uses the provided supervision in a robust manner by employing the method introduced in Sect. 3.2. The whitening is estimated robustly by Algorithm 1 on the intra-class covariance. In this manner, small weights are assigned to pairs of descriptors that are found to be outliers. Then, the mean and covariance are estimated in a robust manner in the whitened space. The whole procedure is summarized in Algorithm 2. Mean \(\mu _1\) is zero due to the inclusion of the pairs in a symmetric manner.

Algorithm 2.
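In place of the pseudo-code of Algorithm 2, a hedged sketch of how the supervision could enter, reusing robust_whitening from the previous sketch: symmetric intra-class difference vectors give the whitening part, and a robust estimate in the whitened space gives the rotation. This is our reading of the procedure, not a verbatim transcription; all names are illustrative.

```python
import numpy as np

def robust_ldp(X, similar_pairs):
    """Sketch of robust supervised whitening (our reading of Algorithm 2).

    X: N x d descriptors; similar_pairs: list of (i, j) index pairs in S.
    Relies on robust_whitening() from the previous sketch.
    """
    # symmetric intra-class difference vectors; their robust mean mu1 is ~zero
    diffs = np.vstack([X[i] - X[j] for i, j in similar_pairs] +
                      [X[j] - X[i] for i, j in similar_pairs])
    mu1, P_S = robust_whitening(diffs)               # whitening part
    # robust mean / covariance of all descriptors in the whitened space
    # (all descriptors approximate D, since |S| << |D|)
    mu2, P_D = robust_whitening(X @ P_S.T)
    Sigma_D = np.linalg.inv(P_D.T @ P_D)             # covariance implied by P_D
    eigval, R = np.linalg.eigh(Sigma_D)
    R = R[:, ::-1]                                   # decreasing eigenvalue order
    m = np.linalg.solve(P_S, mu2)                    # robust center in input space
    P_SD = R.T @ P_S                                 # final projection
    return m, P_SD                                   # use as P_SD @ (x - m)
```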
Fig. 2. (a) Set of 2D points drawn from a Gaussian distribution with zero mean. (b) Same set as (a) with an additional point (outlier) placed at a distance equal to 2 times the maximum distance from the center of the initial set. (c) Visualization of the weights assigned to the set of (b) by the robust whitening with the \(\ell _1\) cost function. Note that the size of the circles is inversely proportional to the weight. (d) Same as (c), but using the Cauchy cost.

Fig. 3. Visualization of the covariance (ellipse) and center (cross) of the estimated whitening transformation at iteration t and of the conventional estimate. The example uses the set of 2D points of Fig. 2. The ground-truth distribution that generated the data points is shown in black, the conventional estimate in cyan. We show the effective estimate of the \(t^{th}\) iteration. The two approaches are compared without an outlier in (a), and with an outlier using the \(\ell _1\) (b) or the Cauchy (c) cost function. The outlier is placed at a distance equal to 10 times the maximum inlier distance and is not plotted to keep the scale of the figure reasonable. The \(\ell _1\) (or Cauchy) cost is shown in the legend. (Color figure online)

Fig. 4. Visualization of the covariance (ellipse) and center (cross) of the estimated whitening transformation using the conventional approach and ours. The example uses the set of 2D points of Fig. 2. The two approaches are compared for an outlier placed at a distance equal to 3 (a), 5 (b) and 10 (c) times the maximum inlier distance. The outlier is not shown to keep the resolution high.

4 Examples on Synthetic Data

We compare the proposed and the conventional whitening approaches on synthetic 2D data in order to demonstrate the robustness of our method to outliers. We sample a set of 2D points from a normal distribution, shown in Fig. 2(a), and then add an outlier, shown in Fig. 2(b). In the absence of outliers, both methods provide a similar estimation, as shown in Fig. 3, which also shows how the iterative approach reduces the cost at each iteration. In the presence of an outlier, the estimation of the conventional approach is largely affected, while the robust method gives a much better estimation, as shown in Fig. 3. Using the Cauchy cost function, the estimated covariance is very close to that of the ground truth. The weights assigned to each point by the robust approach are visualized in Fig. 2 and show how the outlier is discarded in the final estimation. Finally, in Fig. 4, we compare the conventional approach with ours for an outlier at increasing distance.

5 Experiments

In this section, the robust whitening is applied to real-application data. In particular, we test on SPoC [4] descriptors, which are CNN-based image descriptors constructed via sum pooling of network activations in the internal convolutional layers. We evaluate on 3 popular retrieval benchmarks, namely Oxford5k, Paris6k and Holidays (the upright version), and use around 25k training images to learn the whitening. We use the VGG network [21] to extract the descriptors and, in contrast to the work of Babenko and Lempitsky [4], we do not \(\ell _2\)-normalize the input vectors. The final ranking is obtained using the Euclidean distance between the query and the database vectors. Evaluation is performed by measuring mean Average Precision (mAP). As in the case of conventional whitening, dimensionality reduction is performed by preserving the dimensions that have the highest variance. This is done by computing an eigenvalue decomposition of the estimated covariance and ordering the eigenvectors by decreasing eigenvalue.
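A short sketch of this dimensionality reduction step (our illustration, reusing mu_hat and P_hat from the earlier robust_whitening sketch; names are ours):

```python
import numpy as np

def reduced_whitening(P_hat, D):
    """PCA-whitening-style reduction to D dimensions from a learned transform.

    The covariance estimate implied by P_hat is (P_hat^T P_hat)^{-1}; keep the
    D eigenvectors of largest eigenvalue (variance) and scale by 1/sqrt(eigval).
    """
    Sigma = np.linalg.inv(P_hat.T @ P_hat)
    eigval, V = np.linalg.eigh(Sigma)                # ascending order
    eigval, V = eigval[::-1][:D], V[:, ::-1][:, :D]  # top-D by variance
    return (V / np.sqrt(eigval)).T                   # D x d projection

# usage with the earlier sketch:
# mu_hat, P_hat = robust_whitening(X_train)
# P_D = reduced_whitening(P_hat, D=256)
# x_reduced = P_D @ (x - mu_hat)
```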

There are many approaches performing robust PCA [7, 24, 25] by assuming that the data matrix can be decomposed into the sum of a low-rank matrix and a sparse matrix corresponding to the outliers. We employ the robust PCA (RPCA) method of Candès et al. [7] for comparison: the low-rank matrix is recovered and PCA whitening is learned on it.

Fig. 5. Retrieval performance comparison using mAP on 3 common benchmarks. Comparison of the conventional PCA whitening, RPCA whitening and our approach for descriptors of varying dimensionality. The training set contains a small subset of 512 randomly selected vectors. The experiment is repeated 10 times; mean performance is reported and the standard deviation is shown on the curves. Descriptors are extracted using VGG.

We present results in Table 1, where the robust approach offers a consistent improvement over the conventional PCA whitening [4]. The improvement is larger when the whitening is learned on few training vectors, since outliers then heavily influence the conventional estimate, as shown in Fig. 5. Our approach is also better than RPCA whitening for large dimensionalities. It appears that RPCA underestimates the rank of the matrix and does not offer any further improvement for large dimensions.

Table 1. Retrieval performance comparison using mAP on 3 common benchmarks. Comparison of retrieval using the initial sum-pooled CNN activations, post-processing using the baselines and our methods for unsupervised and supervised whitening. Results for descriptors of varying dimensionality. The full training set is used. Descriptors extracted using VGG. S: indicates the use of supervision.

6 Discussion

The applicability of the proposed method goes beyond robust whitening. Consider, for example, the task of computing affine-invariant descriptors of local features, such as MSERs [18]. A common approach is to transform the detected feature into a canonical frame prior to computing a robust descriptor based on the gradient map of the normalized patch (SIFT [17]). To remove the effect of an affine transformation, the centre of gravity and the centered second-order moment (covariance matrix) are used. It can be shown that both the centre of gravity and the covariance matrix are affine-covariant, i.e. if the input point set is transformed by an affine transformation A, they transform with the same transformation A.

The proposed method searches for \(\mu \) and P by minimization over all affine transformations with a fixed determinant. In turn, \(\mu \) is fully affine-covariant and P is affine-covariant up to an unknown scale (and rotation; \(P^{\!\top }P\) cancels the rotation). To the best of our knowledge, this type of robust-to-outliers covariant has not been used before.
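To make the connection concrete, a small sketch (ours, not the paper's implementation) of mapping a 2D point set to a canonical frame using the centre of gravity and second-order moments; substituting the robust \(\mu \) and P of Sect. 3 for the mean and covariance below would give the robust-to-outliers variant discussed above.

```python
import numpy as np

def normalize_region(points):
    """Map a 2D point set (e.g. MSER boundary points) to a canonical frame.

    The centroid and second-moment matrix are affine-covariant, so the output
    is invariant to affine transformations of the input, up to rotation.
    """
    mu = points.mean(axis=0)                         # centre of gravity
    C = np.cov(points.T)                             # second-order moments
    A = np.linalg.inv(np.linalg.cholesky(C))         # A^T A = C^{-1}
    return (points - mu) @ A.T

pts = np.random.default_rng(3).multivariate_normal([2, 1], [[3.0, 1.2], [1.2, 1.0]], 400)
M = np.array([[2.0, 0.5], [0.1, 1.5]])               # linear part of some affine map
n1 = normalize_region(pts)
n2 = normalize_region(pts @ M.T + np.array([5.0, -3.0]))
print(np.cov(n1.T), np.cov(n2.T))                    # both close to the identity
```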

7 Conclusions

We cast the problem of data whitening as the minimization of robust cost functions. In this fashion we iteratively estimate a whitening transformation that is robust to the presence of outliers. Using synthetic data, we show that our estimation is almost unaffected even in extreme cases of outliers, while it also offers improvements when whitening CNN descriptors for image retrieval.