Abstract
While the optimization problem associated with LRR is convex and easy to solve, achieving high efficiency remains a real challenge, especially in large-scale settings. In this chapter we therefore address the problem of solving nuclear norm regularized optimization problems (NNROPs), a category of problems that includes LRR. Based on the fact that the optimal solution matrix of an NNROP is often low-rank, we revisit the classic mechanism of low-rank matrix factorization and present an active subspace algorithm that solves NNROPs efficiently by transforming large-scale NNROPs into small-scale problems. The transformation is achieved by factorizing the large-size solution matrix into the product of a small-size orthonormal matrix (the active subspace) and another small-size matrix. Although such a transformation generally leads to non-convex problems, we show that a suboptimal solution can be found by the augmented Lagrange alternating direction method. For the robust PCA (RPCA) [7] problem, a typical example of an NNROP, theoretical results verify the sub-optimality of the solution produced by our algorithm. For general NNROPs, we empirically show that our algorithm significantly reduces the computational complexity without loss of optimality.
The contents of this chapter have been published in Neural Computation [18]. \(\copyright \) [2014] MIT Press. Reprinted, with permission, from MIT Press Journals.
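To make the factorization idea concrete, the following is a minimal NumPy sketch of how an NNROP of the RPCA type, \(\min_{Q,J,E}\Vert J\Vert_*+\lambda\Vert E\Vert_1\) s.t. \(D=QJ+E\), \(Q^TQ=\mathtt{I}\), can be attacked by augmented Lagrange alternating directions. It is an illustration under assumptions, not the chapter's Algorithm 1 verbatim: the function names, the parameter defaults (`rho`, `mu`, `tol`, `max_iter`), the random initialization of \(Q\), and the stopping test are choices made here for readability.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Entrywise soft thresholding: proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def active_subspace_rpca(D, r, lam=None, rho=1.5, mu=1e-3, tol=1e-7, max_iter=500):
    """Sketch of min ||J||_* + lam*||E||_1  s.t.  D = Q J + E,  Q^T Q = I."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    Q = np.linalg.qr(np.random.randn(m, r))[0]      # column-orthonormal active subspace
    J, E, Y = np.zeros((r, n)), np.zeros((m, n)), np.zeros((m, n))
    for _ in range(max_iter):
        B = D - E + Y / mu
        # Q-step: orthogonal Procrustes fit, argmin_{Q^T Q = I} ||B - Q J||_F
        U, _, Vt = np.linalg.svd(B @ J.T, full_matrices=False)
        Q = U @ Vt
        # J-step: singular value thresholding on a small r x n matrix
        J = svt(Q.T @ B, 1.0 / mu)
        # E-step: entrywise l1 shrinkage
        E = shrink(D - Q @ J + Y / mu, lam / mu)
        # Dual update and capped penalty growth (cf. note 3 below on choosing rho)
        R = D - Q @ J - E
        Y = Y + mu * R
        mu = min(1e6, rho * mu)
        if np.abs(R).max() < tol * max(1.0, np.abs(D).max()):
            break
    return Q, J, E
```

Since \(Q\) is \(m\times r\) and \(J\) is \(r\times n\), every SVD inside the loop is computed on an \(m\times r\) or \(r\times n\) matrix rather than on the full \(m\times n\) matrix required by plain singular value thresholding, which is where the reduction in computational cost comes from. For example, given a corrupted observation `D` and an over-estimated rank `r`, `Q, J, E = active_subspace_rpca(D, r)` returns the low-rank part as `Q @ J` and the sparse part as `E`.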
Notes
- 1.
More generally, NNROPs are expressed as \(\min _{X}\Vert X\Vert _*+\lambda {}f(X)\), where \(f(X)\) is a convex function. In this work, we are particularly interested in the form (1), which covers a wide range of problems.
- 2.
For an \(m\times {n}\) matrix \(M\) (without loss of generality, assuming \(m\le {n}\)), its SVD is defined by \(M=U[\varSigma ,0]V^T\), where \(U\) and \(V\) are orthogonal matrices and \(\varSigma =\mathrm {diag}\left( \sigma _1,\sigma _2,\ldots ,\sigma _m\right) \) with \(\{\sigma _i\}_{i=1}^m\) being the singular values. The SVD defined in this way is also called the full SVD. If we only calculate the first \(m\) column vectors of \(V\), i.e., \(M=U\varSigma {}V^T\) with \(U\in \fancyscript{R}^{m\times {}m}\), \(\varSigma \in \fancyscript{R}^{m\times {}m}\), and \(V\in \fancyscript{R}^{n\times {}m}\), the simplified form is called the thin SVD. If we further keep only the positive singular values, the reduced form is called the skinny SVD. For a matrix \(M\) of rank \(r\), its skinny SVD is given by \(M=U_r\varSigma _rV_r^T\), where \(\varSigma _r=\mathrm {diag}\left( \sigma _1,\sigma _2,\ldots ,\sigma _r\right) \) with \(\{\sigma _i\}_{i=1}^r\) being the positive singular values, and \(U_r\) and \(V_r\) are formed by taking the first \(r\) columns of \(U\) and \(V\), respectively. (A short numerical illustration of these definitions is given after these notes.)
- 3.
Nevertheless, as shown in Fig. 1b, the algorithm is less efficient when a smaller \(\rho \) is used, so one could choose this parameter by trading off efficiency against optimality. Here, we introduce a heuristic technique that modifies Step 5 of Algorithm 1 into:
$$\begin{aligned} \mu _{k+1} = \min (10^{6},\rho \mu _k). \end{aligned}$$
In this way, it is safe to use a relatively large \(\rho \).
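As a concrete companion to note 2, the following small NumPy check contrasts the thin SVD, which keeps all \(m\) singular triplets, with the skinny SVD, which keeps only the \(r\) positive ones; the \(4\times 6\) rank-2 test matrix and the numerical rank threshold are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# An m x n test matrix of rank 2 with m = 4 <= n = 6 (sizes are illustrative).
M = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))

# Thin SVD: U is 4 x 4, Sigma has m = 4 entries, V is 6 x 4.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
assert np.allclose(M, (U * s) @ Vt)

# Skinny SVD: keep only the positive singular values (here r = 2).
r = int(np.sum(s > 1e-10))
U_r, s_r, V_rt = U[:, :r], s[:r], Vt[:r, :]
assert np.allclose(M, (U_r * s_r) @ V_rt)   # M = U_r Sigma_r V_r^T
print("numerical rank:", r, "singular values:", s)
```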
References
1. F. Bach, Consistency of trace norm minimization. J. Mach. Learn. Res. 9, 1019–1048 (2008)
2. S. Burer, R. Monteiro, Local minima and convergence in low-rank semidefinite programming. Math. Program. 103, 427–444 (2005)
3. J. Cai, S. Osher, Fast singular value thresholding without singular value decomposition. UCLA Technical Report (2010)
4. J. Cai, E. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
5. E. Candès, Y. Plan, Matrix completion with noise. Proc. IEEE 98(6), 925–936 (2010)
6. E. Candès, B. Recht, Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)
7. E. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58(3), 1–37 (2009)
8. V. Chandrasekaran, S. Sanghavi, P. Parrilo, A. Willsky, Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2), 572–596 (2009)
9. A. Edelman, T. Arias, S. Smith, The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20, 303–353 (1999)
10. M. Fazel, Matrix rank minimization with applications. PhD thesis (2002)
11. N. Halko, P. Martinsson, J. Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
12. N. Higham, Matrix Procrustes problems (1995)
13. M. Jaggi, M. Sulovský, A simple algorithm for nuclear norm regularized problems, in International Conference on Machine Learning, pp. 471–478 (2010)
14. K.C. Lee, J. Ho, D. Kriegman, Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 684–698 (2005)
15. Z. Lin, M. Chen, L. Wu, Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical Report UILU-ENG-09-2215 (2009)
16. Z. Lin, R. Liu, Z. Su, Linearized alternating direction method with adaptive penalty for low-rank representation. Neural Inf. Process. Syst. 25, 612–620 (2011)
17. G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation. Int. Conf. Mach. Learn. 3, 663–670 (2010)
18. G. Liu, S. Yan, Active subspace: toward scalable low-rank learning. Neural Comput. 24(12), 3371–3394 (2012)
19. G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell., preprint (2012)
20. K. Min, Z. Zhang, J. Wright, Y. Ma, Decomposing background topics from keywords by principal component pursuit. Conf. Inf. Knowl. Manag., pp. 269–278 (2010)
21. J. Nocedal, S. Wright, Numerical Optimization (Springer, New York, 2006)
22. S. Shalev-Shwartz, A. Gonen, O. Shamir, Large-scale convex minimization with a low-rank constraint. Int. Conf. Mach. Learn., pp. 329–336 (2011)
23. Y. Shen, Z. Wen, Y. Zhang, Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization. Technical Report (2011)
24. N. Srebro, N. Alon, T. Jaakkola, Generalization error bounds for collaborative prediction with low-rank matrices. Neural Inf. Process. Syst., pp. 5–27 (2005)
25. R. Tomioka, T. Suzuki, M. Sugiyama, H. Kashima, A fast augmented Lagrangian algorithm for learning low-rank matrices. Int. Conf. Mach. Learn., pp. 1087–1094 (2010)
26. P. Tseng, On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM J. Optim. (2008)
27. M. Weimer, A. Karatzoglou, Q. Le, A. Smola, CofiRank: maximum margin matrix factorization for collaborative ranking. Neural Inf. Process. Syst. (2007)
28. C. Williams, M. Seeger, The effect of the input density distribution on kernel-based classifiers. Int. Conf. Mach. Learn., pp. 1159–1166 (2000)
29. J. Wright, A. Ganesh, S. Rao, Y. Peng, Y. Ma, Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. Neural Inf. Process. Syst., pp. 2080–2088 (2009)
30. J. Yang, X. Yuan, An inexact alternating direction method for trace norm regularized least squares problem. Under review in Math. Comput. (2010)
31. Y. Zhang, Recent advances in alternating direction methods: practice and theory. Tutorial (2010)
32. Z. Zhang, X. Liang, A. Ganesh, Y. Ma, TILT: transform invariant low-rank textures. Int. J. Comput. Vis. 99(1), 314–328 (2012)
33. G. Zhu, S. Yan, Y. Ma, Image tag refinement towards low-rank, content-tag prior and error sparsity. ACM Multimed., pp. 461–470 (2010)
Appendix
1.1 Proof of Lemma 1
The proof is based on the following two lemmas.
Lemma 3
The sequences \(\{Y_{k}\}\), \(\{\hat{Y}_k\}\) and \(\{\tilde{Y}_k\}\) are all bounded.
Proof
By the optimality of \(E_{k+1}\), the standard conclusion from convex optimization states that
i.e.,
which directly leads to
Hence, the sequence \(\{Y_k\}\) is bounded.
By the optimality of \(Q_{k+1}\), it can be calculated that
So \(\{\tilde{Y}_k\}\) is bounded due to the boundedness of \(\{Y_k\}\).
By the optimality of \(J_{k+1}\), the standard conclusion from convex optimization states that
which leads to
At the same time, letting \(Q_{k+1}^{\bot }\) be the orthogonal complement of \(Q_{k+1}\), it can be calculated that
Hence,
So both \(Q_{k+1}^{T}\hat{Y}_{k+1}\) and \((Q_{k+1}^{\bot })^T\hat{Y}_{k+1}\) are bounded, which implies that \(\hat{Y}_{k+1}\) is bounded. \(\square \)
Lemma 4
The sequences \(\{J_{k}\}\), \(\{E_k\}\) and \(\{Q_kJ_k\}\) are all bounded.
Proof
From the iteration procedure of Algorithm 1, we have that
So \(\{\fancyscript{L}(Q_{k+1},J_{k+1},E_{k+1},Y_k,\mu _k)\}\) is upper bounded due to the boundedness of \(\{Y_k\}\) and
Hence,
is upper bounded, which means that \(\{J_k\}\) and \(\{E_k\}\) are bounded. Since \(\Vert Q_{k}J_{k}\Vert _*=\Vert J_k\Vert _*\), \(\{Q_kJ_k\}\) is also bounded. \(\square \)
Proof
(of Lemma 1 ). By the boundedness of \(Y_k\), \(\hat{Y}_k\) and \(\tilde{Y}_{k+1}\) and the fact that \(\lim _{k\rightarrow \infty }\mu _k=\infty \),
According to the definitions of \(Y_{k}\) and \(\hat{Y}_{k}\), it can be also calculated that
Hence, the sequences \(\{J_k\}\), \(\{E_k\}\) and \(\{Q_kJ_k\}\) are Cauchy sequences, and Algorithm 1 can stop within a finite number of iterations.
By the convergence conditions of Algorithm 1, it can be calculated that
where \(k^*\) is defined in (6), and \(\varepsilon >0\) is the control parameter set in Algorithm 1. \(\square \)
Note. One may have noticed that \(\{Q_k\}\) may not converge, because the basis of a subspace is not unique. Nevertheless, whether or not \(\{Q_k\}\) converges is immaterial, because it is the product of \(Q^*\) and \(J^*\), namely \((X=Q^*J^*, E=E^*)\), that recovers a solution to the original RPCA problem.
1.2 Proof of Lemma 2
We prove the following lemma at first.
Lemma 5
Let \(X\), \(Y\), and \(Q\) be matrices of compatible dimensions. If \(Q\) obeys \(Q^TQ=\mathtt {I}\) and \(Y\in \partial \Vert X\Vert _*\), then
Proof
Let the skinny SVD of \(X\) be \(U\varSigma {}V^T\). By \(Y\in \partial \Vert X\Vert _*\), we have
Since \(Q\) is column-orthonormal, we have
With the above notations, it can be verified that \(QY\in \partial \Vert QX\Vert _*\). \(\square \)
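As an informal numerical companion to Lemma 5 (not part of the proof), one can check on random data that left-multiplication by a column-orthonormal \(Q\) preserves the nuclear norm and maps the canonical subgradient \(U_rV_r^T\in \partial \Vert X\Vert _*\) to a subgradient at \(QX\); the matrix sizes and the single test point \(Z\) below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 5, 4, 8
X = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))  # rank-2 test matrix

# Canonical subgradient of the nuclear norm at X: Y = U_r V_r^T from the skinny SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))
Y = U[:, :r] @ Vt[:r, :]

# A column-orthonormal Q (Q^T Q = I), e.g. from a QR factorization.
Q = np.linalg.qr(rng.standard_normal((p, m)))[0]

nuc = lambda A: np.linalg.svd(A, compute_uv=False).sum()

# ||QX||_* = ||X||_*, and QY = (Q U_r) V_r^T is the canonical subgradient at QX.
assert np.isclose(nuc(Q @ X), nuc(X))

# One-sample subgradient inequality: ||Z||_* >= ||QX||_* + <QY, Z - QX>.
Z = rng.standard_normal((p, n))
assert nuc(Z) >= nuc(Q @ X) + np.sum((Q @ Y) * (Z - Q @ X)) - 1e-9
```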
Proof
(of Lemma 2). Let the skinny SVD of \(D-E_k+Y_k/\mu _k\) be \(D-E_k+Y_k/\mu _k=U_k\varSigma _k{}V_k^T\); then it can be calculated that
Let the full SVD of \(\varSigma _kV_k^TJ_k^T\) be \(\varSigma _kV_k^TJ_k^T=U\varSigma {}V^T\) (note that \(U\) and \(V\) are orthogonal matrices), then it can be calculated that
which simply leads to
Hence,
i.e.,
According to (12) and Lemma 5, we have
Hence,
where the conclusion of \(Y_{k+1}\in {}\lambda \partial \left\| E_{k+1} \right\| _1\) is quoted from (11). Since the above conclusion holds for any \(k\), it naturally holds at \((Q^*,J^*,E^*)\):
Given any feasible solution \((Q,J,E)\) to problem (5), by the convexity of matrix norms and (13), it can be calculated that
By Lemma 1, we have that \(\Vert QJ+E-Q^*J^*-E^*\Vert _{\infty }\le \Vert D-Q^*J^*-E^*\Vert _{\infty }<\varepsilon \), which leads to
where \(\Vert \hat{Y}_*\Vert \le 1\) is due to (13). Hence,
\(\square \)
1.3 Proof of Theorem 1
Proof
Notice that \((Q^*,J=0,E=D)\) is feasible to (5). Let \((Q^g,J^g,E^g)\) be a globally optimal solution to (5), then we have
By the proof procedure of Lemma 4, we have that \(E^*\) is bounded by
Hence,
Note that \(|\langle {}M,N\rangle |\le \Vert M\Vert _{\infty }\Vert N\Vert _1\) holds for any matrices \(M\) and \(N\). By Lemma 2 and (14), we have
which simply leads to the inequality stated in Theorem 1. \(\square \)
1.4 Proof of Theorem 2
Proof
Let \(X=Q^*J^*\) and \(E=E^*\); then \((X,E)\) is a feasible solution to the original RPCA problem. By the convexity of the RPCA problem and the optimality of \((X^o,E^o)\), it naturally follows that
Let \(X^o=U^o\varSigma ^o{}(V^o)^T\) be the skinny SVD of \(X^o\). Construct \(Q^{\prime }=U^o\), \(J^{\prime }=\varSigma ^o{}(V^o)^T\) and \(E^{\prime }=E^o\). When \(r\ge {}r_0\), we have
i.e., \((Q^{\prime },J^{\prime },E^{\prime })\) is a feasible solution to problem (5). By Theorem 1, it can be concluded that
For \(r<r_0\), we decompose the skinny SVD of \(X^o\) as
where \(U_0,V_0\) (resp. \(U_1, V_1\)) are the singular vectors associated with the \(r\) largest singular values (resp. the remaining singular values, which are smaller than or equal to \(\sigma _{r}\)). With these notations, we have a feasible solution to problem (5) by constructing
By Theorem 1, it can be calculated that
\(\square \)