Scalable Low-Rank Representation

Abstract

While the optimization problem associated with LRR is convex and easy to solve, achieving high efficiency remains a real challenge, especially in large-scale settings. In this chapter we therefore address the problem of solving nuclear norm regularized optimization problems (NNROPs), a class of problems that includes LRR. Based on the fact that the optimal solution matrix of an NNROP is often low-rank, we revisit the classic mechanism of low-rank matrix factorization and present an active subspace algorithm that solves NNROPs efficiently by transforming large-scale NNROPs into small-scale problems. The transformation is achieved by factorizing the large solution matrix into the product of a small orthonormal matrix (the active subspace) and another small matrix. Although such a transformation generally leads to non-convex problems, we show that a suboptimal solution can be found by the augmented Lagrange alternating direction method. For the robust PCA (RPCA) [7] problem, a typical example of an NNROP, theoretical results verify the sub-optimality of the solution produced by our algorithm. For general NNROPs, we empirically show that our algorithm significantly reduces the computational complexity without loss of optimality.

The contents of this chapter have been published in Neural Computation [18]. \(\copyright \) [2014] MIT Press. Reprinted, with permission, from MIT Press Journals.
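To make the factorization behind the active subspace idea concrete, the following NumPy sketch (illustrative only; the sizes, variable names, and the use of an SVD to build the orthonormal factor are our assumptions, not code from the chapter) factorizes a low-rank matrix as \(X=QJ\) with \(Q\) column-orthonormal and checks the property that makes the transformation work: the nuclear norm of the large matrix equals that of the small factor.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 500, 400, 5           # illustrative sizes: X is m x n but has rank r << min(m, n)

# A rank-r matrix, mimicking the low-rank solution matrix of an NNROP.
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Active-subspace style factorization X = Q J:
#   Q (m x r) is column-orthonormal and spans the column space of X,
#   J (r x n) is the small coefficient matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Q = U[:, :r]                    # orthonormal basis of the column space
J = Q.T @ X                     # small factor

assert np.allclose(Q.T @ Q, np.eye(r))   # Q'Q = I
assert np.allclose(Q @ J, X)             # exact factorization, since X has rank r

# The property that lets nuclear-norm computations act on the small factor:
print(np.linalg.norm(X, ord='nuc'))      # ||X||_*
print(np.linalg.norm(J, ord='nuc'))      # ||J||_* -- the same value, up to rounding
```

Because \(Q\) has orthonormal columns, \(QJ\) and \(J\) share the same singular values, so any nuclear-norm computation on the \(m\times n\) solution matrix can instead be carried out on the much smaller \(r\times n\) factor.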


Notes

  1. More generally, NNROPs are expressed as \(\min _{X}\Vert X\Vert _*+\lambda {}f(X)\), where \(f(X)\) is a convex function. In this work, we are particularly interested in the form (1), which covers a wide range of problems.

  2. For an \(m\times {n}\) matrix \(M\) (without loss of generality, assuming \(m\le {n}\)), its SVD is defined by \(M=U[\varSigma ,0]V^T\), where \(U\) and \(V\) are orthogonal matrices and \(\varSigma =\mathrm {diag}\left( \sigma _1,\sigma _2,\ldots ,\sigma _m\right) \) with \(\{\sigma _i\}_{i=1}^m\) being the singular values. The SVD defined in this way is also called the full SVD. If we compute only the first \(m\) columns of \(V\), i.e., \(M=U\varSigma {}V^T\) with \(U\in \fancyscript{R}^{m\times {}m}\), \(\varSigma \in \fancyscript{R}^{m\times {}m}\), and \(V\in \fancyscript{R}^{n\times {}m}\), the simplified form is called the thin SVD. If we further keep only the positive singular values, the reduced form is called the skinny SVD. For a matrix \(M\) of rank \(r\), its skinny SVD is computed by \(M=U_r\varSigma _rV_r^T\), where \(\varSigma _r=\mathrm {diag}\left( \sigma _1,\sigma _2,\ldots ,\sigma _r\right) \) with \(\{\sigma _i\}_{i=1}^r\) being the positive singular values, and \(U_r\) and \(V_r\) are formed by taking the first \(r\) columns of \(U\) and \(V\), respectively (see the code sketch following these notes).

  3. Nevertheless, as shown in Fig. 1b, the algorithm is less efficient when a smaller \(\rho \) is used. So one should choose this parameter by trading off efficiency against optimality. Here, we introduce a heuristic technique that modifies Step 5 of Algorithm 1 into:

     $$\begin{aligned} \mu _{k+1} = \min (10^{6},\rho \mu _k). \end{aligned}$$

     In this way, it is safe to use a relatively large \(\rho \).
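The full, thin, and skinny SVDs of Note 2 map directly onto standard library calls. The following NumPy sketch (illustrative only; the matrix and the rank tolerance are our choices) computes the three variants for an \(m\times n\) matrix with \(m\le n\).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 4, 6, 2               # m <= n; the matrix below has rank r

M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Full SVD: U is m x m, V is n x n, and M = U [Sigma, 0] V^T.
U_full, s_full, Vt_full = np.linalg.svd(M, full_matrices=True)

# Thin SVD: keep only the first m columns of V, so M = U Sigma V^T
# with U (m x m), Sigma (m x m), V (n x m).
U_thin, s_thin, Vt_thin = np.linalg.svd(M, full_matrices=False)

# Skinny SVD: keep only the positive singular values. For a rank-r matrix,
# M = U_r Sigma_r V_r^T, with U_r and V_r the first r columns of U and V.
tol = max(m, n) * np.finfo(float).eps * s_thin[0]
r_eff = int(np.sum(s_thin > tol))
U_r = U_thin[:, :r_eff]
S_r = np.diag(s_thin[:r_eff])
V_r = Vt_thin[:r_eff, :].T

assert np.allclose(U_r @ S_r @ V_r.T, M)   # the skinny SVD reconstructs M exactly
```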

References

  1. F. Bach, Consistency of trace norm minimization. J. Mach. Learn. Res. 9, 1019–1048 (2008)

  2. S. Burer, R. Monteiro, Local minima and convergence in low-rank semidefinite programming. Math. Program. 103, 427–444 (2005)

  3. J. Cai, S. Osher, Fast singular value thresholding without singular value decomposition. UCLA Technical Report (2010)

  4. J. Cai, E. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)

  5. E. Candès, Y. Plan, Matrix completion with noise. Proc. IEEE 98(6), 925–936 (2010)

  6. E. Candès, B. Recht, Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)

  7. E. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58(3), 1–37 (2009)

  8. V. Chandrasekaran, S. Sanghavi, P. Parrilo, A. Willsky, Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2), 572–596 (2009)

  9. A. Edelman, T. Arias, S. Smith, The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20, 303–353 (1999)

  10. M. Fazel, Matrix rank minimization with applications. PhD thesis (2002)

  11. N. Halko, P. Martinsson, J. Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)

  12. N. Higham, Matrix Procrustes problems (1995)

  13. M. Jaggi, M. Sulovský, A simple algorithm for nuclear norm regularized problems, in International Conference on Machine Learning, pp. 471–478 (2010)

  14. K.C. Lee, J. Ho, D. Kriegman, Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 684–698 (2005)

  15. Z. Lin, M. Chen, L. Wu, Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical Report UILU-ENG-09-2215 (2009)

  16. Z. Lin, R. Liu, Z. Su, Linearized alternating direction method with adaptive penalty for low-rank representation. Neural Inf. Process. Syst. 25, 612–620 (2011)

  17. G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation. Int. Conf. Mach. Learn. 3, 663–670 (2010)

  18. G. Liu, S. Yan, Active subspace: toward scalable low-rank learning. Neural Comput. 24(12), 3371–3394 (2012)

  19. G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell., preprint (2012)

  20. K. Min, Z. Zhang, J. Wright, Y. Ma, Decomposing background topics from keywords by principal component pursuit. Conf. Inf. Knowl. Manag., 269–278 (2010)

  21. J. Nocedal, S. Wright, Numerical Optimization (Springer, New York, 2006)

  22. S. Shalev-Shwartz, A. Gonen, O. Shamir, Large-scale convex minimization with a low-rank constraint. Int. Conf. Mach. Learn., 329–336 (2011)

  23. Y. Shen, Z. Wen, Y. Zhang, Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization. Technical Report (2011)

  24. N. Srebro, N. Alon, T. Jaakkola, Generalization error bounds for collaborative prediction with low-rank matrices. Neural Inf. Process. Syst., 5–27 (2005)

  25. R. Tomioka, T. Suzuki, M. Sugiyama, H. Kashima, A fast augmented Lagrangian algorithm for learning low-rank matrices. Int. Conf. Mach. Learn., 1087–1094 (2010)

  26. P. Tseng, On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM J. Optim. (2008)

  27. M. Weimer, A. Karatzoglou, Q. Le, A. Smola, Cofi rank: maximum margin matrix factorization for collaborative ranking. Neural Inf. Process. Syst. (2007)

  28. C. Williams, M. Seeger, The effect of the input density distribution on kernel-based classifiers. Int. Conf. Mach. Learn., 1159–1166 (2000)

  29. J. Wright, A. Ganesh, S. Rao, Y. Peng, Y. Ma, Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. Neural Inf. Process. Syst., 2080–2088 (2009)

  30. J. Yang, X. Yuan, An inexact alternating direction method for trace norm regularized least squares problem. Under review, Math. Comput. (2010)

  31. Y. Zhang, Recent advances in alternating direction methods: practice and theory. Tutorial (2010)

  32. Z. Zhang, X. Liang, A. Ganesh, Y. Ma, TILT: transform invariant low-rank textures. Int. J. Comput. Vis. 99(1), 314–328 (2012)

  33. G. Zhu, S. Yan, Y. Ma, Image tag refinement towards low-rank, content-tag prior and error sparsity. ACM Multimed., 461–470 (2010)

Author information

Correspondence to Guangcan Liu.

Appendix

1.1 Proof of Lemma 1

The proof is based on the following two lemmas.

Lemma 3

The sequences \(\{Y_{k}\}\), \(\{\hat{Y}_k\}\) and \(\{\tilde{Y}_k\}\) are all bounded.

Proof

By the optimality of \(E_{k+1}\), the standard conclusion from convex optimization states that

$$0\in \partial \fancyscript{L}_{E}(Q_{k+1},J_{k+1},E_{k+1},Y_k,\mu _k),$$

i.e.,

$$\begin{aligned} Y_{k}+\mu _k(D-Q_{k+1}J_{k+1}-E_{k+1})\in \lambda \partial \Vert E_{k+1}\Vert _1, \end{aligned}$$

which directly leads to

$$\begin{aligned} Y_{k+1}\in {}\lambda \partial \left\| E_{k+1} \right\| _1,\text { and so }\left\| Y_{k+1} \right\| _{\infty }\le \lambda . \end{aligned}$$
(11)

Hence, the sequence \(\{Y_k\}\) is bounded.

By the optimality of \(Q_{k+1}\), it can be calculated that

$$\begin{aligned} \Vert \tilde{Y}_{k+1}\Vert _F&\le \Vert Y_k+\mu _k(D-Q_kJ_k-E_k)\Vert _F=\Vert Y_k+\rho \mu _{k-1}(D-Q_kJ_k-E_k)\Vert _F\\&= \Vert (1+\rho )Y_k-\rho {}Y_{k-1}\Vert _F. \end{aligned}$$

So \(\{\tilde{Y}_k\}\) is bounded due to the boundedness of \(\{Y_k\}\).

By the optimality of \(J_{k+1}\), the standard conclusion from convex optimization states that

$$0\in \partial \fancyscript{L}_{J}(Q_{k+1},J_{k+1},E_{k},Y_k,\mu _k),$$

which leads to

$$\begin{aligned} Q_{k+1}^{T}\hat{Y}_{k+1}\in \partial \left\| J_{k+1} \right\| _*\text {, and so }\Vert Q_{k+1}^{T}\hat{Y}_{k+1}\Vert _2\le 1. \end{aligned}$$
(12)

At the same time, letting \(Q_{k+1}^{\bot }\) be the orthogonal complement of \(Q_{k+1}\), it can be calculated that

$$(Q_{k+1}^{\bot })^T\hat{Y}_{k+1} = (Q_{k+1}^{\bot })^T(Y_k+\mu _k(D-E_k))=(Q_{k+1}^{\bot })^T\tilde{Y}_{k+1}.$$

Hence,

$$\begin{aligned} \Vert (Q_{k+1}^{\bot })^T\hat{Y}_{k+1}\Vert _2=\Vert (Q_{k+1}^{\bot })^T\tilde{Y}_{k+1}\Vert _2\le {}\Vert \tilde{Y}_{k+1}\Vert _2. \end{aligned}$$

So both \(Q_{k+1}^{T}\hat{Y}_{k+1}\) and \((Q_{k+1}^{\bot })^T\hat{Y}_{k+1}\) are bounded, which implies that \(\hat{Y}_{k+1}\) is bounded.   \(\square \)

Lemma 4

The sequences \(\{J_{k}\}\), \(\{E_k\}\) and \(\{Q_kJ_k\}\) are all bounded.

Proof

From the iteration procedure of Algorithm 1, we have that

$$\begin{aligned} \fancyscript{L}(Q_{k+1},J_{k+1},E_{k+1},Y_k,\mu _k)&\le \fancyscript{L}(Q_{k+1},J_{k+1},E_k,Y_k,\mu _k)\\&\le \fancyscript{L}(Q_{k+1},J_k,E_k,Y_k,\mu _k)\\&\le \fancyscript{L}(Q_k,J_k,E_k,Y_k,\mu _k)\\&=\fancyscript{L}(Q_k,J_k,E_k,Y_{k-1},\mu _{k-1})\\&\quad +\frac{\mu _{k-1}+\mu _k}{2\mu _{k-1}^2}\Vert Y_k-Y_{k-1}\Vert _F^2. \end{aligned}$$

So \(\{\fancyscript{L}(Q_{k+1},J_{k+1},E_{k+1},Y_k,\mu _k)\}\) is upper bounded due to the boundedness of \(\{Y_k\}\) and

$$\begin{aligned}\sum _{k=1}^{+\infty }\frac{\mu _{k-1}+\mu _k}{2\mu _{k-1}^2}=\frac{\rho (1+\rho )}{2\mu _0}\sum _{k=1}^{+\infty }\rho ^{-k}=\frac{\rho (1+\rho )}{2\mu _0(\rho -1)}. \end{aligned}$$

Hence,

$$\Vert J_k\Vert _*+\lambda \Vert E_k\Vert _1=\fancyscript{L}(Q_k,J_k,E_k,Y_{k-1},\mu _{k-1})-\frac{1}{2\mu _{k-1}}(\Vert Y_k\Vert _F^2-\Vert Y_{k-1}\Vert _F^2)$$

is upper bounded, which means that \(\{J_k\}\) and \(\{E_k\}\) are bounded. Since \(\Vert Q_{k}J_{k}\Vert _*=\Vert J_k\Vert _*\), \(\{Q_kJ_k\}\) is also bounded.   \(\square \)

Proof

(of Lemma 1). By the boundedness of \(Y_k\), \(\hat{Y}_k\) and \(\tilde{Y}_{k+1}\) and the fact that \(\lim _{k\rightarrow \infty }\mu _k=\infty \),

$$\begin{aligned} \frac{Y_{k+1}-Y_k}{\mu _k}&\rightarrow 0, \\ \frac{\hat{Y}_{k+1}-Y_{k+1}}{\mu _k}&\rightarrow 0, \\ \frac{\tilde{Y}_{k+1}-\hat{Y}_{k+1}}{\mu _k}&\rightarrow 0. \end{aligned}$$

According to the definitions of \(Y_{k}\) and \(\hat{Y}_{k}\), it can also be calculated that

$$\begin{aligned} E_{k+1}-E_{k}&= \frac{\hat{Y}_{k+1}-Y_{k+1}}{\mu _k},\\ J_{k+1}-J_{k}&= \frac{Q_{k+1}^T(\tilde{Y}_{k+1}-\hat{Y}_{k+1})}{\mu _k},\\ D-Q_{k+1}J_{k+1}-E_{k+1}&= \frac{Y_{k+1}-Y_k}{\mu _k},\\ Q_{k+1}J_{k+1}-Q_{k}J_{k}&= \frac{(1+\rho )Y_{k}-(\hat{Y}_{k+1}+\rho {}Y_{k-1})}{\mu _k}. \end{aligned}$$

Hence, the sequences \(\{J_k\}\), \(\{E_k\}\) and \(\{Q_kJ_k\}\) are Cauchy sequences, and Algorithm 1 can stop within a finite number of iterations.

By the convergence conditions of Algorithm 1, it can be calculated that

$$\begin{aligned} \Vert D-Q^*J^*-E^*\Vert _{\infty }=\left\| \frac{Y_{k^*+1}-Y_{k^*}}{\mu _{k^*}}\right\| _{\infty }\le \varepsilon , \end{aligned}$$

where \(k^*\) is defined in (6), and \(\varepsilon >0\) is the control parameter set in Algorithm 1.   \(\square \)

Note. One may have noticed that \(\{Q_k\}\) may not converge, because the basis of a subspace is not unique. Nevertheless, whether or not \(\{Q_k\}\) converges is insignificant, because it is the product of \(Q^*\) and \(J^*\), namely \((X=Q^*J^*,E=E^*)\), that recovers a solution to the original RPCA problem.
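As the note above points out, only the product \(Q^*J^*\) matters at termination. Below is a minimal sketch of the recovery step together with the stopping check \(\Vert D-Q^*J^*-E^*\Vert _{\infty }\le \varepsilon \) from Lemma 1; the function and variable names are hypothetical.

```python
import numpy as np

def recover_solution(D, Q_star, J_star, E_star, eps=1e-7):
    """Assemble (X, E) from the factored iterates and test the stopping
    criterion ||D - Q J - E||_inf <= eps used in Lemma 1."""
    X_star = Q_star @ J_star   # the low-rank component; Q_star alone need not be unique
    residual = np.max(np.abs(D - X_star - E_star))
    return X_star, E_star, residual <= eps
```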

1.2 Proof of Lemma 2

We first prove the following lemma.

Lemma 5

Let \(X\), \(Y\) and \(Q\) be matrices of compatible dimensions. If \(Q\) obeys \(Q^TQ=\mathtt {I}\) and \(Y\in \partial \Vert X\Vert _*\), then

$$QY\in \partial \Vert QX\Vert _*.$$

Proof

Let the skinny SVD of \(X\) be \(U\varSigma {}V^T\). By \(Y\in \partial \Vert X\Vert _*\), we have

$$\begin{aligned} Y=UV^T+W,\text { with } U^TW=0, WV=0\text { and } \Vert W\Vert \le 1. \end{aligned}$$

Since \(Q\) is column-orthonormal, we have

$$\begin{aligned} \partial \Vert QX\Vert _*=\{QUV^T+W_1| U^TQ^TW_1=0, W_1V=0\text { and } \Vert W_1\Vert \le 1\}. \end{aligned}$$

With the above notations, it can be verified that \(QY\in \partial \Vert QX\Vert _*\).   \(\square \)

Proof

(of Lemma 2) Let the skinny SVD of \(D-E_k+Y_k/\mu _k\) be \(D-E_k+Y_k/\mu _k=U_k\varSigma _k{}V_k^T\), then it can be calculated that

$$\begin{aligned} Q_{k+1}=\fancyscript{P}[(D-E_k+\frac{Y_k}{\mu _k})J_k^T]=\fancyscript{P}[U_k\varSigma _k{}V_k^TJ_k^T]. \end{aligned}$$

Let the full SVD of \(\varSigma _kV_k^TJ_k^T\) be \(\varSigma _kV_k^TJ_k^T=U\varSigma {}V^T\) (note that \(U\) and \(V\) are orthogonal matrices), then it can be calculated that

$$\begin{aligned} Q_{k+1}=\fancyscript{P}[U_k\varSigma _k{}V_k^TJ_k^T]=\fancyscript{P}[U_kU\varSigma {}V^T]=U_kUV^T, \end{aligned}$$

which simply leads to

$$\begin{aligned} Q_{k+1}Q_{k+1}^T=U_kUV^TVU^TU_k^T=U_kU_k^T. \end{aligned}$$

Hence,

$$\begin{aligned} \hat{Y}_{k+1}-Q_{k+1}Q_{k+1}^T\hat{Y}_{k+1}&= \mu _k((D-E_k+\frac{Y_k}{\mu _k})-Q_{k+1}Q_{k+1}^T(D-E_k+\frac{Y_k}{\mu _k}))\\&= \mu _k(U_k\varSigma _kV_k^T-Q_{k+1}Q_{k+1}^TU_k\varSigma _kV_k^T)\\&= \mu _k(U_k\varSigma _kV_k^T-U_kU_k^TU_k\varSigma _kV_k^T)\\&= \mu _k(U_k\varSigma _kV_k^T-U_k\varSigma _kV_k^T)=0, \end{aligned}$$

i.e.,

$$\begin{aligned} \hat{Y}_{k+1}=Q_{k+1}Q_{k+1}^T\hat{Y}_{k+1}. \end{aligned}$$

According to (12) and Lemma 5, we have

$$Q_{k+1}Q_{k+1}^T\hat{Y}_{k+1}\in \partial \Vert Q_{k+1}J_{k+1}\Vert _*.$$

Hence,

$$\begin{aligned} \hat{Y}_{k+1}\in \partial \Vert Q_{k+1}J_{k+1}\Vert _* \,\text {and}\,Y_{k+1}\in {}\lambda \partial \left\| E_{k+1} \right\| _1,\forall {}k. \end{aligned}$$

Here, the conclusion \(Y_{k+1}\in {}\lambda \partial \left\| E_{k+1} \right\| _1\) is quoted from (11). Since the above conclusion holds for any \(k\), it naturally holds at \((Q^*,J^*,E^*)\):

$$\begin{aligned} \hat{Y}^*=\hat{Y}_{k^*+1}\in \partial \Vert Q^*J^*\Vert _* \,\text {and}\,Y^*=Y_{k^*+1}\in {}\lambda \partial \left\| E^* \right\| _1. \end{aligned}$$
(13)

Given any feasible solution \((Q,J,E)\) to problem (5), by the convexity of matrix norms and (13), it can be calculated that

$$\begin{aligned} \left\| J \right\| _*+\lambda \left\| E \right\| _1&=\left\| QJ \right\| _*+\lambda \left\| E \right\| _1\\&\ge {}\Vert Q^*J^*\Vert _*+\langle \hat{Y}^*,QJ-Q^*J^*\rangle + \lambda \Vert E^*\Vert _1+\langle {}Y^*,E-E^*\rangle \\&=\Vert J^*\Vert _*+\lambda \Vert E^*\Vert _1+\langle {}\hat{Y}^*,QJ+E-Q^*J^*-E^*\rangle \\&\quad +\langle {}Y^*-\hat{Y}^*,E-E^*\rangle . \end{aligned}$$

By Lemma 1, we have that \(\Vert QJ+E-Q^*J^*-E^*\Vert _{\infty }\le \Vert D-Q^*J^*-E^*\Vert _{\infty }\le \varepsilon \), which leads to

$$\begin{aligned} |\langle {}\hat{Y}^{*},QJ+E-Q^*J^*-E^*\rangle |&\le \Vert \hat{Y}^*\Vert _{\infty }\Vert QJ+E-Q^*J^*-E^*\Vert _1\\&\le \Vert \hat{Y}^*\Vert _{\infty }\Vert D-Q^*J^*-E^*\Vert _1\\&\le mn\Vert D-Q^*J^*-E^*\Vert _{\infty }\le mn\varepsilon , \end{aligned}$$

where \(\Vert \hat{Y}^*\Vert _{\infty }\le \Vert \hat{Y}^*\Vert \le 1\) is due to (13). Hence,

$$\left\| J \right\| _*+\lambda \left\| E \right\| _1\ge {}\Vert J^*\Vert _*+\lambda \Vert E^*\Vert _1+\langle {}Y^*-\hat{Y}^*,E-E^*\rangle -mn\varepsilon .$$

   \(\square \)
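The operator \(\fancyscript{P}[\cdot ]\) used in the proof above extracts the column-orthonormal factor \(UV^T\) from an SVD. Together with the multiplier and penalty updates appearing in this appendix, that is enough to sketch what one iteration of an ALM/ADM scheme for problem (5) could look like. The NumPy sketch below is a plausible reconstruction under that reading, not a transcription of Algorithm 1; in particular, the closed-form \(J\)- and \(E\)-updates (singular value thresholding and entrywise soft thresholding) are the standard proximal steps and are assumptions on our part.

```python
import numpy as np

def orth_project(M):
    """P[M]: the column-orthonormal factor U V^T obtained from the thin SVD
    M = U S V^T (the form used in the proof of Lemma 2)."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def svt(M, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft thresholding: the proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def adm_step(D, Q, J, E, Y, mu, lam, rho=1.1, mu_max=1e6):
    """One hypothetical ALM/ADM iteration for
         min ||J||_* + lam*||E||_1   s.t.  D = Q J + E,  Q^T Q = I.
    The Q-, Y-, and mu-updates follow the appendix; the J- and E-updates are
    the standard closed forms and are assumptions, not Algorithm 1 verbatim."""
    B = D - E + Y / mu                       # shared intermediate D - E_k + Y_k/mu_k
    Q = orth_project(B @ J.T)                # Q_{k+1} = P[(D - E_k + Y_k/mu_k) J_k^T]
    J = svt(Q.T @ B, 1.0 / mu)               # assumed nuclear-norm proximal step (small matrix)
    E = soft(D - Q @ J + Y / mu, lam / mu)   # assumed l1 proximal step
    Y = Y + mu * (D - Q @ J - E)             # multiplier update
    mu = min(mu_max, rho * mu)               # capped penalty update (cf. Note 3)
    return Q, J, E, Y, mu
```

Under these assumptions, a caller would initialize \(Q\) with \(r\) orthonormal columns, \(J=Q^TD\), \(E=0\), \(Y=0\), and repeat adm_step until \(\Vert D-QJ-E\Vert _{\infty }\) drops below the tolerance \(\varepsilon \).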

1.3 Proof of Theorem 1

Proof

Notice that \((Q^*,J=0,E=D)\) is feasible to (5). Let \((Q^g,J^g,E^g)\) be a globally optimal solution to (5), then we have

$$\lambda \Vert E^g\Vert _1\le \Vert J^g\Vert _*+\lambda \Vert E^g\Vert _1\le \lambda \Vert D\Vert _1.$$

By the proof procedure of Lemma 4, we have that \(E^*\) is bounded by

$$\begin{aligned} \lambda \Vert E^*\Vert _1&\le \Vert J^*\Vert _*+\lambda \Vert E^*\Vert _1\\&\le \fancyscript{L}(Q_{k^*+1},J_{k^*+1},E_{k^*+1},Y_{k^*},\mu _{k^*})+\frac{\Vert Y_{k^*}\Vert _F^2}{2\mu _{k^*}}\\&\le \frac{mn\lambda ^2}{\mu _0}(\frac{\rho (1+\rho )}{\rho -1}+\frac{1}{2\rho ^{k^*}})\\&= mn\Vert D\Vert \lambda ^2(\frac{\rho (1+\rho )}{\rho -1}+\frac{1}{2\rho ^{k^*}}). \end{aligned}$$

Hence,

$$\begin{aligned} \Vert E^g-E^*\Vert _1\le \Vert E^g\Vert _1+\Vert E^*\Vert _1\le {}c_1. \end{aligned}$$
(14)

Note that \(|\langle {}M,N\rangle |\le \Vert M\Vert _{\infty }\Vert N\Vert _1\) holds for any matrices \(M\) and \(N\). By Lemma 2 and (14), we have

$$\begin{aligned} f^g=\left\| J^g \right\| _*+\lambda \left\| E^g \right\| _1&\ge \Vert J^*\Vert _*+\lambda \Vert E^*\Vert _1+\langle {}Y^*-\hat{Y}^*,E^g-E^*\rangle -mn\varepsilon \\&\ge f^*-\Vert Y^*-\hat{Y}^*\Vert _{\infty }\Vert E^g-E^*\Vert _1-mn\varepsilon \\&= f^*-\varepsilon _1\Vert E^g-E^*\Vert _1-mn\varepsilon \\&\ge f^*-c_1\varepsilon _1-mn\varepsilon , \end{aligned}$$

which simply leads to the inequality stated in Theorem 1.   \(\square \)

1.4 Proof of Theorem 2

Proof

Let \(X=Q^*J^*\) and \(E=E^*\), then \((X,E)\) is a feasible solution to the original RPCA problem. By the convexity of the RPCA problem and the optimality of \((X^o,E^o)\), it naturally follows that

$$f^o\le {}f^*.$$

Let \(X^o=U^o\varSigma ^o{}(V^o)^T\) be the skinny SVD of \(X^o\). Construct \(Q^{\prime }=U^o\), \(J^{\prime }=\varSigma ^o{}(V^o)^T\) and \(E^{\prime }=E^o\). When \(r\ge {}r_0\), we have

$$D=X^o+E^o=U^o\varSigma ^o{}(V^o)^T+E^o=Q^{\prime }J^{\prime }+E^{\prime },$$

i.e., \((Q^{\prime },J^{\prime },E^{\prime })\) is a feasible solution to problem (5). By Theorem 1, it can be concluded that

$$ f^*-c_1\varepsilon _1-mn\varepsilon \le {}\Vert J^{\prime }\Vert _*+\lambda \Vert E^{\prime }\Vert _1=\Vert \varSigma ^o\Vert _*+\lambda \Vert E^o\Vert _1=f^o. $$

For \(r<r_0\), we decompose the skinny SVD of \(X^o\) as

$$ X^o=U_0\varSigma {}_0V_0^T+U_1\varSigma {}_1V_1^T, $$

where \(U_0,V_0\) (resp. \(U_1, V_1\)) are the singular vectors associated with the \(r\) largest singular values (resp. the remaining singular values, which are smaller than or equal to \(\sigma _{r}\)). With these notations, we have a feasible solution to problem (5) by constructing

$$\begin{aligned} Q^{\prime \prime }=U_0, J^{\prime \prime }=\varSigma {}_0V_0^T \text { and }E^{\prime \prime }=D-U_0\varSigma {}_0V_0^T=E^o+U_1\varSigma {}_1V_1^T. \end{aligned}$$

By Theorem 1, it can be calculated that

$$\begin{aligned} f^*-c_1\varepsilon _1-mn\varepsilon&\le f^g\le \Vert J^{\prime \prime }\Vert _*+\lambda \Vert E^{\prime \prime }\Vert _1\\&=\Vert \varSigma {}_0V_0^T\Vert _*+\lambda {}\Vert E^o+U_1\varSigma {}_1V_1^T\Vert _1\\&=\Vert \varSigma {}_0\Vert _*+\lambda {}\Vert E^o+U_1\varSigma {}_1V_1^T\Vert _1\\&\le \Vert X^o\Vert _*-\Vert \varSigma _1\Vert _*+\lambda {}\Vert E^o+U_1\varSigma {}_1V_1^T\Vert _1\\&\le \Vert X^o\Vert _*-\Vert \varSigma _1\Vert _*+\lambda {}\Vert E^o\Vert _1+\lambda \Vert U_1\varSigma {}_1V_1^T\Vert _1\\&=f^o-\Vert \varSigma _1\Vert _*+\lambda \Vert U_1\varSigma {}_1V_1^T\Vert _1\\&\le f^o-\Vert \varSigma _1\Vert _*+\lambda \sqrt{mn}\Vert U_1\varSigma {}_1V_1^T\Vert _F\\&\le f^o-\Vert \varSigma _1\Vert _*+\lambda \sqrt{mn}\Vert U_1\varSigma {}_1V_1^T\Vert _*\\&= f^o+(\lambda \sqrt{mn}-1)\Vert \varSigma {}_1\Vert _*\\&\le f^o+(\lambda \sqrt{mn}-1)\sigma _{r+1}(r_0-r). \end{aligned}$$

   \(\square \)

Copyright information

© 2014 Springer International Publishing Switzerland

Cite this chapter

Liu, G., Yan, S. (2014). Scalable Low-Rank Representation. In: Fu, Y. (eds) Low-Rank and Sparse Modeling for Visual Analysis. Springer, Cham. https://doi.org/10.1007/978-3-319-12000-3_3

  • DOI: https://doi.org/10.1007/978-3-319-12000-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11999-1

  • Online ISBN: 978-3-319-12000-3