CUR LRA at Sublinear Cost Based on Volume Maximization

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11989)

Abstract

A matrix algorithm runs at sublinear cost if it uses far fewer memory cells and arithmetic operations than the input matrix has entries. Such algorithms are indispensable for Big Data Mining and Analysis, where input matrices are so immense that one can only access a small fraction of all their entries. Typically, however, such matrices admit a Low Rank Approximation (LRA), which one can access and process at sublinear cost. But can we compute an LRA at sublinear cost? An adversary argument shows that no algorithm running at sublinear cost can output an accurate LRA of worst case input matrices, or even of the matrices of the small families of our Appendix A, but we prove that some sublinear cost algorithms output a reasonably close LRA of a matrix W if (i) this matrix is sufficiently close to a low rank matrix or (ii) it is a Symmetric Positive Semidefinite (SPSD) matrix that admits LRA. In both cases the supporting algorithms are deterministic and output LRA in its particularly memory efficient special form of CUR LRA. The design of our algorithms and the proof of their correctness rely on the results of the extensive previous study of CUR LRA in Numerical Linear Algebra based on volume maximization. In case (i) we apply Cross-Approximation (C-A) iterations, which run at sublinear cost and have been computing accurate LRA worldwide for more than a decade. We provide the first formal support for this long-known empirical efficiency, assuming non-degeneracy of the initial submatrix of at least one C-A iteration. We cannot ensure non-degeneracy at sublinear cost for a worst case input, but we prove that it holds with high probability (whp) for any initialization in the case of a random or randomized input. Empirically we can replace randomization with sparse multiplicative preprocessing of an input matrix, performed at sublinear cost. In case (ii) we make no additional assumptions about the input class of SPSD matrices admitting LRA or about the initialization of our sublinear cost algorithms for CUR LRA, which promise to be practically valuable. We hope that a proper combination of our deterministic techniques with randomized LRA methods, popular among Computer Science researchers, will lead to further progress in LRA.
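To make the C-A approach concrete, the following minimal numpy sketch (ours, not the paper's precise algorithm) runs a few cross-approximation iterations and assembles a CUR LRA; the function names are ours, and the greedy routine greedy_pivot_rows is a simplified stand-in for the maximal-volume submatrix selection discussed in the paper.

```python
import numpy as np

def greedy_pivot_rows(C, r):
    """Greedily pick r row indices of C whose r x r block has large volume
    (a cheap surrogate for rigorous maximal-volume selection)."""
    A = np.array(C, dtype=float)
    picked = []
    for _ in range(r):
        norms = np.linalg.norm(A, axis=1)
        if picked:
            norms[picked] = -1.0                    # never re-pick a row
        i = int(np.argmax(norms))
        picked.append(i)
        v = A[i] / (np.linalg.norm(A[i]) + 1e-300)  # chosen direction
        A = A - np.outer(A @ v, v)                  # project it out of all rows
    return np.array(picked)

def cross_approximation_cur(W, r, iters=4, seed=0):
    """A few C-A iterations: each one reads only r rows and r columns of W,
    so the total number of accessed entries is O((m + n) * r * iters),
    sublinear in m * n."""
    m, n = W.shape
    J = np.random.default_rng(seed).choice(n, size=r, replace=False)
    for _ in range(iters):
        I = greedy_pivot_rows(W[:, J], r)           # read r columns, pick rows
        J = greedy_pivot_rows(W[I, :].T, r)         # read r rows, pick columns
    C, R = W[:, J], W[I, :]
    G = np.linalg.pinv(W[np.ix_(I, J)])             # generator of the CUR LRA
    return C, G, R                                  # W is approximated by C @ G @ R

# Toy test on a matrix that is close to a rank-5 matrix (case (i) above).
rng = np.random.default_rng(1)
W = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 200))
W += 1e-8 * rng.standard_normal(W.shape)
C, G, R = cross_approximation_cur(W, r=5)
print(np.linalg.norm(W - C @ G @ R) / np.linalg.norm(W))   # small relative error
```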


Notes

  1. The papers [PLSZ16] (unsuccessfully submitted to ACM STOC 2017 and widely circulated at that time) and [PLSZ17] provided the first formal support for LRA at sublinear cost, which they called "superfast" LRA. Their approach extended to LRA the earlier study in [PQY15, PZ17a], and [PZ17b] of randomized Gaussian elimination with no pivoting and of other fundamental matrix computations. It was followed by the sublinear cost randomized LRA algorithms of [MW17].

  2. The theorem first appeared in [GT01, Corollary 2.3] in the special case where \(k=l=r\) and \(m=n\).

  3. For \(r=1\) an input matrix turns into a vector of dimension m or n, and then we compute its coordinate of maximal absolute value by applying just \(m-1\) or \(n-1\) comparisons, respectively (cf. [O17]).

References

  1. Bebendorf, M.: Approximation of boundary element matrices. Numer. Math. 86(4), 565–589 (2000)
  2. Chandrasekaran, S., Ipsen, I.: On rank revealing QR factorizations. SIAM J. Matrix Anal. Appl. 15, 592–622 (1994)
  3. Cortinovis, A., Kressner, D., Massei, S.: MATHICSE technical report: on maximum volume submatrices and cross approximation for symmetric semidefinite and diagonally dominant matrices. MATHICSE, 12 February 2019
  4. Çivril, A., Magdon-Ismail, M.: On selecting a maximum volume sub-matrix of a matrix and related problems. Theor. Comput. Sci. 410(47–49), 4801–4811 (2009)
  5. Drineas, P., Mahoney, M.W., Muthukrishnan, S.: Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl. 30(2), 844–881 (2008)
  6. Gu, M., Eisenstat, S.C.: An efficient algorithm for computing a strong rank revealing QR factorization. SIAM J. Sci. Comput. 17, 848–869 (1996)
  7. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)
  8. Goreinov, S.A., Tyrtyshnikov, E.E.: The maximal-volume concept in approximation by low rank matrices. Contemp. Math. 208, 47–51 (2001)
  9. Goreinov, S.A., Tyrtyshnikov, E.E., Zamarashkin, N.L.: A theory of pseudo-skeleton approximations. Linear Algebra Appl. 261, 1–21 (1997)
  10. Luan, Q., Pan, V.Y.: Low rank approximation of a matrix at sublinear cost, 21 July 2019. arXiv:1907.10481
  11. Mahoney, M.W., Drineas, P.: CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA 106, 697–702 (2009)
  12. Musco, C., Woodruff, D.P.: Sublinear time low-rank approximation of positive semidefinite matrices. In: IEEE 58th FOCS, pp. 672–683 (2017)
  13. Osinsky, A.I.: Probabilistic estimation of the rank 1 cross approximation accuracy, submitted on 30 June 2017. arXiv:1706.10285
  14. Osinsky, A.I., Zamarashkin, N.L.: Pseudo-skeleton approximations with better accuracy estimates. Linear Algebra Appl. 537, 221–249 (2018)
  15. Pan, V.Y., Luan, Q.: Refinement of low rank approximation of a matrix at sub-linear cost, submitted on 10 June 2019. arXiv:1906.04223
  16. Pan, V.Y., Luan, Q., Svadlenka, J., Zhao, L.: Primitive and cynical low rank approximation, preprocessing and extensions, submitted on 3 November 2016. arXiv:1611.01391v1
  17. Pan, V.Y., Luan, Q., Svadlenka, J., Zhao, L.: Superfast accurate approximation of low rank matrices, submitted on 22 October 2017. arXiv:1710.07946v1
  18. Pan, V.Y., Luan, Q., Svadlenka, J., Zhao, L.: CUR low rank approximation at sub-linear cost, submitted on 10 June 2019. arXiv:1906.04112
  19. Pan, V.Y., Qian, G., Yan, X.: Random multipliers numerically stabilize Gaussian and block Gaussian elimination: proofs and an extension to low-rank approximation. Linear Algebra Appl. 481, 202–234 (2015)
  20. Pan, V.Y., Zhao, L.: New studies of randomized augmentation and additive preprocessing. Linear Algebra Appl. 527, 256–305 (2017)
  21. Pan, V.Y., Zhao, L.: Numerically safe Gaussian elimination with no pivoting. Linear Algebra Appl. 527, 349–383 (2017)

Acknowledgements

Our research has been supported by NSF Grants CCF-1116736, CCF-1563942, and CCF-133834 and by PSC CUNY Award 69813 00 48. We also thank A. Cortinovis, A. Osinsky, and N. L. Zamarashkin for pointers to their papers [CKM19] and [OZ18], S. A. Goreinov for reprints of his papers, and E. E. Tyrtyshnikov for pointers to the bibliography and for the challenge of formally supporting the empirical power of C-A algorithms.

Author information

Correspondence to Victor Y. Pan.

Appendices

A Small Families of Hard Inputs for Sublinear Cost LRA

Any sublinear cost LRA algorithm fails on the following small input families.

Example 1

Define the following family of \(m\times n\) matrices of rank 1 (we call them \(\delta \)-matrices):

$$\{\varDelta _{i,j},~{i=1, \dots ,m;~j=1,\dots ,n}\},$$

where \(\varDelta _{i,j}\) is filled with zeros except for its (i, j)th entry, which equals 1. Also include the \(m\times n\) null matrix \(O_{m,n}\) in this family. Now fix any sublinear cost algorithm; it does not access the (i, j)th entry of its input matrices for some pair of i and j. Therefore it outputs the same approximation for the matrices \(\varDelta _{i,j}\) and \(O_{m,n}\), and so on one of them its error is at least 1/2 and remains undetected. Apply the same argument to the set of \(mn+1\) small-norm perturbations of the matrices of the above family and to the \(mn+1\) sums of the latter matrices with any fixed \(m\times n\) matrix of low rank. Finally, the same argument shows that a posteriori estimation of the output errors of an LRA algorithm applied to the same input families cannot run at sublinear cost.

This example actually covers randomized LRA algorithms as well. Indeed, suppose that with a positive constant probability an LRA algorithm does not access K entries of an input matrix. Apply this algorithm to two matrices of low rank whose difference at each of these K entries equals a large constant C. Then, clearly, with a positive constant probability the algorithm has errors of at least C/2 at no fewer than K/2 of these entries.
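The small numpy illustration below (ours) spells out the deterministic version of this adversary argument: any procedure reading a fixed proper subset of the entries returns identical outputs for the null matrix and for the \(\delta \)-matrix supported on an unread position. The stub sublinear_lra is a hypothetical stand-in for an arbitrary deterministic sublinear cost algorithm.

```python
import numpy as np

def delta_matrix(m, n, i, j):
    """The delta-matrix Delta_{i,j}: all zeros except a 1 in position (i, j)."""
    D = np.zeros((m, n))
    D[i, j] = 1.0
    return D

def sublinear_lra(W, probes):
    """Stand-in for an arbitrary deterministic algorithm that reads only the
    entries listed in `probes` (fewer than m * n of them).  Whatever rule it
    applies, its output is a function of the probed values alone."""
    vals = tuple(W[i, j] for (i, j) in probes)
    # any deterministic rule mapping `vals` to an m x n rank-<=1 matrix;
    # here: the zero matrix (the probed values are all zero in both runs below)
    return np.zeros(W.shape)

m, n = 8, 6
probes = [(i, j) for i in range(m) for j in range(n)][:-1]    # misses entry (m-1, n-1)
W0 = np.zeros((m, n))
W1 = delta_matrix(m, n, m - 1, n - 1)                         # differs only at the unread entry
out0, out1 = sublinear_lra(W0, probes), sublinear_lra(W1, probes)
print(np.array_equal(out0, out1))                             # True: the two runs coincide
print(np.max(np.abs(W1 - out1)))                              # 1.0 >= 1/2: undetected error
```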

B Definitions for Matrix Computations and a Lemma

Next we recall some basic definitions for matrix computations (cf. [GL13]).

\(\mathbb C^{m\times n}\) is the class of \(m\times n\) matrices with complex entries.

\(I_s\) denotes the \(s\times s\) identity matrix. \(O_{q,s}\) denotes the \(q\times s\) matrix filled with zeros.

\(\mathrm {diag}(B_1,\dots ,B_k)=\mathrm {diag}(B_j)_{j=1}^k\) denotes a \(k\times k\) block diagonal matrix with diagonal blocks \(B_1,\dots ,B_k\).

\((B_1~|~\dots ~|~B_k)\) and \((B_1,\dots ,B_k)\) denote a \(1\times k\) block matrix with blocks \(B_1,\dots ,B_k\).

\(W^T\) and \(W^*\) denote the transpose and the Hermitian transpose of an \(m\times n\) matrix \(W=(w_{ij})_{i,j=1}^{m,n}\), respectively. \(W^*=W^T\) if the matrix W is real.

For two sets \(\mathcal I\subseteq \{1,\dots ,m\}\) and \(\mathcal J\subseteq \{1,\dots ,n\}\) define the submatrices

$$\begin{aligned} W_{\mathcal I,:}:=(w_{i,j})_{i\in \mathcal I; j=1,\dots , n}, W_{:,\mathcal J}:=(w_{i,j})_{i=1,\dots , m;j\in \mathcal J},~ W_{\mathcal I,\mathcal J}:=(w_{i,j})_{i\in \mathcal I;j\in \mathcal J}. \end{aligned}$$
(B.1)

An \(m\times n\) matrix W is unitary (also orthogonal when real) if \(W^*W=I_n\) or \(WW^*=I_m\).

Compact SVD of a matrix W, hereafter just SVD, is defined by the equations

$$\begin{aligned} \begin{array}{c} W=S_W\varSigma _WT_W^*, \\ \mathrm{where}~S_W^*S_W= T_W^*T_W=I_{\rho },~\varSigma _W:=\mathrm {diag}(\sigma _j(W))_{j=1}^{\rho },~\rho ={\mathrm {rank}(W)}, \end{array} \end{aligned}$$
(B.2)

\(\sigma _j(W)\) denotes the jth largest singular value of W for \(j=1,\dots ,\rho \); \(\sigma _j(W)=0~\mathrm{for}~j>\rho \).

\(||W||=||W||_2\), \(||W||_F\), and \(||W||_C\) denote spectral, Frobenius, and Chebyshev norms of a matrix W, respectively, such that (see [GL13, Section 2.3.2 and Corollary 2.3.2])

$$||W||=\sigma _1(W),~||W||_F^2:=\sum _{i,j=1}^{m,n}|w_{ij}|^2=\sum _{j=1}^{\mathrm {rank}(W)}\sigma _j^2(W),~ ||W||_C:=\max _{i,j=1}^{m,n}|w_{ij}|,$$
$$\begin{aligned} ||W||_C\le ||W||\le ||W||_F\le \sqrt{mn}~||W||_C,~ ||W||_F^2\le \min \{m,n\}~||W||^2. \end{aligned}$$
(B.3)

\(W^+:=T_W\varSigma _W^{-1}S_W^*\) is the Moore–Penrose pseudo inverse of an \(m\times n\) matrix W.

$$\begin{aligned} ||W^+||\sigma _{r}(W)=1 \end{aligned}$$
(B.4)

for a full rank \(m\times n\) matrix W, where \(r=\min \{m,n\}\).

A matrix W has \(\epsilon \)-rank at most \(r>0\) for a fixed tolerance \(\epsilon >0\) if there is a matrix \(W'\) of rank at most r such that \(||W'-W||/||W||\le \epsilon \). We write \(\mathrm {nrank}(W)=r\) and say that a matrix W has numerical rank r if it has \(\epsilon \)-rank r for a small \(\epsilon \).
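A short numerical illustration (ours) of the last definition: by the Eckart–Young theorem the best rank-r spectral norm approximation error of W equals \(\sigma _{r+1}(W)\), so the \(\epsilon \)-rank can be read off the singular values. The helper numerical_rank is ours.

```python
import numpy as np

def numerical_rank(W, eps=1e-8):
    """Smallest r such that some rank-r matrix W' satisfies
    ||W' - W|| <= eps * ||W|| in the spectral norm: by Eckart-Young this is
    the number of singular values of W exceeding eps * sigma_1(W)."""
    s = np.linalg.svd(W, compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.count_nonzero(s > eps * s[0]))

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 80))   # rank 4
W += 1e-12 * rng.standard_normal(W.shape)                          # tiny perturbation
print(numerical_rank(W))                                           # 4
```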

Lemma 3

Let \(G\in \mathbb C^{k\times r}\), \(\varSigma \in \mathbb C^{r\times r}\) and \(H\in \mathbb C^{r\times l}\) and let the matrices G, H and \(\varSigma \) have full rank \(r\le \min \{k,l\}\). Then \(||(G\varSigma H)^+|| \le ||G^+||~||\varSigma ^+||~||H^+||\).

A proof of this well-known result is included in [LPa].
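A quick numerical sanity check of Lemma 3 (ours), with random full rank factors; np.linalg.norm(., 2) returns the spectral norm.

```python
import numpy as np

rng = np.random.default_rng(0)
k, l, r = 7, 5, 3
G = rng.standard_normal((k, r))                  # full column rank r (with probability 1)
H = rng.standard_normal((r, l))                  # full row rank r (with probability 1)
Sigma = np.diag(rng.uniform(0.5, 2.0, size=r))   # nonsingular r x r matrix

lhs = np.linalg.norm(np.linalg.pinv(G @ Sigma @ H), 2)
rhs = (np.linalg.norm(np.linalg.pinv(G), 2) *
       np.linalg.norm(np.linalg.pinv(Sigma), 2) *
       np.linalg.norm(np.linalg.pinv(H), 2))
print(lhs <= rhs + 1e-12)                        # True, as Lemma 3 claims
```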

C The Volume and r-Projective Volume of a Perturbed Matrix

Theorem 13

Suppose that \(W'\) and E are \(k\times l\) matrices, \(\mathrm {rank}(W')=r\le \min \{k,l\}\), \(W=W'+E\), and \(||E||\le \epsilon \). Then

$$\begin{aligned} \Big (1-\frac{\epsilon }{\sigma _r(W)}\Big )^{r} \le \prod _{j=1}^r\Big (1-\frac{\epsilon }{\sigma _j(W)}\Big ) \le \frac{v_{2,r}(W)}{v_{2,r}(W')}\le \prod _{j=1}^r\Big (1+\frac{\epsilon }{\sigma _j(W)}\Big )\le \Big (1+\frac{\epsilon }{\sigma _r(W)}\Big )^r. \end{aligned}$$
(C.1)

If \(\min \{k,l\}=r\), then \(v_2(W)=v_{2,r}(W)\), \(v_2(W')=v_{2,r}(W')\), and

$$\begin{aligned} \Big (1-\frac{\epsilon }{\sigma _r(W)}\Big )^{r} \le \frac{v_2(W)}{v_2(W')}= \frac{v_{2,r}(W)}{v_{2,r}(W')}\le \Big (1+\frac{\epsilon }{\sigma _r(W)}\Big )^r. \end{aligned}$$
(C.2)

Proof

Bounds (C.1) follow because a perturbation of a matrix within a norm bound \(\epsilon \) changes its singular values by at most \(\epsilon \) (see [GL13, Corollary 8.6.2]). Bounds (C.2) follow because \(v_2(M)=v_{2,r}(M)=\prod _{j=1}^r\sigma _j(M)\) for any \(k\times l\) matrix M with \(\min \{k,l\}=r\), in particular for \(M=W'\) and \(M=W=W'+E\).

If the ratio \(\frac{\epsilon }{\sigma _r(W)}\) is small, then \(\Big (1-\frac{\epsilon }{\sigma _r(W)}\Big )^{r}=1-O\Big (\frac{r\epsilon }{\sigma _r(W)}\Big )\) and \(\Big (1+\frac{\epsilon }{\sigma _r(W)}\Big )^r= 1+O\Big (\frac{r\epsilon }{\sigma _r(W)}\Big )\), which shows that the relative perturbation of the volume is amplified by at most a factor of r in comparison to the relative perturbation of the r largest singular values.
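The bounds of Theorem 13 are easy to observe numerically. In the sketch below (ours) the helper v2r computes the r-projective volume \(v_{2,r}(M)\), that is, the product of the r largest singular values of M, as used in the proof above.

```python
import numpy as np

def v2r(M, r):
    """r-projective volume of M: the product of its r largest singular values."""
    return float(np.prod(np.linalg.svd(M, compute_uv=False)[:r]))

rng = np.random.default_rng(0)
k, l, r = 10, 8, 3
W_prime = rng.standard_normal((k, r)) @ rng.standard_normal((r, l))   # rank r matrix W'
E = rng.standard_normal((k, l))
eps = 1e-3 * np.linalg.norm(W_prime, 2)
E *= eps / np.linalg.norm(E, 2)                                       # spectral norm of E equals eps
W = W_prime + E

s = np.linalg.svd(W, compute_uv=False)[:r]
ratio = v2r(W, r) / v2r(W_prime, r)
# per (C.1) the middle value lies between the two products below
print(np.prod(1.0 - eps / s), ratio, np.prod(1.0 + eps / s))
```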

D The Volume and r-Projective Volume of a Matrix Product

Theorem 14

(Cf. [OZ18]; Examples 2 and 3 below show some limitations on extensions of the theorem.)

Suppose that \(W=GH\) for an \(m\times q\) matrix G and a \(q\times n\) matrix H. Then

  (i) \(v_2(W)=v_2(G)v_2(H)\) if \(q=\min \{m,n\}\); \(v_2(W)=0\le v_2(G)v_2(H)\) if \(q<\min \{m,n\}\);

  (ii) \(v_{2,r}(W)\le v_{2,r}(G)v_{2,r}(H)\) for \(1\le r\le q\);

  (iii) \(v_2(W)\le v_2(G)v_2(H)\) if \(m=n\le q\).

Example 2

If G and H are unitary matrices and if \(GH=O\), then \(v_2(G)=v_2(H)=v_{2,r}(G)=v_{2,r}(H)=1\) and \(v_2(GH)=v_{2,r}(GH)=0\) for all \(r\le q\).

Example 3

If \(G=(1~|~0)\) and \(H=\mathrm {diag}(1,0)\), then \(v_2(G)=v_2(GH)=1\) and \(v_2(H)=0\).
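A brief numerical companion (ours) to Theorem 14 and Example 3; the helper v2 computes the volume \(v_2(M)\), the product of all singular values of M.

```python
import numpy as np

def v2(M):
    """Volume of M: the product of all its singular values."""
    return float(np.prod(np.linalg.svd(M, compute_uv=False)))

# Part (i) of Theorem 14: the volume is multiplicative when q = min(m, n).
rng = np.random.default_rng(0)
m, q, n = 6, 4, 4
G, H = rng.standard_normal((m, q)), rng.standard_normal((q, n))
print(np.isclose(v2(G @ H), v2(G) * v2(H)))        # True

# Example 3: without such restrictions the relation v2(GH) <= v2(G) v2(H) can fail.
G3 = np.array([[1.0, 0.0]])                        # G = (1 | 0)
H3 = np.diag([1.0, 0.0])                           # H = diag(1, 0)
print(v2(G3 @ H3), v2(G3) * v2(H3))                # 1.0 versus 0.0
```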


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Luan, Q., Pan, V.Y. (2020). CUR LRA at Sublinear Cost Based on Volume Maximization. In: Slamanig, D., Tsigaridas, E., Zafeirakopoulos, Z. (eds) Mathematical Aspects of Computer and Information Sciences. MACIS 2019. Lecture Notes in Computer Science, vol 11989. Springer, Cham. https://doi.org/10.1007/978-3-030-43120-4_10

  • DOI: https://doi.org/10.1007/978-3-030-43120-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43119-8

  • Online ISBN: 978-3-030-43120-4
