Abstract

We prove that any real matrix A contains a subset of at most 4k/ε + 2k log(k+1) rows whose span “contains” a matrix of rank at most k with error only (1+ε) times the error of the best rank-k approximation of A. We complement this with an almost matching lower bound by constructing matrices where the span of any k/(2ε) rows does not “contain” a relative (1+ε)-approximation of rank k. Our existence result leads to an algorithm that finds such a rank-k approximation in time

\( O \left( M \left( \frac{k}{\epsilon} + k^{2} \log k \right) + (m+n) \left( \frac{k^{2}}{\epsilon^{2}} + \frac{k^{3} \log k}{\epsilon} + k^{4} \log^{2} k \right) \right), \)

i.e., essentially O(Mk/ε), where M is the number of nonzero entries of the m × n matrix A. The algorithm maintains sparsity, and in the streaming model [12,14,15] it can be implemented using only 2(k+1)(log(k+1)+1) passes over the input matrix and \(O \left( \min \{ m, n \} (\frac{k}{\epsilon} + k^{2} \log k) \right)\) additional space. Previous algorithms for low-rank approximation use only one or two passes but obtain only an additive approximation.
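
To make concrete what it means for the span of a few rows to "contain" a good rank-k approximation, here is a simplified sketch in Python with NumPy. It assumes the usual adaptive-sampling rule: each row is drawn with probability proportional to its squared distance from the span of the rows already chosen. It also computes the best rank-k matrix whose rows lie in that span. The helper names `row_span_basis`, `rank_k_in_span` and `adaptive_sample` are hypothetical, as are the number of rounds and the per-round sample size. The sketch does not reproduce the paper's sample-size bounds or its pass-efficient streaming implementation.

```python
import numpy as np


def row_span_basis(R, tol=1e-10):
    """Orthonormal basis, stored as columns, for the span of the rows of R."""
    n = R.shape[1]
    if R.shape[0] == 0:
        return np.zeros((n, 0))
    U, s, _ = np.linalg.svd(R.T, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))  # numerical rank of R
    return U[:, :r]


def rank_k_in_span(A, R, k):
    """Best rank-k approximation of A (Frobenius norm) with rows in span(rows of R).

    With P = V V^T the projector onto the span, any X whose rows lie in the span
    satisfies ||A - X||_F^2 = ||A P - X||_F^2 + ||A (I - P)||_F^2, so it suffices
    to truncate the SVD of A P = (A V) V^T to rank k.
    """
    V = row_span_basis(R)
    U, s, Wt = np.linalg.svd(A @ V, full_matrices=False)
    k = min(k, s.size)
    return (U[:, :k] * s[:k]) @ Wt[:k] @ V.T


def adaptive_sample(A, rounds, per_round, rng):
    """Row indices chosen by adaptive sampling (hypothetical simplified schedule).

    In each round, row i is drawn with probability proportional to its squared
    distance from the span of the rows chosen in earlier rounds.
    """
    m, _ = A.shape
    chosen = []
    for _ in range(rounds):
        V = row_span_basis(A[np.array(chosen, dtype=int)])
        E = A - (A @ V) @ V.T              # residual of every row
        w = np.einsum("ij,ij->i", E, E)    # squared residual norms
        total = w.sum()
        if total <= 1e-12 * np.sum(A * A):  # A already (numerically) in the span
            break
        picks = rng.choice(m, size=per_round, replace=True, p=w / total)
        chosen.extend(int(i) for i in picks)
    return sorted(set(chosen))


rng = np.random.default_rng(0)
m, n, k = 200, 50, 5
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k signal
A += 0.1 * rng.standard_normal((m, n))                          # plus noise

s = np.linalg.svd(A, compute_uv=False)
opt = np.sqrt(np.sum(s[k:] ** 2))  # ||A - A_k||_F, the best possible rank-k error

rows = adaptive_sample(A, rounds=4, per_round=2 * k, rng=rng)
err = np.linalg.norm(A - rank_k_in_span(A, A[rows], k))
print(f"{len(rows)} rows sampled, error ratio {err / opt:.4f}")
```

The ratio `err / opt` can never fall below 1, because `opt` is the optimal rank-k error. Sampling more rows lets the span capture more of A, which typically pushes the ratio toward 1, in the spirit of the (1+ε) guarantee stated above.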

References

  1. Arora, S., Hazan, E., Kale, S.: A Fast Random Sampling Algorithm for Sparsifying Matrices. In: Díaz, J., Jansen, K., Rolim, J.D.P., Zwick, U. (eds.) APPROX 2006 and RANDOM 2006. LNCS, vol. 4110, pp. 272–279. Springer, Heidelberg (2006)

  2. Achlioptas, D., McSherry, F.: Fast Computation of Low Rank Approximations. In: Proceedings of the 33rd Annual Symposium on Theory of Computing (2001)

  3. Aggarwal, C., Procopiuc, C., Wolf, J., Yu, P., Park, J.: Fast Algorithms for Projected Clustering. In: Proceedings of SIGMOD (1999)

  4. Bar-Yossef, Z.: Sampling Lower Bounds via Information Theory. In: Proceedings of the 35th Annual Symposium on Theory of Computing (2003)

  5. de la Vega, W.F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: Proceedings of the 35th Annual ACM Symposium on Theory of Computing (2003)

  6. Drineas, P.: Personal communication (2006)

  7. Drineas, P., Frieze, A., Kannan, R., Vempala, S., Vinay, V.: Clustering in large graphs and matrices. In: Proceedings of the 10th SODA (1999)

  8. Drineas, P., Kannan, R.: Pass Efficient Algorithm for approximating large matrices. In: Proceedings of 14th SODA (2003)

  9. Drineas, P., Kannan, R., Mahoney, M.: Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix. Yale University Technical Report, YALEU/DCS/TR-1270 (2004)

  10. Drineas, P., Mahoney, M., Muthukrishnan, S.: Polynomial time algorithm for column-row based relative error low-rank matrix approximation. DIMACS Technical Report 2006-04 (2006)

  11. Deshpande, A., Rademacher, L., Vempala, S., Wang, G.: Matrix Approximation and Projective Clustering via Volume Sampling. In: Proceedings of the 17th ACM-SIAM Symposium on Discrete Algorithms (SODA) (2006)

  12. Feigenbaum, J., Kannan, S., McGregor, A., Suri, S., Zhang, J.: On Graph Problems in a Semi-Streaming Model. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142. Springer, Heidelberg (2004)

  13. Frieze, A., Kannan, R., Vempala, S.: Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM 51(6), 1025–1041 (2004)

  14. Guha, S., Koudas, N., Shim, K.: Data-streams and histograms. In: Proceedings of 33rd ACM Symposium on Theory of Computing (2001)

  15. Henzinger, M., Raghavan, P., Rajagopalan, S.: Computing on Data Streams. Technical Note 1998-011, Digital Systems Research Center, Palo Alto, CA (May 1998)

  16. Matoušek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry, 61–84 (2000)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deshpande, A., Vempala, S. (2006). Adaptive Sampling and Fast Low-Rank Matrix Approximation. In: Díaz, J., Jansen, K., Rolim, J.D.P., Zwick, U. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. APPROX/RANDOM 2006. Lecture Notes in Computer Science, vol 4110. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11830924_28

  • DOI: https://doi.org/10.1007/11830924_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-38044-3

  • Online ISBN: 978-3-540-38045-0

  • eBook Packages: Computer Science, Computer Science (R0)