Skip to main content

Improved Practical Matrix Sketching with Guarantees

  • Conference paper
Algorithms - ESA 2014 (ESA 2014)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8737))

Included in the following conference series:

Abstract

Matrices have become essential data representations for many large-scale problems in data analytics, and hence matrix sketching is a critical task. Although much research has focused on improving the error/size tradeoff under various sketching paradigms, we find a simple heuristic iSVD, with no guarantees, tends to outperform all known approaches. In this paper we adapt the best performing guaranteed algorithm, FrequentDirections, in a way that preserves the guarantees, and nearly matches iSVD in practice. We also demonstrate an adversarial dataset for which iSVD performs quite poorly, but our new technique has almost no error. Finally, we provide easy replication of our studies on APT, a new testbed which makes available not only code and datasets, but also a computing platform with fixed environmental settings.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Concept drift in machine learning and knowledge discovery group, http://mlkd.csd.auth.gr/concept_drift.html

  2. vision.caltech, http://www.vision.caltech.edu/visipedia/CUB-200-2011.html

  3. Achlioptas, D., McSherry, F.: Fast computation of low rank matrix approximations. In: STOC (2001)

    Google Scholar 

  4. Agarwal, P.K., Cormode, G., Huang, Z., Phillips, J.M., Wei, Z., Yi, K.: Mergeable summaries. In: Proceedings of the 31st Symposium on Principles of Database Systems (2012)

    Google Scholar 

  5. Arasu, A., Babu, S., Widom, J.: An abstract semantics and concrete language for continuous queries over streams and relations (2002)

    Google Scholar 

  6. Bonnet, P., Gehrke, J., Seshadri, P.: Towards sensor database systems. In: Tan, K.-L., Franklin, M.J., Lui, J.C.-S. (eds.) MDM 2001. LNCS, vol. 1987, pp. 3–14. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  7. Boutsidis, C., Drineas, P., Magdon-Ismail, M.: Near optimal column-based matrix reconstruction. In: FOCS (2011)

    Google Scholar 

  8. Brand, M.: Incremental singular value decomposition of uncertain data with missing values. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part I. LNCS, vol. 2350, pp. 707–720. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  9. Chen, J., DeWitt, D.J., Tian, F., Wang, Y.: Niagaracq: A scalable continuous query system for internet databases. ACM SIGMOD Record 29(2), 379–390 (2000)

    Article  Google Scholar 

  10. Clarkson, K.L., Woodruff, D.P.: Numerical linear algebra in the streaming model. In: STOC (2009)

    Google Scholar 

  11. Cormode, G., Hadjieleftheriou, M.: Finding frequent items in data streams. In: VLDB (2008)

    Google Scholar 

  12. Cortes, C., Fisher, K., Pregibon, D., Rogers, A.: Hancock: a language for extracting signatures from data streams. In: KDD (2000)

    Google Scholar 

  13. Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency Estimation of Internet Packet Streams with Limited Space. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, p. 348. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  14. Deshpande, A., Vempala, S.S.: Adaptive Sampling and Fast Low-Rank Matrix Approximation. In: Díaz, J., Jansen, K., Rolim, J.D.P., Zwick, U. (eds.) APPROX 2006 and RANDOM 2006. LNCS, vol. 4110, pp. 292–303. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  15. Drineas, P., Kannan, R.: Pass efficient algorithms for approximating large matrices. In: SODA (2003)

    Google Scholar 

  16. Drineas, P., Kannan, R., Mahoney, M.W.: Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix. SIAM Journal on Computing 3636(1), 158–183 (2006)

    Article  MathSciNet  Google Scholar 

  17. Drineas, P., Mahoney, M.W., Muthukrishnan, S.: Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications 30, 844–881 (2008)

    Article  MATH  MathSciNet  Google Scholar 

  18. Frieze, A., Kannan, R., Vempala, S.: Fast Monte-Carlo algorithms for finding low-rank approximations. In: FOCS (1998)

    Google Scholar 

  19. Ghashami, M., Liberty, E., Phillips, J.M.: Frequent directions: Simple and deterministic matrix sketchings. In: Personal Communication (2014)

    Google Scholar 

  20. Ghashami, M., Phillips, J.M.: Relative errors for deterministic low-rank matrix approximation. In: SODA (2014)

    Google Scholar 

  21. Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Quicksand: Quick summary and analysis of network data. Technical report, DIMACS 2001-43 (2001)

    Google Scholar 

  22. Golub, G.H., van Loan, C.F.: Matrix Computations, vol. 3. JHUP (2012)

    Google Scholar 

  23. Hall, P., Marshall, D., Martin, R.: Incremental eigenanalysis for classification. In: British Machine Vision Conference (1998)

    Google Scholar 

  24. Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM ToDS 28, 51–55 (2003)

    Article  Google Scholar 

  25. Levey, A., Lindenbaum, M.: Sequential Karhunen-Loeve basis extraction and its application to images. IEEE ToIP 9, 1371–1374 (2000)

    Google Scholar 

  26. Liberty, E.: Simple and deterministic matrix sketching. In: KDD (2013)

    Google Scholar 

  27. Liberty, E., Woolfe, F., Martinsson, P.-G., Rokhlin, V., Tygert, M.: Randomized algorithms for the low-rank approximation of matrices. PNAS 104(51), 20167–20172 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  28. Mahoney, M.W., Drineas, P.: CUR matrix decompositions for improved data analysis. PNAS 106, 697–702 (2009)

    Article  MATH  MathSciNet  Google Scholar 

  29. Metwally, A., Agrawal, D., Abbadi, A.E.: An integrated efficient solution for computing frequent and top-k elements in data streams. ACM ToDS 31, 1095–1133 (2006)

    Article  Google Scholar 

  30. Misra, J., Gries, D.: Finding repeated elements. Sc. Comp. Prog. 2, 143–152 (1982)

    Article  MATH  MathSciNet  Google Scholar 

  31. Papadimitriou, C.H., Tamaki, H., Raghavan, P., Vempala, S.: Latent semantic indexing: A probabilistic analysis. In: PODS (1998)

    Google Scholar 

  32. Ricci, R.: Apt (adaptable profile-driven testbed) (2014), http://www.flux.utah.edu/project/apt

  33. Ross, D.A., Lim, J., Lin, R.-S., Yang, M.-H.: Incremental learning for robust visual tracking. IJCV 77, 125–141 (2008)

    Article  Google Scholar 

  34. Rudelson, M., Vershynin, R.: Sampling from large matrices: An approach through geometric functional analysis. Journal of the ACM 54(4), 21 (2007)

    Article  MathSciNet  Google Scholar 

  35. Sarlos, T.: Improved approximation algorithms for large matrices via random projections. In: FOCS (2006)

    Google Scholar 

  36. Sullivan, M., Heybey, A.: A system for managing large databases of network traffic. In: Proceedings of USENIX (1998)

    Google Scholar 

  37. Emulab testbed, http://www.flux.utah.edu/project/emulab

  38. Weinberger, K., Dasgupta, A., Langford, J., Smola, A., Attenberg, J.: Feature hashing for large scale multitask learning. In: ICML (2009)

    Google Scholar 

  39. Zhu, Y., Shasha, D.: Statstream: Statistical monitoring of thousands of data streams in real time. In: VLDB (2002)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ghashami, M., Desai, A., Phillips, J.M. (2014). Improved Practical Matrix Sketching with Guarantees. In: Schulz, A.S., Wagner, D. (eds) Algorithms - ESA 2014. ESA 2014. Lecture Notes in Computer Science, vol 8737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44777-2_39

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-44777-2_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44776-5

  • Online ISBN: 978-3-662-44777-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics