A Survey on Streaming Algorithms for Massive Graphs

Managing and Mining Graph Data

Part of the book series: Advances in Database Systems (ADBS, volume 40)

Abstract

Streaming is an important paradigm for handling massive graphs that are too large to fit in main memory. In the streaming computational model, an algorithm is restricted to much less space than it would need to store its input. Furthermore, the input is accessed sequentially and can therefore be viewed as a stream of data elements. Although these restrictions limit what can be computed, streaming algorithms exist for many graph problems. We survey a set of algorithms that compute graph statistics, matchings and distances in a graph, and random walks. These are basic graph problems, and the algorithms that solve them may be used as building blocks in graph-data management and mining.
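
As a rough illustration of the model described in the abstract, the sketch below reads an edge stream once and keeps only the set of already-matched vertices, so its memory grows with the number of vertices rather than the number of edges. It uses the standard greedy maximal-matching rule, which is not necessarily the formulation the chapter presents; the names greedy_matching and path_edges and the sample stream are assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Edge = Tuple[int, int]


def greedy_matching(edge_stream: Iterable[Edge]) -> List[Edge]:
    """Single pass over the edges: keep an edge only if both endpoints are free.

    The result is a maximal matching, which is known to contain at least
    half as many edges as a maximum matching.
    """
    matched: Set[int] = set()      # vertices already covered by a kept edge
    matching: List[Edge] = []      # edges kept so far
    for u, v in edge_stream:       # each edge is seen exactly once, in arrival order
        if u != v and u not in matched and v not in matched:
            matched.add(u)
            matched.add(v)
            matching.append((u, v))
    return matching


def path_edges() -> Iterator[Edge]:
    """Hypothetical stream: the path 1-2-3-4, with the middle edge arriving first."""
    yield (2, 3)
    yield (1, 2)
    yield (3, 4)


print(greedy_matching(path_edges()))  # [(2, 3)]
```

On this arrival order the single pass keeps one edge, while the best matching on the path, (1, 2) and (3, 4), has two. The result depends on the order in which edges arrive, which is typical of one-pass algorithms.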

Author information

Correspondence to Jian Zhang.

Copyright information

© 2010 Springer-Verlag US

About this chapter

Cite this chapter

Zhang, J. (2010). A Survey on Streaming Algorithms for Massive Graphs. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_13

  • DOI: https://doi.org/10.1007/978-1-4419-6045-0_13

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6044-3

  • Online ISBN: 978-1-4419-6045-0

  • eBook Packages: Computer Science (R0)
