Scalable Approximation Algorithm for Graph Summarization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Abstract

The massive size of real-world graphs, such as social networks and the web graph, makes them seriously challenging to process and analyze. These issues can be addressed by working on a small summary of the graph instead. A summary is a compressed version of the graph that removes several details, yet preserves its essential structure. Generally, some predefined quality measure of the summary is optimized to bound the approximation error incurred by working on the summary instead of the whole graph. All known summarization algorithms are computationally prohibitive and do not scale to large graphs. In this paper we present an efficient randomized algorithm to compute graph summaries with the goal of minimizing reconstruction error. We propose a novel weighted sampling scheme to sample vertices for merging that will result in the least reconstruction error. We provide analytical bounds on the running time of the algorithm and prove an approximation guarantee for our score computation. The efficiency of our algorithm makes it scalable to very large graphs on which known algorithms cannot be applied. We test our algorithm on several real-world graphs to empirically demonstrate the quality of the summaries produced and compare them to state-of-the-art algorithms. We use the summaries to answer several structural queries about the original graph and report their accuracies.
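
The abstract does not define reconstruction error, so the following is only an illustrative sketch under an assumed definition that is common in the graph summarization literature: vertices are grouped into supernodes, each vertex pair is assigned the edge density between its two supernodes as its expected adjacency, and the error is the sum of absolute differences from the true adjacency over unordered vertex pairs. The function name `reconstruction_error` and its parameters are hypothetical and are not taken from the paper or its code.

```python
from itertools import combinations


def reconstruction_error(n, edges, partition):
    """Assumed L1 reconstruction error of a summary of a simple undirected graph.

    n         -- number of vertices, labelled 0..n-1
    edges     -- iterable of (u, v) pairs with u != v
    partition -- list of vertex groups (supernodes) covering all vertices
    """
    # Normalize edges to sorted tuples so (u, v) and (v, u) coincide.
    edge_set = {(min(u, v), max(u, v)) for u, v in edges}

    # Map every vertex to the index of its supernode.
    block = {v: i for i, group in enumerate(partition) for v in group}
    sizes = [len(group) for group in partition]

    # Count original edges inside and between supernodes.
    counts = {}
    for u, v in edge_set:
        key = tuple(sorted((block[u], block[v])))
        counts[key] = counts.get(key, 0) + 1

    error = 0.0
    for u, v in combinations(range(n), 2):
        i, j = sorted((block[u], block[v]))
        # Number of vertex pairs the supernode pair (i, j) represents.
        pairs = sizes[i] * (sizes[i] - 1) // 2 if i == j else sizes[i] * sizes[j]
        density = counts.get((i, j), 0) / pairs  # expected adjacency
        actual = 1.0 if (u, v) in edge_set else 0.0
        error += abs(actual - density)
    return error


# Path graph 0-1-2-3: merging {0,1} and {2,3} loses information only
# about the four cross pairs, of which one is an edge.
path = [(0, 1), (1, 2), (2, 3)]
print(reconstruction_error(4, path, [{0}, {1}, {2}, {3}]))  # no merging
print(reconstruction_error(4, path, [{0, 1}, {2, 3}]))
```

This brute-force version visits every vertex pair and is meant only to make the quantity concrete; it does not reflect the sampling scheme or the running-time bounds the paper describes.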

Notes

  1. http://snap.stanford.edu/

  2. https://bitbucket.org/M_AnwarBeg/scalablesumm/

Corresponding author

Correspondence to Imdadullah Khan.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Beg, M.A., Ahmad, M., Zaman, A., Khan, I. (2018). Scalable Approximation Algorithm for Graph Summarization. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_40

  • DOI: https://doi.org/10.1007/978-3-319-93040-4_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93039-8

  • Online ISBN: 978-3-319-93040-4

  • eBook Packages: Computer Science (R0)
