Scalable Approximation Algorithm for Graph Summarization

Beg, Maham Anwar; Ahmad, Muhammad; Zaman, Arif; Khan, Imdadullah

doi:10.1007/978-3-319-93040-4_40

Conference paper
First Online: 17 June 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Abstract

Massive sizes of real-world graphs, such as social networks and the web graph, pose serious challenges for processing and analytics. These issues can be resolved by working on a small summary of the graph instead. A summary is a compressed version of the graph that removes several details, yet preserves its essential structure. Generally, some predefined quality measure of the summary is optimized to bound the approximation error incurred by working on the summary instead of the whole graph. All known summarization algorithms are computationally prohibitive and do not scale to large graphs. In this paper we present an efficient randomized algorithm to compute graph summaries with the goal of minimizing reconstruction error. We propose a novel weighted sampling scheme to sample vertices for merging that will result in the least reconstruction error. We provide analytical bounds on the running time of the algorithm and prove an approximation guarantee for our score computation. The efficiency of our algorithm makes it scalable to very large graphs on which known algorithms cannot be applied. We test our algorithm on several real-world graphs to empirically demonstrate the quality of the summaries produced and compare them to state-of-the-art algorithms. We use the summaries to answer several structural queries about the original graph and report their accuracies.
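
To make the notion of reconstruction error concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a simple undirected graph, a summary given as a partition of the vertices into supernodes, and an error measured as the sum over unordered vertex pairs of the absolute difference between the true adjacency (0 or 1) and the edge density of the corresponding supernode pair. The function name `reconstruction_error` and this exact error definition are illustrative assumptions; the paper's own definition may differ, for example in normalization.

```python
from itertools import combinations


def reconstruction_error(vertices, edges, partition):
    """Sum over unordered vertex pairs of |A(u, v) - density of their block pair|.

    Assumes a simple undirected graph; `partition` is a list of disjoint
    vertex groups (supernodes) that together cover `vertices`.
    """
    # Map each vertex to the index of its supernode.
    block = {v: i for i, group in enumerate(partition) for v in group}
    edge_set = {frozenset(e) for e in edges}

    possible = {}  # number of vertex pairs per (block, block) key
    present = {}   # number of actual edges per (block, block) key
    for u, v in combinations(vertices, 2):
        key = tuple(sorted((block[u], block[v])))
        possible[key] = possible.get(key, 0) + 1
        if frozenset((u, v)) in edge_set:
            present[key] = present.get(key, 0) + 1

    # Within one block pair with p possible pairs and e edges, the density is
    # d = e / p. Each edge contributes (1 - d) and each non-edge contributes d,
    # which adds up to 2 * e * (p - e) / p.
    error = 0.0
    for key, p in possible.items():
        e = present.get(key, 0)
        error += 2 * e * (p - e) / p
    return error


# Path graph 0 - 1 - 2 - 3 summarized in three different ways.
path_vertices = [0, 1, 2, 3]
path_edges = [(0, 1), (1, 2), (2, 3)]
err_singletons = reconstruction_error(path_vertices, path_edges, [[0], [1], [2], [3]])
err_two_blocks = reconstruction_error(path_vertices, path_edges, [[0, 1], [2, 3]])
err_one_block = reconstruction_error(path_vertices, path_edges, [[0, 1, 2, 3]])
```

Under this definition, keeping every vertex as its own supernode loses nothing, while merging more vertices generally increases the error. That trade-off is what a merging strategy of the kind the abstract describes has to manage.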

Author information

Authors and Affiliations

Department of Computer Science, School of Science and Engineering, Lahore University of Management Sciences, Lahore, Pakistan
Maham Anwar Beg, Muhammad Ahmad, Arif Zaman & Imdadullah Khan

Corresponding author

Correspondence to Imdadullah Khan.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

About this paper

Cite this paper

Beg, M.A., Ahmad, M., Zaman, A., Khan, I. (2018). Scalable Approximation Algorithm for Graph Summarization. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_40

DOI: https://doi.org/10.1007/978-3-319-93040-4_40
Published: 17 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science, Computer Science (R0)
