Abstract
The massive size of real-world graphs, such as social networks and the web graph, makes it challenging to process them and perform analytics on them. These issues can be addressed by working on a small summary of the graph instead. A summary is a compressed version of the graph that removes several details yet preserves its essential structure. Generally, some predefined quality measure of the summary is optimized to bound the approximation error incurred by working on the summary instead of the whole graph. All known summarization algorithms are computationally prohibitive and do not scale to large graphs. In this paper we present an efficient randomized algorithm that computes graph summaries with the goal of minimizing reconstruction error. We propose a novel weighted sampling scheme that samples the vertices to be merged so as to incur the least reconstruction error. We provide analytical bounds on the running time of the algorithm and prove an approximation guarantee for our score computation. The efficiency of our algorithm makes it scalable to very large graphs on which known algorithms cannot be applied. We test our algorithm on several real-world graphs to empirically demonstrate the quality of the summaries produced, and we compare it to state-of-the-art algorithms. We use the summaries to answer several structural queries about the original graph and report the accuracy of the answers.
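The abstract does not specify the sampling weights or the score-approximation procedure, so the sketch below is not the authors' algorithm. It is a minimal illustration, under stated assumptions, of the two ingredients the abstract names. First, reconstruction error is taken here as the L1 distance between the adjacency matrix and the "lifted" matrix in which every vertex pair is replaced by the edge density of its supernode pair (a common definition in the graph summarization literature, assumed rather than taken from the paper). Second, a merge loop draws candidate supernode pairs by weighted sampling and applies the merge with the lowest resulting error. The names `reconstruction_error` and `summarize`, the parameters `k`, `num_candidates` and `seed`, and the degree-based weights are all hypothetical.

```python
import random
from collections import Counter


def reconstruction_error(edges, parts):
    """Unnormalized L1 reconstruction error of a summary over unordered vertex pairs.

    edges: list of undirected edges (u, v) with u != v and no duplicates.
    parts: list of disjoint vertex sets (supernodes) covering every vertex.
    Each block of vertex pairs (between two supernodes, or inside one) is
    reconstructed at its edge density d = e / N, so the block contributes
    e * (1 - d) + (N - e) * d = 2 * e * (1 - e / N). Blocks with no edges
    are reconstructed exactly and contribute 0, so they are skipped.
    """
    owner = {v: i for i, part in enumerate(parts) for v in part}
    counts = Counter()
    for u, v in edges:
        i, j = sorted((owner[u], owner[v]))
        counts[(i, j)] += 1
    err = 0.0
    for (i, j), e in counts.items():
        si, sj = len(parts[i]), len(parts[j])
        n_pairs = si * (si - 1) // 2 if i == j else si * sj
        err += 2 * e * (1 - e / n_pairs)
    return err


def summarize(edges, vertices, k, num_candidates=8, seed=0):
    """Merge supernodes until only k remain (hypothetical sampled-greedy scheme).

    Each round draws num_candidates distinct supernode pairs by weighted
    sampling, evaluates the reconstruction error after each candidate merge,
    and applies the merge with the lowest error.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    parts = [{v} for v in vertices]
    while len(parts) > k:
        # Placeholder weights (an assumption): 1 + total degree of the supernode.
        weights = [1 + sum(degree[v] for v in p) for p in parts]
        idx = range(len(parts))
        best_err, best_parts = None, None
        for _ in range(num_candidates):
            a = rng.choices(idx, weights=weights)[0]
            # Draw the partner from the remaining supernodes so that a != b.
            others = [t for t in idx if t != a]
            b = rng.choices(others, weights=[weights[t] for t in others])[0]
            trial = [p for t, p in enumerate(parts) if t not in (a, b)]
            trial.append(parts[a] | parts[b])
            err = reconstruction_error(edges, trial)
            if best_err is None or err < best_err:
                best_err, best_parts = err, trial
        parts = best_parts
    return parts


triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(reconstruction_error(triangles, [{0, 1, 2}, {3, 4, 5}]))  # 0.0
print(reconstruction_error(triangles, [set(range(6))]))  # approximately 7.2
print(summarize(triangles, range(6), k=2))
```

For two disjoint triangles, grouping each triangle into its own supernode reconstructs the graph exactly, while a single supernode spreads the 6 edges over all 15 vertex pairs (density 0.4) and incurs an error of 6 × 0.6 + 9 × 0.4 = 7.2. Note that this sketch recomputes the error from scratch for every candidate, costing time linear in the number of edges per evaluation; avoiding exactly this kind of cost is what a scalable score computation has to address.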
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Beg, M.A., Ahmad, M., Zaman, A., Khan, I. (2018). Scalable Approximation Algorithm for Graph Summarization. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_40
DOI: https://doi.org/10.1007/978-3-319-93040-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science, Computer Science (R0)