Abstract
Graph is a powerful tool to model interactions in disparate applications, and how to assess the structure of a graph is an essential task across all the domains. As a classic measure to characterize the connectivity of graphs, clustering coefficient and its variants are of particular interest in graph structural analysis. However, the largest of today’s graphs may have nodes and edges in billion scale, which makes the simple task of computing clustering coefficients quite complicated and expensive. Thus, approximate solutions have attracted much attention from researchers recently. However, they only target global and binned degree-wise clustering coefficient estimation, and their techniques are not suitable for local clustering coefficient estimation that is of great importance for individual nodes. In this paper, we propose a new sampling scheme to estimate the local clustering coefficient with error bounded, where global and binned degree-wise clustering coefficients can be considered as special cases. Meanwhile, based on our sampling scheme, we propose a new framework to estimate all the three clustering coefficients in a unified way. To make it scalable on massive graphs, we further design an efficient MapReduce algorithm under this framework. Extensive experiments validate the efficiency and effectiveness of our algorithms, which significantly outperform state-of-the-art exact and approximate algorithms on many real graph datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient algorithms for large-scale local triangle counting. TKDD 4(3) (2010). Article no. 13
Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
Chen, D.-B., Gao, H., Lü, L., Zhou, T.: Identifying influential nodes in large-scale directed networks: the role of clustering. PloS one 8(10), e77455 (2013)
Chu, S., Cheng, J.: Triangle listing in massive networks and its applications. In: KDD, pp. 672–680. ACM (2011)
Cohen, J.: Graph twiddling in a mapreduce world. Comput. Sci. Eng. 11(4), 29–41 (2009)
Eckmann, J.-P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS 99(9), 5825–5829 (2002)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
Hu, X., Tao, Y., Chung, C.-W.: Massive graph triangulation. In: SIGMOD, pp. 325–336. ACM (2013)
Jha, M., Seshadhri, C., Pinar, A.: A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. TKDD 9(3), 15:1–15:21 (2015)
Kolda, T.G., Pinar, A., Plantenga, T., Seshadhri, C., Task, C.: Counting triangles in massive graphs with mapreduce. SISC 36(5), S48–S77 (2014)
Kolountzakis, M.N., Miller, G.L., Peng, R., Tsourakakis, C.E.: Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 8(1–2), 161–185 (2012)
Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW 2010: Proceedings of the 19th International Conference on World wide web, pp. 591–600. ACM, New York (2010)
Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoret. Comput. Sci. 407(1), 458–473 (2008)
Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection, June 2014. http://snap.stanford.edu/data
Lim, Y., Kang, U.: MASCOT: memory-efficient and accurate sampling for counting local triangles in graph streams. In: KDD, pp. 685–694 (2015)
Lin, Y., Xiong, H., Chen, M., Ding, L., Cao, Y., Wang, G., Liu, M.: Dynamical model and analysis of cascading failures on the complex power grids. Kybernetes 40(5/6), 814–823 (2011)
Masuda, N.: Clustering in large networks does not promote upstream reciprocity. PloS one 6(10), e25190 (2011)
McGregor, A.: Graph stream algorithms: a survey. SIGMOD Rec. 43(1), 9–20 (2014)
Menegola, B.: An external memory algorithm for listing triangles (2010)
Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York (2005)
Pagh, R., Tsourakakis, C.E.: Colorful triangle counting and a mapreduce implementation. Inf. Process. Lett. 112(7), 277–281 (2012)
Park, H.-M., Chung, C.-W.: An efficient mapreduce algorithm for counting triangles in a very large graph. In: CIKM, pp. 539–548. ACM (2013)
Park, H.-M., Silvestri, F., Kang, U., Pagh, R.: Mapreduce triangle enumeration with guarantees. In: CIKM, pp. 1739–1748. ACM (2014)
Schank, T., Wagner, D.: Approximating clustering coefficient and transitivity. J. Graph Algorithms Appl. 9(2), 265–275 (2005)
Serfling, R.J.: Probability inequalities for the sum in sampling without replacement. Ann. Stat. 2(1), 39–48 (1974)
Seshadhri, C., Kolda, T.G., Pinar, A.: Community structure and scale-free collections of erdős-rényi graphs. Phys. Rev. E 85(5), 056109 (2012)
Seshadhri, C., Pinar, A., Kolda, T.G.: Fast triangle counting through wedge sampling. In: SDM, vol. 4, p. 5. Citeseer (2013)
Seshadhri, C., Pinar, A., Kolda, T.G.: Triadic measures on graphs: the power of wedge sampling. In: SDM, pp. 10–18. SIAM (2013)
Stefani, L.D., Epasto, A., Riondato, M., Upfal, E.: Trièst: counting local and global triangles in fully-dynamic streams with fixed memory size. In: KDD, pp. 825–834 (2016)
Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW, pp. 607–614. ACM (2011)
Trpevski, D., Tang, W.K., Kocarev, L.: Model for rumor spreading over networks. Phys. Rev. E 81(5), 056102 (2010)
Tsourakakis, C.E.: Fast counting of triangles in large real networks without counting: algorithms and laws. In: ICDM, pp 608–617 (2008)
Tsourakakis, C.E., Drineas, P., Michelakis, E., Koutis, I., Faloutsos, C.: Spectral counting of triangles via element-wise sparsification and triangle-based link recommendation. Soc. Netw. Anal. Mining 1(2), 75–81 (2011)
Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: counting triangles in massive graphs with a coin. In: KDD, pp. 837–846. ACM (2009)
Tsourakakis, C.E., Kolountzakis, M.N., Miller, G.L.: Triangle sparsifiers. J. Graph Algorithms Appl. 15(6), 703–726 (2011)
Wu, X., Lu, H.: Cluster synchronization in the adaptive complex dynamical networks via a novel approach. Phys. Lett. A 375(14), 1559–1565 (2011)
Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network sybils in the wild. TKDD 8(1), 2 (2014)
Acknowledgements
This work was partially supported by the grants from the National Science Foundation of China (61502349), Hubei Provincial Natural Science Foundation of China (2015CFB339), the Scientific and Technologic Development Program of SuZhou (SYG201442), Research Grants Council of the Hong Kong SAR, China (14209314 and 14221716), Australian Research Council (DE140100999 and DP160101513), Microsoft Research Asia Collaborative Research Grant and Chinese University of Hong Kong Direct Grant (4055048). Yuanyuan Zhu is a corresponding author.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhang, H., Zhu, Y., Qin, L., Cheng, H., Yu, J.X. (2017). Efficient Local Clustering Coefficient Estimation in Massive Graphs. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science(), vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_23
Download citation
DOI: https://doi.org/10.1007/978-3-319-55699-4_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55698-7
Online ISBN: 978-3-319-55699-4
eBook Packages: Computer ScienceComputer Science (R0)