Efficient Local Clustering Coefficient Estimation in Massive Graphs

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10178)

Abstract

Graphs are a powerful tool for modeling interactions in disparate applications, and assessing the structure of a graph is an essential task across all these domains. As a classic measure of graph connectivity, the clustering coefficient and its variants are of particular interest in graph structural analysis. However, the largest of today’s graphs may have billions of nodes and edges, which makes even the seemingly simple task of computing clustering coefficients complicated and expensive. Approximate solutions have therefore attracted much attention from researchers recently. Existing approaches, however, target only global and binned degree-wise clustering coefficient estimation, and their techniques are not suitable for estimating the local clustering coefficient, which is of great importance for individual nodes. In this paper, we propose a new sampling scheme to estimate the local clustering coefficient with bounded error, of which the global and binned degree-wise clustering coefficients can be considered special cases. Based on this sampling scheme, we propose a new framework that estimates all three clustering coefficients in a unified way. To make it scalable to massive graphs, we further design an efficient MapReduce algorithm under this framework. Extensive experiments validate the efficiency and effectiveness of our algorithms, which significantly outperform state-of-the-art exact and approximate algorithms on many real graph datasets.
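To make the quantity being estimated concrete: the local clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves connected. The sketch below is not the sampling scheme proposed in the paper, which this page does not describe. It is a generic illustration of sampling-based estimation with a bounded error, assuming an in-memory undirected graph stored as a dict of neighbor sets and using the standard Hoeffding inequality to choose the sample size. All function and parameter names (`exact_local_cc`, `estimate_local_cc`, `eps`, `delta`) are hypothetical.

```python
import math
import random
from itertools import combinations


def exact_local_cc(adj, v):
    """Exact local clustering coefficient of v: connected neighbor pairs / all neighbor pairs."""
    nbrs = list(adj[v])
    d = len(nbrs)
    if d < 2:
        return 0.0  # convention assumed here: fewer than two neighbors gives 0
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / (d * (d - 1) / 2)


def hoeffding_sample_size(eps, delta):
    """Samples needed so a mean of 0/1 draws is within eps of its expectation
    with probability at least 1 - delta (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def estimate_local_cc(adj, v, eps=0.05, delta=0.01, rng=None):
    """Estimate the local clustering coefficient of v by sampling neighbor pairs.

    Each draw picks two distinct neighbors of v uniformly at random and checks
    whether they are adjacent; the estimate is the fraction of connected pairs.
    """
    rng = rng or random.Random()
    nbrs = list(adj[v])
    if len(nbrs) < 2:
        return 0.0
    k = hoeffding_sample_size(eps, delta)
    closed = 0
    for _ in range(k):
        a, b = rng.sample(nbrs, 2)  # independent draw of one neighbor pair
        if b in adj[a]:
            closed += 1
    return closed / k


# Toy graph: node 0 has neighbors 1..4; of the 6 neighbor pairs, (1,2) and (3,4) are edges.
adj = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1},
    3: {0, 4},
    4: {0, 3},
}
print(exact_local_cc(adj, 0))
print(estimate_local_cc(adj, 0, rng=random.Random(7)))
```

The number of draws depends only on `eps` and `delta`, not on the degree of the node, which is what makes this style of estimator attractive for high-degree nodes where enumerating all neighbor pairs is expensive.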

Acknowledgements

This work was partially supported by grants from the National Science Foundation of China (61502349), the Hubei Provincial Natural Science Foundation of China (2015CFB339), the Scientific and Technologic Development Program of SuZhou (SYG201442), the Research Grants Council of the Hong Kong SAR, China (14209314 and 14221716), the Australian Research Council (DE140100999 and DP160101513), a Microsoft Research Asia Collaborative Research Grant, and a Chinese University of Hong Kong Direct Grant (4055048). Yuanyuan Zhu is the corresponding author.

Author information

Corresponding author

Correspondence to Yuanyuan Zhu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhang, H., Zhu, Y., Qin, L., Cheng, H., Yu, J.X. (2017). Efficient Local Clustering Coefficient Estimation in Massive Graphs. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_23

  • DOI: https://doi.org/10.1007/978-3-319-55699-4_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55698-7

  • Online ISBN: 978-3-319-55699-4

  • eBook Packages: Computer Science, Computer Science (R0)
