Spectral Clustering in Social Networks

  • Conference paper
Advances in Web Mining and Web Usage Analysis (SNAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5439)

Abstract

We evaluate various heuristics for hierarchical spectral clustering in large telephone call and Web graphs. Spectral clustering without additional heuristics often produces very uneven cluster sizes or low-quality clusters that may consist of several disconnected components. This problem appears to be common across several data sources, but to our knowledge no general solution has been provided so far. Divide-and-Merge, a recently described postfiltering procedure, may be used to eliminate bad-quality branches in a binary tree hierarchy. We propose an alternative solution that enables k-way cuts in each step by immediately filtering out unbalanced or low-quality clusters before splitting them further.
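The abstract describes the procedure only in words. As a rough illustration, here is a minimal Python sketch under our own assumptions, not the authors' implementation. It performs a single 2-way split by the sign of the Fiedler vector of the unnormalized Laplacian L = D - A, computed by plain power iteration, rather than the k-way cuts and weight normalizations studied in the paper. The `acceptable` check and its `min_frac`/`max_frac` thresholds are hypothetical stand-ins for the paper's filtering criteria: a candidate cluster is rejected if it is too small, too large, or not connected.

```python
from collections import deque


def laplacian_matvec(adj, x):
    """Return L @ x for the unnormalized Laplacian L = D - A.

    adj maps node i (0..n-1) to a dict {neighbor: edge weight}.
    """
    return [sum(adj[i].values()) * x[i] - sum(w * x[j] for j, w in adj[i].items())
            for i in range(len(x))]


def fiedler_vector(adj, iters=2000):
    """Approximate the Fiedler vector by power iteration on M = c*I - L."""
    n = len(adj)
    # c >= largest Laplacian eigenvalue (2 * max weighted degree is a safe bound),
    # so the smallest eigenvalues of L become the largest eigenvalues of M.
    c = 2.0 * max(sum(nbrs.values()) for nbrs in adj.values())
    x = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic, zero-mean start
    for _ in range(iters):
        lx = laplacian_matvec(adj, x)
        x = [c * xi - li for xi, li in zip(x, lx)]
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant (eigenvalue 0) vector
        norm = sum(xi * xi for xi in x) ** 0.5
        x = [xi / norm for xi in x]
    return x


def spectral_bisect(adj):
    """Split the nodes by the sign of the (approximate) Fiedler vector."""
    x = fiedler_vector(adj)
    pos = [i for i, v in enumerate(x) if v >= 0]
    neg = [i for i, v in enumerate(x) if v < 0]
    return pos, neg


def is_connected(adj, nodes):
    """BFS check that `nodes` induce a connected subgraph."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def acceptable(adj, cluster, n_total, min_frac=0.05, max_frac=0.95):
    """Hypothetical filter: reject clusters that are too small, too large or disconnected."""
    frac = len(cluster) / n_total
    return min_frac <= frac <= max_frac and is_connected(adj, cluster)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {i: {} for i in range(6)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0

parts = spectral_bisect(adj)
print(sorted(sorted(p) for p in parts))               # [[0, 1, 2], [3, 4, 5]]
print([acceptable(adj, p, len(adj)) for p in parts])  # [True, True]
```

In a hierarchical scheme, a split whose parts fail such a check would be discarded or handled differently instead of being recursed into. The paper's exact policy for rejected clusters is not given in the abstract.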

Our experiments are performed on graphs with various edge weightings and normalizations, built from call detail records and Web crawls. We measure clustering quality both by modularity and by the geographic and topical homogeneity of the clusters. Compared to divide-and-merge, our method produces more homogeneous clusters with a more desirable distribution of cluster sizes.
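The abstract does not state which modularity variant the experiments use. The following Python sketch assumes the standard Newman-Girvan definition for an undirected weighted graph without self-loops, written per cluster as Q = Σ_c [e_c / m - (d_c / 2m)²]. Here e_c is the total weight of edges inside cluster c, d_c is the total weighted degree of its nodes, and m is the total edge weight.

```python
def modularity(edges, labels):
    """Newman-Girvan modularity Q = sum_c [ e_c / m - (d_c / 2m)^2 ].

    edges:  list of (u, v, w), each undirected edge listed once, no self-loops.
    labels: dict mapping node -> cluster id.
    """
    m = sum(w for _, _, w in edges)
    intra, degree = {}, {}  # per-cluster internal edge weight and total degree
    for u, v, w in edges:
        cu, cv = labels[u], labels[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            intra[cu] = intra.get(cu, 0.0) + w
    return sum(intra.get(c, 0.0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Toy graph: two triangles joined by the edge 2-3, unit weights.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 1.0)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, labels), 4))  # 0.3571
```

For this graph m = 7, and each triangle has e_c = 3 and d_c = 7, so Q = 2 · (3/7 - 1/4) = 5/14 ≈ 0.3571. Putting every node into one cluster gives Q = 0.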

Supported by a Yahoo Faculty Research Grant and by grant ASTOR NKFP 2/004/05. This work is based on an earlier work: Spectral Clustering in Telephone Call Graphs, in Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, pages 82–91 (2007), © ACM, 2007. http://doi.acm.org/10.1145/1348549.1348559

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kurucz, M., Benczúr, A.A., Csalogány, K., Lukács, L. (2009). Spectral Clustering in Social Networks. In: Zhang, H., et al. Advances in Web Mining and Web Usage Analysis. SNAKDD 2007. Lecture Notes in Computer Science, vol 5439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00528-2_1

  • DOI: https://doi.org/10.1007/978-3-642-00528-2_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00527-5

  • Online ISBN: 978-3-642-00528-2

  • eBook Packages: Computer Science, Computer Science (R0)