Parallel Clustered Low-Rank Approximation of Graphs and Its Application to Link Prediction

  • Xin Sui
  • Tsung-Hsien Lee
  • Joyce Jiyoung Whang
  • Berkant Savas
  • Saral Jain
  • Keshav Pingali
  • Inderjit Dhillon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7760)

Abstract

Social network analysis has become a major research area with impact on diverse applications ranging from search engines to product recommendation systems. A major problem in implementing social network analysis algorithms is the sheer size of many social networks: for example, the Facebook graph has more than 900 million vertices, and even small networks may have tens of millions of vertices. One solution for dealing with these large graphs is dimensionality reduction using spectral or SVD analysis of the adjacency matrix of the network, but these global techniques do not necessarily take into account the local structures or clusters of the network that are critical in network analysis. A more promising approach is clustered low-rank approximation: instead of computing a global low-rank approximation, the adjacency matrix is first clustered, and then a low-rank approximation of each cluster (i.e., each diagonal block) is computed. The resulting algorithm is challenging to parallelize, not only because of the large size of the data sets in social network analysis, but also because it requires computing with very diverse data structures, ranging from extremely sparse matrices to dense matrices. In this paper, we describe the first parallel implementation of a clustered low-rank approximation algorithm for large social network graphs, and use it to perform link prediction in parallel. Experimental results show that this implementation scales well on large distributed-memory machines: on a Twitter graph with roughly 11 million vertices and 63 million edges, our implementation scales by a factor of 86 on 128 processes and takes less than 2300 seconds, while on a much larger Twitter graph with 41 million vertices and 1.2 billion edges, it scales by a factor of 203 on 256 processes, with a running time of about 4800 seconds.
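
The idea of approximating each diagonal block separately can be illustrated with a small serial sketch. This is not the paper's parallel implementation. It assumes the clustering is already given, uses a rank-1 approximation per cluster computed by plain power iteration, and ignores edges between clusters entirely. The function names (`build_adjacency`, `power_iteration`, `clustered_rank1`, `score`) and the toy graph are hypothetical and chosen only for illustration.

```python
import math


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def power_iteration(block, iters=200):
    """Approximate dominant eigenpair (lam, v) of a small symmetric matrix.

    `block` is a dense list of lists; fine for a toy example only.
    """
    n = len(block)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(block[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # block with no edges
            return 0.0, v
        v = [x / norm for x in w]
        # Rayleigh quotient v^T A v for the current unit vector v.
        lam = sum(
            v[i] * sum(block[i][j] * v[j] for j in range(n)) for i in range(n)
        )
    return lam, v


def clustered_rank1(adj, clusters):
    """Rank-1 factor (members, lam, v) for each cluster's diagonal block."""
    factors = []
    for members in clusters:
        block = [
            [1.0 if b in adj.get(a, set()) else 0.0 for b in members]
            for a in members
        ]
        lam, v = power_iteration(block)
        factors.append((members, lam, v))
    return factors


def score(factors, a, b):
    """Link-prediction score: entry (a, b) of the block approximation.

    Pairs in different clusters score 0.0 in this simplified sketch.
    """
    for members, lam, v in factors:
        if a in members and b in members:
            return lam * v[members.index(a)] * v[members.index(b)]
    return 0.0


# Toy graph: a 4-clique missing edge (0, 3), a triangle, and one bridge (3, 4).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (4, 6), (5, 6)]
adj = build_adjacency(edges)
factors = clustered_rank1(adj, [[0, 1, 2, 3], [4, 5, 6]])
missing_edge_score = score(factors, 0, 3)
```

In this toy setting, the non-adjacent pair (0, 3) receives a positive score because both vertices sit in the same dense cluster, which is the intuition behind using the approximation for link prediction. A realistic implementation would use sparse storage, a higher rank per cluster, and a proper eigensolver rather than dense power iteration.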

Keywords

Social network analysis, link prediction, parallel computing, graph computations, clustered low-rank approximation

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Xin Sui (1)
  • Tsung-Hsien Lee (2)
  • Joyce Jiyoung Whang (1)
  • Berkant Savas (3)
  • Saral Jain (1)
  • Keshav Pingali (1)
  • Inderjit Dhillon (1)

  1. Department of Computer Science, The University of Texas, Austin, USA
  2. Department of Electrical and Computer Engineering, The University of Texas, Austin, USA
  3. Department of Mathematics, Linköping University, Sweden
