Parallel Clustered Low-Rank Approximation of Graphs and Its Application to Link Prediction

Sui, Xin; Lee, Tsung-Hsien; Whang, Joyce Jiyoung; Savas, Berkant; Jain, Saral; Pingali, Keshav; Dhillon, Inderjit

doi:10.1007/978-3-642-37658-0_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7760)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Social network analysis has become a major research area with impact on diverse applications ranging from search engines to product recommendation systems. A major problem in implementing social network analysis algorithms is the sheer size of many social networks: for example, the Facebook graph has more than 900 million vertices, and even small networks may have tens of millions of vertices. One way to deal with these large graphs is dimensionality reduction using spectral or SVD analysis of the adjacency matrix of the network, but these global techniques do not necessarily take into account the local structures or clusters of the network that are critical in network analysis. A more promising approach is clustered low-rank approximation: instead of computing a global low-rank approximation, the adjacency matrix is first clustered, and then a low-rank approximation of each cluster (i.e., each diagonal block) is computed. The resulting algorithm is challenging to parallelize, not only because of the large size of the data sets in social network analysis, but also because it requires computing with very diverse data structures ranging from extremely sparse matrices to dense matrices. In this paper, we describe the first parallel implementation of a clustered low-rank approximation algorithm for large social network graphs, and use it to perform link prediction in parallel. Experimental results show that this implementation scales well on large distributed-memory machines. For example, on a Twitter graph with roughly 11 million vertices and 63 million edges, our implementation scales by a factor of 86 on 128 processes and takes less than 2300 seconds. On a much larger Twitter graph with 41 million vertices and 1.2 billion edges, it scales by a factor of 203 on 256 processes, with a running time of about 4800 seconds.
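
The clustered low-rank idea described in the abstract can be illustrated with a minimal serial sketch on a toy graph. Several things here are assumptions made for illustration and are not taken from the abstract: each cluster is approximated with rank 1 only (its dominant eigenvector, found by power iteration); the per-cluster bases are combined as A ≈ V S Vᵀ, with V block-diagonal and S_ij = V_iᵀ A_ij V_j; entries of that approximation are used directly as link-prediction scores; and the cluster assignment is supplied by hand rather than computed. The function names are hypothetical, and nothing here reflects the parallel implementation of the paper.

```python
import math


def dominant_eigvec(block, iters=200):
    """Unit-norm dominant eigenvector of a symmetric block (power iteration).

    Iterates with (block + I) so that bipartite clusters, whose spectra are
    symmetric about zero, still converge to the Perron vector.
    """
    n = len(block)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [v[i] + sum(block[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def clustered_rank1(adj, clusters):
    """Return (basis, core) with core[i][j] = v_i^T A_ij v_j (assumed form)."""
    basis = []
    for members in clusters:
        # Diagonal block of the adjacency matrix for this cluster.
        block = [[adj[u][w] for w in members] for u in members]
        basis.append(dominant_eigvec(block))
    c = len(clusters)
    core = [[0.0] * c for _ in range(c)]
    for i, rows in enumerate(clusters):
        for j, cols in enumerate(clusters):
            core[i][j] = sum(
                basis[i][a] * adj[u][w] * basis[j][b]
                for a, u in enumerate(rows)
                for b, w in enumerate(cols)
            )
    return basis, core


def link_score(u, w, clusters, basis, core):
    """Entry (u, w) of V S V^T, used here as a link-prediction score."""
    where = {v: (i, a) for i, ms in enumerate(clusters) for a, v in enumerate(ms)}
    (i, a), (j, b) = where[u], where[w]
    return basis[i][a] * core[i][j] * basis[j][b]


def build_adjacency(n, edges):
    """Dense symmetric 0/1 adjacency matrix (toy scale only)."""
    adj = [[0] * n for _ in range(n)]
    for u, w in edges:
        adj[u][w] = adj[w][u] = 1
    return adj


# Toy graph: cluster {0,1,2,3} is K4 minus edge (2,3); cluster {4,5,6,7} is K4;
# a single bridge (3,4) connects the two clusters.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
adj = build_adjacency(8, edges)
clusters = [[0, 1, 2, 3], [4, 5, 6, 7]]  # hand-picked, not computed

basis, core = clustered_rank1(adj, clusters)
in_cluster = link_score(2, 3, clusters, basis, core)     # missing edge inside a cluster
cross_cluster = link_score(2, 7, clusters, basis, core)  # missing edge across clusters
print("score(2,3) =", in_cluster)
print("score(2,7) =", cross_cluster)
```

In this toy setting the missing in-cluster pair (2, 3) receives a higher score than the cross-cluster pair (2, 7), which is the kind of local structure a single global low-rank approximation may smooth over. A realistic version would use sparse storage, a higher rank per cluster, and an eigensolver suited to large sparse matrices.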

Author information

Authors and Affiliations

Department of Computer Science, The University of Texas, Austin, USA
Xin Sui, Joyce Jiyoung Whang, Saral Jain, Keshav Pingali & Inderjit Dhillon
Department of Electrical and Computer Engineering, The University of Texas, Austin, USA
Tsung-Hsien Lee
Department of Mathematics, Linköping University, Sweden
Berkant Savas

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Department of Computer Science and Engineering, Waseda University, 27 Waseda-machi, 162-0042, Shinjuku-ku, Tokyo, Japan
Hironori Kasahara & Keiji Kimura

About this paper

Cite this paper

Sui, X. et al. (2013). Parallel Clustered Low-Rank Approximation of Graphs and Its Application to Link Prediction. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_6

DOI: https://doi.org/10.1007/978-3-642-37658-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37657-3
Online ISBN: 978-3-642-37658-0
eBook Packages: Computer Science, Computer Science (R0)
