Parallel Simrank Computing on Large Scale Dataset on Mapreduce

Li, Lina; Li, Cuiping; Chen, Hong

doi:10.1007/978-3-642-41629-3_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 387)

Abstract

Many applications, such as recommender systems and search engines, need to compute the similarity between objects. SimRank is a simple and intuitive similarity measure with a rigorous basis in random walk theory. Three iterative methods for computing SimRank exist, but all of them are time-consuming; moreover, as data on the Internet grows rapidly, a parallel method is needed to compute SimRank on large-scale datasets. Hadoop is a popular distributed computing platform. This paper exploits the features of Hadoop to compute SimRank in parallel using different methods and compares their performance.

This work is supported by the National Core-High-Base Major Special Subject "Research on Key Technology on High Performance High Security Domestic Database" (2010ZX01042-001-002).

Author information

Authors and Affiliations

Renmin University of China, Beijing, China
Lina Li, Cuiping Li & Hong Chen

Corresponding author

Correspondence to Lina Li.

Editor information

Editors and Affiliations

Fudan University School of Computer Science, Shanghai, People's Republic of China
Shuigeng Zhou
University of Finance and Economics, Nanjing, People's Republic of China
Zhiang Wu

About this paper

Cite this paper

Li, L., Li, C., Chen, H. (2013). Parallel Simrank Computing on Large Scale Dataset on Mapreduce. In: Zhou, S., Wu, Z. (eds) Social Media Retrieval and Mining. Communications in Computer and Information Science, vol 387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41629-3_3

DOI: https://doi.org/10.1007/978-3-642-41629-3_3
Published: 16 November 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41628-6
Online ISBN: 978-3-642-41629-3
eBook Packages: Computer Science, Computer Science (R0)
