Abstract
Many applications, such as recommendation systems and search engines, need to compute the similarity between objects. SimRank is a simple and intuitive algorithm for this task, and it is rigorously grounded in random walk theory. Three iterative methods exist for computing SimRank, but all of them are time-consuming; moreover, with data on the Internet growing rapidly, a new parallel method is needed to compute SimRank on large-scale datasets. Hadoop is a popular distributed platform. This paper exploits the features of Hadoop to compute SimRank in parallel using several different methods, and compares their performance.
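For orientation, the basic iterative SimRank recurrence of Jeh and Widom can be sketched on a single machine as follows. This is only a minimal illustration, not one of the Hadoop methods evaluated in the paper: the function name `simrank`, the `decay` and `iterations` parameters, and the toy graph are assumptions made for this example. The graph is assumed to be given as a dictionary mapping every node to the list of its in-neighbors, and every in-neighbor must itself appear as a key.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank over all node pairs (illustrative sketch).

    in_neighbors: dict mapping each node to a list of its in-neighbors.
    decay: the decay factor C, with 0 < C < 1 (value chosen for illustration).
    """
    nodes = list(in_neighbors)
    # Initial scores: a node is fully similar to itself, 0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors has similarity 0 to any other node.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by C.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Toy graph: "u" points to both "a" and "b".
toy_graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank(toy_graph)
```

Each iteration visits every pair of nodes and every pair of their in-neighbors, which is why this straightforward form becomes expensive on large graphs and motivates a parallel formulation.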
This work is supported by the National Core-High-Base Major Special Subject ‘Research on Key Technology on High Performance High Security Domestic Database’ (2010ZX01042-001-002).
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, L., Li, C., Chen, H. (2013). Parallel Simrank Computing on Large Scale Dataset on Mapreduce. In: Zhou, S., Wu, Z. (eds) Social Media Retrieval and Mining. Communications in Computer and Information Science, vol 387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41629-3_3
DOI: https://doi.org/10.1007/978-3-642-41629-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41628-6
Online ISBN: 978-3-642-41629-3
eBook Packages: Computer Science, Computer Science (R0)