Parallel Simrank Computing on Large Scale Dataset on Mapreduce

  • Conference paper
Social Media Retrieval and Mining

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 387))

Abstract

Many fields, such as recommendation systems and search engines, need to compute the similarity between objects. SimRank is a simple and intuitive algorithm for this task, and it is rigorously grounded in random walk theory. There are three existing iterative methods for computing SimRank, but all of them are time-consuming; moreover, as data on the Internet grows rapidly, a new parallel method is needed to compute SimRank on large-scale datasets. Hadoop is a popular distributed platform. This paper exploits the features of Hadoop to compute SimRank in parallel using several different methods, and compares their performance.
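For orientation, the iterative computation mentioned above can be sketched on a single machine. The following is a minimal illustration of the standard SimRank recurrence, not the parallel Hadoop methods of this paper; the function name `simrank`, the decay value 0.8, the iteration count, and the dictionary-based graph layout are all assumptions made for the example.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank (illustrative sketch only).

    in_neighbors maps each node to the list of nodes that link to it.
    Every node that appears as an in-neighbor must also be a key.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar to nothing else.
                new_sim[(a, b)] = 0.0
                continue
            # Average the previous similarity over all in-neighbor pairs.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: node "a" links to both "b" and "c".
scores = simrank({"a": [], "b": ["a"], "c": ["a"]})
print(scores[("b", "c")])
```

Each iteration visits every node pair and every pair of their in-neighbors, which is why the sequential form becomes expensive on large graphs and why a parallel formulation is attractive.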

This work is supported by the National Core-High-Base Major Special Subject ‘Research on Key Technology on High Performance High Security Domestic Database’ (2010ZX01042-001-002).

Author information

Corresponding author

Correspondence to Lina Li.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, L., Li, C., Chen, H. (2013). Parallel Simrank Computing on Large Scale Dataset on Mapreduce. In: Zhou, S., Wu, Z. (eds) Social Media Retrieval and Mining. Communications in Computer and Information Science, vol 387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41629-3_3

  • DOI: https://doi.org/10.1007/978-3-642-41629-3_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41628-6

  • Online ISBN: 978-3-642-41629-3

  • eBook Packages: Computer Science, Computer Science (R0)
