
A Semi-clustering Scheme for High Performance PageRank on Hadoop

  • Seungtae Hong
  • Jeonghoon Lee
  • Jaewoo Chang
  • Dong Hoon Choi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8823)

Abstract

As global Internet business has evolved, large-scale graphs have become common. PageRank computation on such graphs using Hadoop with its default data partitioning method suffers from poor performance, because Hadoop scatters even a set of directly connected vertices across arbitrary nodes. In this paper, we propose a semi-clustering scheme that addresses this problem and improves the performance of PageRank on Hadoop. Our scheme divides a graph into a set of semi-clusters, each consisting of connected vertices, and assigns each semi-cluster to a single data partition in order to reduce the cost of data exchange between nodes during the PageRank computation. Before the PageRank computation, the semi-clusters are merged and split so that a large-scale graph is evenly distributed across a number of data partitions. Our semi-clustering scheme drastically improves performance: the total elapsed time, including the cost of the semi-clustering computation, is reduced by up to 36%. Furthermore, the effectiveness of our scheme increases with the size of the graph.
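The idea of keeping connected vertices in the same partition can be illustrated with a small sketch. This is not the authors' algorithm: the definition of a semi-cluster used here (a connected component of the undirected view of the graph, split into BFS-ordered chunks of at most `max_size` vertices), the greedy "least-loaded partition" merge policy, and the helper names `semi_clusters`, `assign_partitions`, and `cut_edges` are all hypothetical. The `v % num_partitions` assignment in the usage example is only a rough stand-in for Hadoop's default hash partitioning. The metric is the number of cut edges, i.e. edges whose endpoints live on different partitions, since each such edge implies PageRank data crossing the network.

```python
from collections import defaultdict, deque
import heapq


def semi_clusters(edges, vertices, max_size):
    """Group connected vertices into semi-clusters of at most max_size vertices.

    Hypothetical policy: connectivity is taken on the undirected view of the
    graph, and oversized components are split into consecutive BFS-order chunks.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        # BFS collects one connected component in breadth-first order.
        order, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in sorted(adj[u]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        # Split step: cut a large component into chunks of at most max_size.
        for i in range(0, len(order), max_size):
            clusters.append(order[i:i + max_size])
    return clusters


def assign_partitions(clusters, num_partitions):
    """Merge step: place each semi-cluster, largest first, on the least-loaded partition."""
    heap = [(0, p) for p in range(num_partitions)]  # (vertex load, partition id)
    heapq.heapify(heap)
    part_of = {}
    for cluster in sorted(clusters, key=len, reverse=True):
        load, p = heapq.heappop(heap)
        for v in cluster:
            part_of[v] = p
        heapq.heappush(heap, (load + len(cluster), p))
    return part_of


def cut_edges(edges, part_of):
    """Count edges whose endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if part_of[u] != part_of[v])


if __name__ == "__main__":
    # Two disjoint directed 4-cycles.
    vertices = range(8)
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4)]
    hashed = {v: v % 2 for v in vertices}  # stand-in for hash partitioning
    clustered = assign_partitions(semi_clusters(edges, vertices, max_size=4), 2)
    print("hash cut edges:", cut_edges(edges, hashed))
    print("semi-cluster cut edges:", cut_edges(edges, clustered))
```

On this toy graph every edge joins an even and an odd vertex, so the modulo assignment cuts all 8 edges, while the semi-cluster assignment keeps each cycle on one partition and cuts none, with both partitions holding 4 vertices. Packing the largest semi-clusters first is a common greedy heuristic for balance; the paper's actual merge and split criteria may differ.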

Keywords

Large-scale graph analysis, semi-clustering, Hadoop, PageRank



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Seungtae Hong (1)
  • Jeonghoon Lee (2)
  • Jaewoo Chang (1)
  • Dong Hoon Choi (2)

  1. Dept. of Computer Engineering, Chonbuk National University, Jeonju, South Korea
  2. Korea Institute of Science and Technology Information (KISTI), Daejeon, South Korea
