
Journal of Computer Science and Technology, Volume 34, Issue 2, pp. 339–371

A Survey on Graph Processing Accelerators: Challenges and Opportunities

  • Chuang-Yi Gui
  • Long Zheng (corresponding author)
  • Bingsheng He
  • Cheng Liu
  • Xin-Yu Chen
  • Xiao-Fei Liao
  • Hai Jin
Survey

Abstract

A graph is a well-known data structure for representing the relationships among entities in a variety of applications, e.g., data science and machine learning. Despite a wealth of existing efforts on developing graph processing systems that improve performance and/or energy efficiency on traditional architectures, dedicated hardware solutions, also referred to as graph processing accelerators, are emerging as an essential way to deliver benefits significantly beyond what pure software solutions can offer. In this paper, we conduct a systematic survey of the design and implementation of graph processing accelerators. Specifically, we review the relevant techniques in three core components of a graph processing accelerator: preprocessing, parallel graph computation, and runtime scheduling. We also examine the benchmarks and results used in existing studies to evaluate a graph processing accelerator. Interestingly, we find that there is no absolute winner on all three aspects of graph acceleration, due to the diverse characteristics of graph processing workloads and the complexity of hardware configurations. We finally present and discuss several challenges in detail, and further explore the opportunities for future research.
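To make concrete the kind of workload the survey targets, the sketch below (not taken from the paper; the CSRGraph type, the pagerank function, and the tiny example graph are purely illustrative) runs a vertex-centric PageRank iteration over a graph stored in compressed sparse row (CSR) form. The scattered, data-dependent accesses through col_idx are exactly the irregular memory pattern that limits traditional architectures and motivates dedicated graph processing accelerators.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Compressed sparse row (CSR) layout: the outgoing neighbours of vertex v
// are col_idx[row_ptr[v] .. row_ptr[v+1]).
struct CSRGraph {
    std::vector<std::size_t> row_ptr;  // size = num_vertices + 1
    std::vector<std::size_t> col_idx;  // size = num_edges
    std::size_t num_vertices() const { return row_ptr.size() - 1; }
};

// One illustrative vertex-centric workload: a fixed number of PageRank
// iterations. The per-edge updates to next[col_idx[e]] are irregular,
// fine-grained, and memory-bound on general-purpose processors.
std::vector<double> pagerank(const CSRGraph& g, int iterations,
                             double damping = 0.85) {
    const std::size_t n = g.num_vertices();
    std::vector<double> rank(n, 1.0 / n);
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> next(n, (1.0 - damping) / n);
        for (std::size_t v = 0; v < n; ++v) {
            std::size_t degree = g.row_ptr[v + 1] - g.row_ptr[v];
            if (degree == 0) continue;
            double share = damping * rank[v] / degree;
            // "Push" this vertex's contribution along each outgoing edge.
            for (std::size_t e = g.row_ptr[v]; e < g.row_ptr[v + 1]; ++e)
                next[g.col_idx[e]] += share;  // irregular write
        }
        rank.swap(next);
    }
    return rank;
}

int main() {
    // Tiny 4-vertex example: edges 0->1, 0->2, 1->2, 2->0, 3->2.
    CSRGraph g{{0, 2, 3, 4, 5}, {1, 2, 2, 0, 2}};
    for (double r : pagerank(g, 20)) std::cout << r << '\n';
    return 0;
}
```

A push-style update is shown for simplicity; gather (pull) formulations, which avoid the conflicting irregular writes, are a common alternative in both software frameworks and hardware designs.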

Keywords

graph processing accelerator; domain-specific architecture; performance; energy efficiency


Supplementary material

ESM 1: 11390_2019_1914_MOESM1_ESM.pdf (PDF, 668 kB)


Copyright information

© Springer Science+Business Media, LLC & Science Press, China 2019

Authors and Affiliations

  • Chuang-Yi Gui (1, 2, 3)
  • Long Zheng (1, 2, 3), corresponding author
  • Bingsheng He (4)
  • Cheng Liu (4, 5)
  • Xin-Yu Chen (4)
  • Xiao-Fei Liao (1, 2, 3)
  • Hai Jin (1, 2, 3)

  1. National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  2. Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  3. Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  4. School of Computing, National University of Singapore, Singapore
  5. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
