PruX: Communication Pruning of Parallel BFS in the Graph 500 Benchmark

  • Menghan Jia
  • Yiming Zhang
  • Dongsheng Li
  • Songzhu Mei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11334)


Parallel Breadth First Search (BFS) is a representative algorithm in Graph 500, the well-known benchmark for evaluating supercomputers on data-intensive applications. However, the specific storage model of Graph 500 poses a severe challenge to efficient communication when computing parallel BFS on large-scale graphs. In this paper, we propose PruX, an effective method that optimizes the communication of parallel BFS in two ways. First, we adopt a scalable structure to record the access information of the vertices on each machine. Second, we prune unnecessary inter-machine communication for previously accessed vertices by checking these records. Evaluation results show that the performance of our method is at least six times higher than that of the original implementation of parallel BFS.
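
The two steps named in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the PruX implementation: the modulo vertex partitioning, the per-machine `known` sets (a stand-in for the paper's scalable record structure), and the function name `distributed_bfs` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def distributed_bfs(adj, root, num_machines, prune=True):
    """Simulate level-synchronous BFS over vertices spread across machines.

    Returns (parent, messages), where `messages` counts inter-machine sends.
    Everything here is an illustrative assumption, not the paper's code.
    """
    def owner(v):
        # Assumed 1-D partitioning: vertex v lives on machine v % num_machines.
        return v % num_machines

    parent = {root: root}
    # known[m] records the vertices machine m has already seen accessed.
    known = [set() for _ in range(num_machines)]
    known[owner(root)].add(root)
    frontier = [root]
    messages = 0

    while frontier:
        inbox = defaultdict(list)  # destination machine -> [(vertex, parent)]
        for u in frontier:
            m = owner(u)
            for v in adj[u]:
                if prune and v in known[m]:
                    continue  # pruned: machine m already knows v was accessed
                known[m].add(v)
                if owner(v) != m:
                    messages += 1  # this visit request crosses machines
                inbox[owner(v)].append((v, u))

        next_frontier = []
        for m, items in inbox.items():
            for v, u in items:
                known[m].add(v)
                if v not in parent:  # the owner accepts only the first visit
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier

    return parent, messages


# Usage: a complete graph on 6 vertices, split across 2 simulated machines.
graph = {v: [u for u in range(6) if u != v] for v in range(6)}
parent_plain, sent_plain = distributed_bfs(graph, 0, 2, prune=False)
parent_pruned, sent_pruned = distributed_bfs(graph, 0, 2, prune=True)
print("messages without pruning:", sent_plain)
print("messages with pruning:", sent_pruned)
```

Skipping a vertex found in `known[m]` is safe in this sketch because a vertex enters that set only after a visit request for it has already been queued or delivered, so its owner will set (or has set) its parent regardless. A real distributed implementation would have to bound the memory of such records, which is presumably what the scalable structure in the abstract addresses.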


Breadth First Search · Graph 500 · Communication pruning



This work is sponsored in part by the National Basic Research Program of China (973) under Grant No. 2014CB340303 and by the National Natural Science Foundation of China (NSFC) under Grant No. 61772541.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Menghan Jia (1)
  • Yiming Zhang (1)
  • Dongsheng Li (1)
  • Songzhu Mei (1)
  1. National University of Defense Technology, Changsha, China
