A Fast Parallel Graph Partitioner for Shared-Memory Inspector/Executor Strategies

  • Christopher D. Krieger
  • Michelle Mills Strout
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7760)


Graph partitioners play an important role in many parallel work distribution and locality optimization approaches. Surprisingly, to our knowledge there is no freely available parallel graph partitioner designed for execution on a shared-memory multicore system. This paper presents ParCubed, a shared-memory parallel graph partitioner for use in sparse tiling, a form of run-time data and computation reordering. Sparse tiling is a run-time scheduling technique that groups iterations from different loops and schedules them together when they access the same data and at least one of the loops contains indirect array accesses. Sparse tiling is implemented with an inspector/executor strategy, and the inspector must find an initial seed partitioning of adequate quality very quickly. We compare ParCubed, a hierarchical clustering partitioner, with GPart and METIS in terms of partitioning speed, partitioning quality, and the effect the generated seed partitions have on executor speed. We find that ParCubed is 25 to 100 times faster than METIS on a 16-core machine. The total edge cut of the partitioning generated by ParCubed did not exceed 1.27 times that of the partitioning found by METIS.
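The quality metric used in the comparison, total edge cut, is the number of graph edges whose endpoints are assigned to different parts. The following is a minimal sketch of that metric, not code from ParCubed, GPart, or METIS; the function name `edge_cut`, the small example graph, and both partitionings are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence, Tuple


def edge_cut(edges: Iterable[Tuple[int, int]], part: Sequence[int]) -> int:
    """Count the edges whose two endpoints lie in different parts.

    edges: undirected edges as (u, v) pairs of vertex indices.
    part:  part[v] is the part id assigned to vertex v.
    """
    return sum(1 for u, v in edges if part[u] != part[v])


# Hypothetical example: a 6-vertex path 0-1-2-3-4-5 plus one chord (1, 4).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]

part_a = [0, 0, 0, 1, 1, 1]  # two contiguous halves
part_b = [0, 1, 0, 1, 0, 1]  # alternating assignment

cut_a = edge_cut(edges, part_a)
cut_b = edge_cut(edges, part_b)

# Ratio of one partitioning's edge cut to another's, the same kind of
# relative figure the abstract reports for ParCubed versus METIS.
ratio = cut_b / cut_a
print(cut_a, cut_b, ratio)
```

A lower edge cut generally means fewer data dependences cross part boundaries, which is why it serves as a proxy for seed partition quality.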


inspector/executor strategies, graph partitioning, irregular applications, sparse tiling





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Christopher D. Krieger (1)
  • Michelle Mills Strout (1)
  1. Colorado State University, Fort Collins, USA
