Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions

  • Conference paper

High Performance Computing (ISC High Performance 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10266)

Abstract

Reducing communication and partitioning efficiently are key to achieving scalability in hierarchical N-body algorithms such as the Fast Multipole Method (FMM). In the present work, we propose three independent strategies to improve partitioning and reduce communication. First, we show that the conventional wisdom of using space-filling curve partitioning may not work well for boundary integral problems, which constitute a significant portion of FMM's application user base. We propose an alternative method that modifies orthogonal recursive bisection to relieve the cell-partition misalignment that has previously kept it from scaling. Second, we tune the granularity of communication to find the optimal balance between a bulk-synchronous collective exchange of the local essential tree and one RDMA operation per task per cell. Finally, we take the dynamic sparse data exchange proposed by Hoefler et al. [1] and extend it to a hierarchical sparse data exchange, which we demonstrate at scale to be faster than the commonly used MPI_Alltoallv.
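To illustrate the baseline the abstract's first contribution modifies: orthogonal recursive bisection (ORB) recursively splits the body set at the median along the axis of largest extent until one subset per process remains. The sketch below is a hypothetical minimal illustration of plain ORB in Python, not the paper's cell-alignment modification and not ExaFMM code:

```python
def orb_partition(points, n_parts):
    """Split points into n_parts equal-load subsets by orthogonal
    recursive bisection (n_parts assumed to be a power of two)."""
    if n_parts == 1:
        return [points]
    # Choose the coordinate axis with the largest spatial extent.
    dims = len(points[0])
    extents = [max(p[d] for p in points) - min(p[d] for p in points)
               for d in range(dims)]
    axis = extents.index(max(extents))
    # Split at the median so both halves carry equal load, then recurse.
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (orb_partition(pts[:mid], n_parts // 2) +
            orb_partition(pts[mid:], n_parts // 2))

# Partition 16 synthetic 2-D bodies across 4 hypothetical processes.
parts = orb_partition([(x * 0.1, (x * 7 % 13) * 0.3) for x in range(16)], 4)
```

Median splits guarantee equal body counts per partition, but the resulting planar cuts need not align with tree-cell boundaries; that misalignment is exactly what the paper's modified ORB addresses.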

Notes

  1. ExaFMM is an open-source, parallel, GPU-capable implementation of fast multipole algorithms. The partitioning and communication-reduction algorithms described here are all available in the public repository https://github.com/exafmm/exafmm.
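For context on the space-filling curve partitioning that the abstract argues against for boundary distributions: such schemes typically order bodies by Morton (Z-order) keys and split the sorted list into equal chunks. A minimal hypothetical sketch, assuming 3-D points already scaled to integer grid coordinates:

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the bits of three grid coordinates into one Morton key.

    Sorting bodies by this key orders them along a Z-order space-filling
    curve; cutting the sorted list into equal chunks yields the classic
    SFC partition.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)      # x bit -> position 3b
        key |= ((iy >> b) & 1) << (3 * b + 1)  # y bit -> position 3b+1
        key |= ((iz >> b) & 1) << (3 * b + 2)  # z bit -> position 3b+2
    return key

# Neighboring grid cells map to nearby keys along the Z-order curve.
keys = sorted(morton_key(x, y, 0) for x in range(4) for y in range(4))
```

Equal-sized chunks of a key-sorted list balance body counts well for volume-filling distributions, but for bodies concentrated on a boundary surface the induced partitions can cut awkwardly across the octree, which motivates the paper's alternative.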

References

  1. Hoefler, T., Siebert, C., Lumsdaine, A.: Scalable communication protocols for dynamic sparse data exchange. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP 2010, pp. 159–168. ACM, New York (2010)

  2. Appel, A.W.: An efficient program for many-body simulation. SIAM J. Sci. Stat. Comput. 6(1), 85–103 (1985)

  3. Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. J. Comput. Phys. 73(2), 325–348 (1987)

  4. Beatson, R., Greengard, L.: A short course on fast multipole methods. Wavelets Multilevel Methods Elliptic PDEs 1, 1–37 (1997)

  5. Lu, B., Cheng, X., Huang, J., McCammon, J.A.: Order \(N\) algorithm for computation of electrostatic interactions in biomolecular systems. Proc. Natl. Acad. Sci. 103(51), 19314–19319 (2006)

  6. Yokota, R., Bardhan, J.P., Knepley, M.G., Barba, L.A., Hamada, T.: Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns. Comput. Phys. Commun. 182(6), 1272–1283 (2011)

  7. Ohno, Y., Yokota, R., Koyama, H., Morimoto, G., Hasegawa, A., Masumoto, G., Okimoto, N., Hirano, Y., Ibeid, H., Narumi, T., et al.: Petascale molecular dynamics simulation using the fast multipole method on K computer. Comput. Phys. Commun. 185(10), 2575–2585 (2014)

  8. Rui, P., Chen, R.: An efficient sparse approximate inverse preconditioning for FMM implementation. Microw. Opt. Technol. Lett. 49(7), 1746–1750 (2007)

  9. Bédorf, J., Gaburov, E., Zwart, S.P.: A sparse octree gravitational \(N\)-body code that runs entirely on the GPU processor. J. Comput. Phys. 231(7), 2825–2839 (2012)

  10. Price, D., Monaghan, J.: An energy-conserving formalism for adaptive gravitational force softening in smoothed particle hydrodynamics and \(N\)-body codes. Mon. Not. R. Astron. Soc. 374(4), 1347–1358 (2007)

  11. Asanovic, K., Bodik, R., Catanzaro, B.C., Gebis, J.J., Husbands, P., Keutzer, K., Patterson, D.A., Plishker, W.L., Shalf, J., Williams, S.W., et al.: The landscape of parallel computing research: a view from Berkeley. Technical report UCB/EECS-2006-183, EECS Department, University of California, Berkeley (2006)

  12. Warren, M.S., Salmon, J.K.: A fast tree code for many-body problems. Los Alamos Sci. 22(10), 88–97 (1994)

  13. Bédorf, J., Gaburov, E., Fujii, M.S., Nitadori, K., Ishiyama, T., Portegies Zwart, S.: 24.77 Pflops on a gravitational tree-code to simulate the Milky Way Galaxy with 18600 GPUs. In: Proceedings of the 2014 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12 (2014)

  14. Speck, R., Ruprecht, D., Krause, R., Emmett, M., Minion, M., Winkel, M., Gibbon, P.: A massively space-time parallel \(N\)-body solver. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 92. IEEE Computer Society Press (2012)

  15. Winkel, M., Speck, R., Hübner, H., Arnold, L., Krause, R., Gibbon, P.: A massively parallel, multi-disciplinary Barnes-Hut tree code for extreme-scale \(N\)-body simulations. Comput. Phys. Commun. 183(4), 880–889 (2012)

  16. Lashuk, I., Chandramowlishwaran, A., Langston, H., Nguyen, T.-A., Sampath, R., Shringarpure, A., Vuduc, R., Ying, L., Zorin, D., Biros, G.: A massively parallel adaptive fast multipole method on heterogeneous architectures. Commun. ACM 55(5), 101–109 (2012)

  17. Zandifar, M., Abdul Jabbar, M., Majidi, A., Keyes, D., Amato, N.M., Rauchwerger, L.: Composing algorithmic skeletons to express high-performance scientific applications. In: Proceedings of the 29th ACM on International Conference on Supercomputing, ser. ICS 2015, pp. 415–424. ACM, New York (2015)

  18. AbdulJabbar, M., Yokota, R., Keyes, D.: Asynchronous execution of the fast multipole method using Charm++. arXiv preprint arXiv:1405.7487 (2014)

  19. Salmon, J.K.: Parallel hierarchical N-body methods. Ph.D. dissertation, California Institute of Technology (1991)

  20. Warren, M.S., Salmon, J.K.: A parallel hashed oct-tree \(N\)-body algorithm. In: Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, pp. 12–21. ACM (1993)

  21. Makino, J.: A fast parallel treecode with GRAPE. Publ. Astron. Soc. Jpn. 56, 521–531 (2004)

  22. Solomonik, E., Kalé, L.V.: Highly scalable parallel sorting. In: Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 1–12 (2010)

  23. Haverkort, H.: An inventory of three-dimensional Hilbert space-filling curves. arXiv preprint arXiv:1109.2323 (2011)

  24. Dubinski, J.: A parallel tree code. New Astron. 1, 133–147 (1996)

  25. Warren, M.S., Salmon, J.K.: Astrophysical \(N\)-body simulations using hierarchical tree data structures. In: Proceedings of the 1992 ACM/IEEE Conference on Supercomputing, ser. Supercomputing 1992, pp. 570–576. IEEE Computer Society Press, Los Alamitos (1992)

  26. Lashuk, I., Chandramowlishwaran, A., Langston, H., Nguyen, T.-A., Sampath, R., Shringarpure, A., Vuduc, R., Ying, L., Zorin, D., Biros, G.: A massively parallel adaptive fast multipole method on heterogeneous architectures. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (2009)

  27. Teng, S.-H.: Provably good partitioning and load balancing algorithms for parallel adaptive \(N\)-body simulation. SIAM J. Sci. Comput. 19(2), 635–656 (1998)

  28. Yokota, R., Turkiyyah, G., Keyes, D.: Communication complexity of the fast multipole method and its algebraic variants. Supercomput. Front. Innov.: Int. J. 1(1), 63–84 (2014)

  29. Malhotra, D., Biros, G.: PVFMM: a parallel kernel-independent FMM for particle and volume potentials. Commun. Comput. Phys. 18(3), 808–830 (2015)

Acknowledgment

This work was supported by JSPS KAKENHI Grant-in-Aid for Young Scientists (A), Grant Number 16H05859. It was also partially supported by the "Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures" and the "High Performance Computing Infrastructure" in Japan. The authors are grateful to the KAUST Supercomputing Laboratory for the use of the Shaheen XC40 system.

Author information

Correspondence to Mustafa Abduljabbar.

Copyright information

© 2017 Springer International Publishing AG

Cite this paper

Abduljabbar, M., Markomanolis, G.S., Ibeid, H., Yokota, R., Keyes, D. (2017). Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol. 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_5

  • DOI: https://doi.org/10.1007/978-3-319-58667-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58666-3

  • Online ISBN: 978-3-319-58667-0

  • eBook Packages: Computer Science (R0)
