Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions

  • Conference paper

High Performance Computing (ISC High Performance 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10266)

Abstract

Reducing communication and partitioning efficiently are key to achieving scalability in hierarchical N-body algorithms such as the Fast Multipole Method (FMM). In the present work, we propose three independent strategies to improve partitioning and reduce communication. First, we show that the conventional wisdom of using space-filling curve partitioning may not work well for boundary integral problems, which constitute a significant portion of FMM's application user base. We propose an alternative method that modifies orthogonal recursive bisection to relieve the cell-partition misalignment that has previously kept it from scaling. Second, we tune the granularity of communication to find the optimal balance between a bulk-synchronous collective exchange of the local essential tree and one RDMA operation per task per cell. Finally, we take the dynamic sparse data exchange proposed by Hoefler et al. [1] and extend it to a hierarchical sparse data exchange, which we demonstrate at scale to be faster than the commonly used MPI_Alltoallv.
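To illustrate the baseline the abstract's first contribution modifies: orthogonal recursive bisection (ORB) recursively splits the body set at the median along the axis of largest extent until one subset per process remains. The sketch below is a hypothetical minimal illustration of plain ORB in Python, not the paper's cell-alignment modification and not ExaFMM code:

```python
def orb_partition(points, n_parts):
    """Split points into n_parts equal-load subsets by orthogonal
    recursive bisection (n_parts assumed to be a power of two)."""
    if n_parts == 1:
        return [points]
    # Choose the coordinate axis with the largest spatial extent.
    dims = len(points[0])
    extents = [max(p[d] for p in points) - min(p[d] for p in points)
               for d in range(dims)]
    axis = extents.index(max(extents))
    # Split at the median so both halves carry equal load, then recurse.
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (orb_partition(pts[:mid], n_parts // 2) +
            orb_partition(pts[mid:], n_parts // 2))

# Partition 16 synthetic 2-D bodies across 4 hypothetical processes.
parts = orb_partition([(x * 0.1, (x * 7 % 13) * 0.3) for x in range(16)], 4)
```

Median splits guarantee equal body counts per partition, but the resulting planar cuts need not align with tree-cell boundaries; that misalignment is exactly what the paper's modified ORB addresses.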

Notes

  1. ExaFMM is an open-source, parallel, GPU-capable implementation of fast multipole algorithms. The partitioning and communication-reduction algorithms described here are all available in the public repository https://github.com/exafmm/exafmm.
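For context on the space-filling curve partitioning that the abstract argues against for boundary distributions: such schemes typically order bodies by Morton (Z-order) keys and split the sorted list into equal chunks. A minimal hypothetical sketch, assuming 3-D points already scaled to integer grid coordinates:

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the bits of three grid coordinates into one Morton key.

    Sorting bodies by this key orders them along a Z-order space-filling
    curve; cutting the sorted list into equal chunks yields the classic
    SFC partition.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)      # x bit -> position 3b
        key |= ((iy >> b) & 1) << (3 * b + 1)  # y bit -> position 3b+1
        key |= ((iz >> b) & 1) << (3 * b + 2)  # z bit -> position 3b+2
    return key

# Neighboring grid cells map to nearby keys along the Z-order curve.
keys = sorted(morton_key(x, y, 0) for x in range(4) for y in range(4))
```

Equal-sized chunks of a key-sorted list balance body counts well for volume-filling distributions, but for bodies concentrated on a boundary surface the induced partitions can cut awkwardly across the octree, which motivates the paper's alternative.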

References

  1. Hoefler, T., Siebert, C., Lumsdaine, A.: Scalable communication protocols for dynamic sparse data exchange. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP 2010, pp. 159–168. ACM, New York (2010)

  2. Appel, A.W.: An efficient program for many-body simulation. SIAM J. Sci. Stat. Comput. 6(1), 85–103 (1985)

  3. Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. J. Comput. Phys. 73(2), 325–348 (1987)

  4. Beatson, R., Greengard, L.: A short course on fast multipole methods. Wavelets Multilevel Methods Elliptic PDEs 1, 1–37 (1997)

  5. Lu, B., Cheng, X., Huang, J., McCammon, J.A.: Order \(N\) algorithm for computation of electrostatic interactions in biomolecular systems. Proc. Natl. Acad. Sci. 103(51), 19314–19319 (2006)

  6. Yokota, R., Bardhan, J.P., Knepley, M.G., Barba, L.A., Hamada, T.: Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns. Comput. Phys. Commun. 182(6), 1272–1283 (2011)

  7. Ohno, Y., Yokota, R., Koyama, H., Morimoto, G., Hasegawa, A., Masumoto, G., Okimoto, N., Hirano, Y., Ibeid, H., Narumi, T., et al.: Petascale molecular dynamics simulation using the fast multipole method on K computer. Comput. Phys. Commun. 185(10), 2575–2585 (2014)

  8. Rui, P., Chen, R.: An efficient sparse approximate inverse preconditioning for FMM implementation. Microw. Opt. Technol. Lett. 49(7), 1746–1750 (2007)

  9. Bédorf, J., Gaburov, E., Zwart, S.P.: A sparse octree gravitational \(N\)-body code that runs entirely on the GPU processor. J. Comput. Phys. 231(7), 2825–2839 (2012)

  10. Price, D., Monaghan, J.: An energy-conserving formalism for adaptive gravitational force softening in smoothed particle hydrodynamics and \(N\)-body codes. Mon. Not. R. Astron. Soc. 374(4), 1347–1358 (2007)

  11. Asanovic, K., Bodik, R., Catanzaro, B.C., Gebis, J.J., Husbands, P., Keutzer, K., Patterson, D.A., Plishker, W.L., Shalf, J., Williams, S.W., et al.: The landscape of parallel computing research: a view from Berkeley. Technical report UCB/EECS-2006-183, EECS Department, University of California, Berkeley (2006)

  12. Warren, M.S., Salmon, J.K.: A fast tree code for many-body problems. Los Alamos Sci. 22(10), 88–97 (1994)

  13. Bédorf, J., Gaburov, E., Fujii, M.S., Nitadori, K., Ishiyama, T., Portegies Zwart, S.: 24.77 Pflops on a gravitational tree-code to simulate the Milky Way Galaxy with 18600 GPUs. In: Proceedings of the 2014 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12 (2014)

  14. Speck, R., Ruprecht, D., Krause, R., Emmett, M., Minion, M., Winkel, M., Gibbon, P.: A massively space-time parallel \(N\)-body solver. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 92. IEEE Computer Society Press (2012)

  15. Winkel, M., Speck, R., Hübner, H., Arnold, L., Krause, R., Gibbon, P.: A massively parallel, multi-disciplinary Barnes-Hut tree code for extreme-scale \(N\)-body simulations. Comput. Phys. Commun. 183(4), 880–889 (2012)

  16. Lashuk, I., Chandramowlishwaran, A., Langston, H., Nguyen, T.-A., Sampath, R., Shringarpure, A., Vuduc, R., Ying, L., Zorin, D., Biros, G.: A massively parallel adaptive fast multipole method on heterogeneous architectures. Commun. ACM 55(5), 101–109 (2012)

  17. Zandifar, M., Abdul Jabbar, M., Majidi, A., Keyes, D., Amato, N.M., Rauchwerger, L.: Composing algorithmic skeletons to express high-performance scientific applications. In: Proceedings of the 29th ACM on International Conference on Supercomputing, ser. ICS 2015, pp. 415–424. ACM, New York (2015)

  18. AbdulJabbar, M., Yokota, R., Keyes, D.: Asynchronous execution of the fast multipole method using Charm++. arXiv preprint arXiv:1405.7487 (2014)

  19. Salmon, J.K.: Parallel hierarchical N-body methods. Ph.D. dissertation, California Institute of Technology (1991)

  20. Warren, M.S., Salmon, J.K.: A parallel hashed oct-tree \(N\)-body algorithm. In: Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, pp. 12–21. ACM (1993)

  21. Makino, J.: A fast parallel treecode with GRAPE. Publ. Astron. Soc. Jpn. 56, 521–531 (2004)

  22. Solomonik, E., Kalé, L.V.: Highly scalable parallel sorting. In: Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 1–12 (2010)

  23. Haverkort, H.: An inventory of three-dimensional Hilbert space-filling curves. arXiv preprint arXiv:1109.2323 (2011)

  24. Dubinski, J.: A parallel tree code. New Astron. 1, 133–147 (1996)

  25. Warren, M.S., Salmon, J.K.: Astrophysical \(N\)-body simulations using hierarchical tree data structures. In: Proceedings of the 1992 ACM/IEEE Conference on Supercomputing, ser. Supercomputing 1992, pp. 570–576. IEEE Computer Society Press, Los Alamitos (1992)

  26. Lashuk, I., Chandramowlishwaran, A., Langston, H., Nguyen, T.-A., Sampath, R., Shringarpure, A., Vuduc, R., Ying, L., Zorin, D., Biros, G.: A massively parallel adaptive fast multipole method on heterogeneous architectures. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (2009)

  27. Teng, S.-H.: Provably good partitioning and load balancing algorithms for parallel adaptive \(N\)-body simulation. SIAM J. Sci. Comput. 19(2), 635–656 (1998)

  28. Yokota, R., Turkiyyah, G., Keyes, D.: Communication complexity of the fast multipole method and its algebraic variants. Supercomput. Front. Innov.: Int. J. 1(1), 63–84 (2014)

  29. Malhotra, D., Biros, G.: PVFMM: a parallel kernel-independent FMM for particle and volume potentials. Commun. Comput. Phys. 18(3), 808–830 (2015)

Acknowledgment

This work was supported by JSPS KAKENHI Grant-in-Aid for Young Scientists (A), Grant Number 16H05859. It was also partially supported by the "Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures" and the "High Performance Computing Infrastructure" in Japan. The authors are grateful to the KAUST Supercomputing Laboratory for the use of the Shaheen XC40 system.

Author information

Correspondence to Mustafa Abduljabbar.

Copyright information

© 2017 Springer International Publishing AG

Cite this paper

Abduljabbar, M., Markomanolis, G.S., Ibeid, H., Yokota, R., Keyes, D. (2017). Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol. 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_5

  • DOI: https://doi.org/10.1007/978-3-319-58667-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58666-3

  • Online ISBN: 978-3-319-58667-0

  • eBook Packages: Computer Science (R0)
