Abstract
Parallel computing of 3D Discrete Element Method (DEM) simulations can be achieved in different modes, two of which are pure MPI and hybrid MPI-OpenMP. The hybrid MPI-OpenMP mode allows flexibly combined mapping schemes on contemporary multiprocessing supercomputers. This paper profiles the computational components and floating-point operation characteristics of complex-shaped 3D DEM, develops a spatial-decomposition-based MPI parallelism and various thread-based OpenMP parallelisms, and carries out performance comparison and analysis from intranode to internode scales across four orders of magnitude of problem size (namely, number of particles). The influences of the memory/cache hierarchy, process/thread pinning, variation of the hybrid MPI-OpenMP mapping scheme, and ellipsoid versus poly-ellipsoid particle shape are carefully examined. It is found that OpenMP achieves high efficiency in interparticle contact detection, but unparallelizable code prevents it from achieving the same high efficiency in overall performance; pure MPI achieves not only lower computational granularity (thus higher spatial locality of particles) but also lower communication granularity (thus faster MPI transmission) than hybrid MPI-OpenMP using the same computational resources; the cache miss rate is sensitive to the shrinkage of memory consumption per processor, and the last-level cache contributes most significantly to the strong superlinear speedup among the three cache levels of modern microprocessors; in hybrid MPI-OpenMP mode, as the number of MPI processes increases (and the number of threads per MPI process decreases accordingly), the total execution time decreases until the maximum performance is obtained in pure MPI mode; process/thread pinning on NUMA architectures improves performance significantly when there are multiple threads per process, whereas the improvement becomes less pronounced as the number of threads per process decreases; and both the communication time and computation time increase substantially from ellipsoids to poly-ellipsoids. Overall, pure MPI outperforms hybrid MPI-OpenMP in 3D DEM modeling of ellipsoidal and poly-ellipsoidal particles.
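The granularity argument above can be illustrated with a small arithmetic sketch. Under a fixed total core count, pure MPI divides the domain among one process per core, whereas hybrid MPI-OpenMP assigns one process per group of threads, so each process owns a larger subdomain (higher computational granularity) and exchanges larger ghost-particle messages (higher communication granularity). The node and particle counts below are hypothetical, chosen only to make the contrast concrete:

```python
def particles_per_process(num_particles, nodes, cores_per_node, threads_per_process):
    """Computational granularity (particles per MPI process) for a mapping scheme.

    The total core count is fixed; hybrid mode trades MPI processes for
    OpenMP threads, so each remaining process owns a larger subdomain.
    """
    total_cores = nodes * cores_per_node
    mpi_processes = total_cores // threads_per_process
    return num_particles // mpi_processes

# Hypothetical cluster: 8 nodes x 16 cores, 100,000 particles.
pure_mpi = particles_per_process(100_000, 8, 16, threads_per_process=1)
hybrid = particles_per_process(100_000, 8, 16, threads_per_process=16)

print(pure_mpi)  # 781 particles per process (128 processes)
print(hybrid)    # 12500 particles per process (8 processes, 1 per node)
```

With identical resources, the pure MPI scheme works on subdomains sixteen times smaller, which is the source of its higher spatial locality and smaller per-message transmission volume noted in the abstract.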
Acknowledgements
We would like to acknowledge the support provided by ONR MURI Grant N00014-11-1-0691, and the DoD High Performance Computing Modernization Program (HPCMP) for granting us the computing resources required to conduct this work.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Cite this article
Yan, B., Regueiro, R.A. Comparison between pure MPI and hybrid MPI-OpenMP parallelism for Discrete Element Method (DEM) of ellipsoidal and poly-ellipsoidal particles. Comp. Part. Mech. 6, 271–295 (2019). https://doi.org/10.1007/s40571-018-0213-8