Parallelization of MD Algorithms and Load Balancing

Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)


MD simulation in process engineering imposes enormous computational demands and therefore requires efficient parallelization techniques. This chapter describes ls1 mardyn's parallelization approach for shared-memory and distributed-memory architectures. We first characterize today's computing architectures and their governing design principles: heterogeneity, massive core counts, and data parallelism. Based on this, we reengineer ls1 mardyn so that it can optimally leverage important hardware features, and describe our parallelization approach for shared- and distributed-memory systems using the example of the Intel Xeon processor and the Intel Xeon Phi coprocessor, respectively. We close this chapter by describing load-balancing techniques for distributed-memory parallelization with heterogeneous particle distributions in the computational domain.
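The load-balancing idea mentioned above can be illustrated with a small sketch. The following is not ls1 mardyn's actual implementation, but a minimal, self-contained example of KD-tree-based spatial decomposition: the particle set is recursively cut at the median along alternating axes, so that each resulting subdomain carries roughly the same number of particles, a simple proxy for equal computational load per process.

```python
def kd_split(particles, num_domains, axis=0, dims=3):
    """Recursively split `particles` (a list of coordinate tuples) into
    `num_domains` groups by median cuts along alternating axes.

    Illustrative sketch only; real MD codes balance estimated compute
    cost per cell, not raw particle counts."""
    if num_domains == 1:
        return [particles]
    # Sort along the current axis and cut so that the two halves carry
    # particle counts proportional to the domains assigned to each side.
    particles = sorted(particles, key=lambda p: p[axis])
    left_domains = num_domains // 2
    cut = len(particles) * left_domains // num_domains
    next_axis = (axis + 1) % dims
    return (kd_split(particles[:cut], left_domains, next_axis, dims)
            + kd_split(particles[cut:], num_domains - left_domains,
                       next_axis, dims))
```

For a homogeneous distribution this reduces to a regular spatial decomposition; for a heterogeneous one (e.g. a droplet in vapor), the median cuts automatically shrink subdomains in dense regions, which is the property that motivates KD-tree decomposition for load balancing.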


Molecular dynamics simulation · Shared-memory parallelization · Distributed-memory parallelization · MPI · OpenMP · Load balancing · KD-trees · Spatial decomposition



Copyright information

© The Author(s) 2015

Authors and Affiliations

  1. Intel Corporation, Santa Clara, USA
  2. Technische Universität München, Garching, Germany
  3. University of Kaiserslautern, Kaiserslautern, Germany