A Latency-Hiding Scheme for Adjacent Interaction Simulation on Multi-core/Many-Core Clusters

  • Chen Li-li
  • Li Wei
  • Zhang Jing
  • Shi Shuai
  • Huang Jian-xin
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 402)


As processors have entered the multi-core/many-core era, the parallel processing capability of a single processor scales with its growing core count. In high-performance computing (HPC) clusters, however, improvements in inter-node communication latency lag far behind improvements in processor performance. As a result, communication latency often becomes the performance bottleneck of HPC applications. This paper addresses the communication latency problem of adjacent interaction simulation on multi-core/many-core clusters. It proposes an optimized algorithm for adjacent interaction simulation on modern general-purpose graphics many-core architectures and an O(B+2R) algorithm for inter-node latency hiding. Theoretical analysis and experimental results show that the proposed techniques effectively improve the performance of adjacent interaction simulation on multi-core/many-core clusters.
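To make the idea of inter-node latency hiding concrete, here is a minimal sketch of the general overlap pattern for an adjacent interaction (stencil-like) update. It is not the paper's O(B+2R) algorithm, whose details are not given in this abstract. The three-point averaging rule, the two-subdomain split, and every function name below are assumptions chosen for illustration, and the inter-node transfer is imitated with a short sleep in a worker thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def stencil(left, mid, right):
    """Hypothetical adjacent interaction rule: mean of a cell and its two neighbours."""
    return (left + mid + right) / 3.0


def reference_step(u):
    """Single-domain update; the two end cells are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = stencil(u[i - 1], u[i], u[i + 1])
    return new


def fetch_halo(neighbour_value, latency=0.01):
    """Stand-in for an inter-node receive; the sleep imitates network latency."""
    time.sleep(latency)
    return neighbour_value


def overlapped_step(local, left_src, right_src, pool):
    """Update one subdomain while its halo cells are still in flight.

    A halo source of None marks a physical domain edge whose cell stays fixed.
    """
    # 1. Start the (simulated) non-blocking halo exchange.
    left_f = pool.submit(fetch_halo, left_src) if left_src is not None else None
    right_f = pool.submit(fetch_halo, right_src) if right_src is not None else None

    # 2. Interior cells depend only on local data, so compute them
    #    while the halo values are being transferred.
    new = local[:]
    for i in range(1, len(local) - 1):
        new[i] = stencil(local[i - 1], local[i], local[i + 1])

    # 3. Wait for the halos, then finish the cells next to a neighbour.
    if left_f is not None:
        new[0] = stencil(left_f.result(), local[0], local[1])
    if right_f is not None:
        new[-1] = stencil(local[-2], local[-1], right_f.result())
    return new


def split_step(u, pool):
    """Split the domain into two subdomains (two imaginary nodes) and update both."""
    half = len(u) // 2
    a, b = u[:half], u[half:]
    new_a = overlapped_step(a, None, b[0], pool)
    new_b = overlapped_step(b, a[-1], None, pool)
    return new_a + new_b


if __name__ == "__main__":
    u = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The overlapped, split update should match the single-domain update.
        print(split_step(u, pool) == reference_step(u))
```

The design point is the ordering inside `overlapped_step`: the transfer is started first, the interior work runs while it is outstanding, and only the cells adjacent to a neighbouring subdomain wait for the halo. On a real cluster the worker thread would typically be replaced by non-blocking message passing, and the interior loop by a many-core kernel.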


adjacent interaction simulation; communication latency-hiding; parallel processing; multi-core/many-core cluster





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Chen Li-li (1)
  • Li Wei (1)
  • Zhang Jing (1)
  • Shi Shuai (1)
  • Huang Jian-xin (1)

  1. Science and Technology on Complex System Simulation Laboratory, Beijing, China
