
SimK: A Large-Scale Parallel Simulation Engine


Simulation is an important method for evaluating future computer systems. Microprocessor architectures have shifted to parallel designs, yet almost all simulators remain sequential and cannot exploit the advantages of multi-core or many-core processors. This paper presents SimK, a parallel simulation engine that targets the prevalent SMP/CMP platforms and aims at large-scale, fine-grained computer system simulation. We introduce the highly efficient synchronization, communication, and buffer management policies used in SimK, and present a novel lock-free scheduling mechanism that avoids using any atomic instructions. To handle load fluctuation under light load, we propose a cooperative dynamic task migration scheme. Based on SimK, we have developed two large-scale parallel simulators, HppSim and HppNetSim, which simulate a full supercomputer system and its interconnection network, respectively. Results show that both HppSim and HppNetSim achieve sound speedup on multiple processors, and the best normalized speedup reaches 14.95X on a two-way quad-core server.
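The abstract does not spell out how the lock-free scheduling mechanism works. As a point of orientation only, one well-known way to pass work between two threads without compare-and-swap or other atomic read-modify-write instructions is a single-producer/single-consumer ring buffer, in which each index is written by exactly one thread. The sketch below illustrates that ownership discipline; it is not SimK's scheduler, and the names `SpscRing` and `run_demo` are hypothetical. In Python the interpreter already serializes these operations, so the code shows the logic only; a native implementation would additionally need the appropriate memory-ordering guarantees.

```python
import threading
import time


class SpscRing:
    """Bounded single-producer/single-consumer queue (illustrative name)."""

    def __init__(self, capacity):
        # One slot is always left empty so "full" and "empty" are distinguishable.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_push(self, item):
        """Producer side: return False if the ring is full."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = nxt  # publish only after the slot has been filled
        return True

    def try_pop(self):
        """Consumer side: return None if the ring is empty (None is reserved)."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return item


def run_demo(n_events=200, capacity=4):
    """Move n_events integers from one thread to another through the ring."""
    ring = SpscRing(capacity)
    received = []

    def producer():
        for i in range(n_events):
            while not ring.try_push(i):
                time.sleep(0)  # ring full: yield to the consumer

    def consumer():
        while len(received) < n_events:
            item = ring.try_pop()
            if item is None:
                time.sleep(0)  # ring empty: yield to the producer
            else:
                received.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because the producer never writes `_head` and the consumer never writes `_tail`, neither side needs a lock; events arrive in the order they were pushed.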





Corresponding author

Correspondence to Ming-Yu Chen.

Additional information

Supported by the National Natural Science Foundation of China under Grant No. 60633040 and the National High Technology Research and Development 863 Program of China under Grant Nos. 2006AA01A102 and 2007AA01Z115.


About this article

Cite this article

Xu, JW., Chen, MY., Zheng, G. et al. SimK: A Large-Scale Parallel Simulation Engine. J. Comput. Sci. Technol. 24, 1048 (2009).


  • large scale system simulation
  • fine-grained synchronization
  • simulation framework
  • lock-free synchronization