Supercomputer in a Laptop: Distributed Application and Runtime Development via Architecture Simulation

  • Samuel Knight
  • Joseph P. Kenny
  • Jeremiah J. Wilke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11203)

Abstract

Architecture simulation can aid in predicting and understanding application performance, particularly for proposed hardware or large system designs that do not yet exist. In network design studies for high-performance computing, most simulators focus on the dominant message-passing (MPI) model. Currently, many simulators build and maintain their own simulator-specific implementations of MPI. This approach has several drawbacks. Rather than reusing an existing MPI library, simulator developers must implement all semantics, collectives, and protocols themselves. Additionally, alternative runtimes like GASNet cannot be simulated without again building a simulator-specific version. It would be far more sustainable and flexible to maintain simulator versions of only the lower-level layers, like uGNI or IB-verbs, and reuse the production runtime code on top of them. Directly building and running production communication runtimes inside a simulator poses technical challenges, however. We discuss these challenges and show how they are overcome via the macroscale components for the Structural Simulation Toolkit (SST), leveraging a basic source-to-source tool to automatically adapt production code for simulation. SST is able to encapsulate and virtualize thousands of MPI ranks in a single simulator process, providing a “supercomputer in a laptop” environment. We demonstrate the approach for the production GASNet runtime over uGNI running inside SST. We then discuss the capabilities enabled, including investigating performance with tunable delays, deterministic debugging of race conditions, and distributed debugging with serial debuggers.
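
To make the idea of virtualizing many ranks in one process more concrete, the following is a minimal sketch, not SST/macro's implementation or API. It assumes each rank can be modeled as a Python generator that yields `compute`, `send`, and `recv` requests to a small discrete-event scheduler; the names `simulate` and `ring`, the operation tuples, and the single `latency` parameter are all hypothetical and chosen only for illustration. Virtual time is in arbitrary units.

```python
import heapq
from itertools import count


def simulate(rank_fn, nranks, latency=1.0):
    """Run nranks virtual ranks in one process; return each rank's finish time."""
    seq = count()    # tie-breaker: equal-time events pop in creation order
    events = []      # min-heap of (virtual_time, seq, kind, rank, payload)
    inbox = {r: [] for r in range(nranks)}
    blocked = set()  # ranks waiting in recv with an empty inbox
    procs = {r: rank_fn(r, nranks) for r in range(nranks)}
    finish = {}

    def post(time, kind, rank, payload=None):
        heapq.heappush(events, (time, next(seq), kind, rank, payload))

    def step(rank, now, value=None):
        """Advance one rank until it blocks, delays, or finishes."""
        while True:
            try:
                op = procs[rank].send(value)
            except StopIteration:
                finish[rank] = now
                return
            value = None
            if op[0] == "compute":      # modeled delay; no real work is done
                post(now + op[1], "resume", rank)
                return
            if op[0] == "send":         # message arrives after the latency
                post(now + latency, "deliver", op[1], op[2])
            elif op[0] == "recv":
                if inbox[rank]:
                    value = inbox[rank].pop(0)
                else:
                    blocked.add(rank)
                    return

    for r in range(nranks):
        post(0.0, "resume", r)
    while events:
        now, _, kind, rank, payload = heapq.heappop(events)
        if kind == "resume":
            step(rank, now)
        else:                           # "deliver"
            inbox[rank].append(payload)
            if rank in blocked:
                blocked.discard(rank)
                step(rank, now, inbox[rank].pop(0))
    return finish


def ring(rank, nranks):
    """Hypothetical rank program: pass a token once around a ring."""
    if rank == 0:
        yield ("send", 1 % nranks, 0)
        yield ("recv",)
    else:
        token = yield ("recv",)
        yield ("compute", 2.0)          # tunable modeled delay
        yield ("send", (rank + 1) % nranks, token + 1)
```

Calling `simulate(ring, 1000)` runs a thousand virtual ranks in a single ordinary process. Because events are ordered by virtual time with a fixed tie-breaker, repeated runs produce the same schedule, which is the property that makes deterministic replay of race conditions and stepping through all ranks in one serial debugger possible in this style of simulation. Changing `latency` or the `compute` delay changes the predicted finish times without touching the rank program.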

Notes

Acknowledgment

This work was funded by Sandia National Laboratories, which is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s (DOE) National Nuclear Security Administration (NNSA) under contract DE-NA-0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Samuel Knight (1)
  • Joseph P. Kenny (1)
  • Jeremiah J. Wilke (1)

  1. Sandia National Laboratories, Livermore, USA