Abstract
Achieving the performance potential of an Exascale machine depends on realizing both operational efficiency and scalability in high performance computing applications. This requirement has motivated the emergence of several new programming models that emphasize fine- and medium-grain task parallelism in order to address the aggravating effects of asynchrony at scale. Performance modeling of Exascale systems for these programming models requires fundamentally new approaches because of the demands of both scale and complexity. This work presents a performance modeling case study of the Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) proxy application in which the performance modeling approach has been incorporated directly into a runtime system with two modalities of operation: computation and performance modeling simulation. The runtime system exposes performance sensitivities and projects operation to larger scales while also realizing the benefits of removing global barriers and extracting more parallelism from LULESH. Comparisons between the computation and performance modeling simulation results are presented.
Notes
1. Almost all messages were under 32K for our HPX-5 port of the LULESH application.
Acknowledgments
The authors acknowledge Benjamin Martin, Jackson DeBuhr, Ezra Kissel, Luke D’Alessandro, and Martin Swany for their technical assistance.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sterling, T., Anderson, M., Bohan, P.K., Brodowicz, M., Kulkarni, A., Zhang, B. (2015). Towards Exascale Co-design in a Runtime System. In: Markidis, S., Laure, E. (eds) Solving Software Challenges for Exascale. EASC 2014. Lecture Notes in Computer Science, vol 8759. Springer, Cham. https://doi.org/10.1007/978-3-319-15976-8_6
DOI: https://doi.org/10.1007/978-3-319-15976-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15975-1
Online ISBN: 978-3-319-15976-8
eBook Packages: Computer Science (R0)