Quantifying Architectural Requirements of Contemporary Extreme-Scale Scientific Applications

  • Jeffrey S. Vetter
  • Seyong Lee
  • Dong Li
  • Gabriel Marin
  • Collin McCurdy
  • Jeremy Meredith
  • Philip C. Roth
  • Kyle Spafford
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8551)


As detailed in recent reports, HPC architectures will continue to change over the next decade in an effort to improve energy efficiency, reliability, and performance. At this time of significant disruption, it is critically important to understand specific application requirements, so that these architectural changes can include features that satisfy the requirements of contemporary extreme-scale scientific applications. To address this need, we have developed a methodology supported by a toolkit that allows us to investigate detailed computation, memory, and communication behaviors of applications at varying levels of resolution. Using this methodology, we performed a broad-based, detailed characterization of 12 contemporary scalable scientific applications and benchmarks. Our analysis reveals numerous behaviors that sometimes contradict conventional wisdom about scientific applications. For example, the results reveal that only one of our applications executes more floating-point instructions than other types of instructions. In another example, we found that communication topologies are very regular, even for applications that, at first glance, should be highly irregular. These observations emphasize the necessity of measurement-driven analysis of real applications, and help prioritize features that should be included in future architectures.
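The abstract's instruction-mix finding (only one application executes more floating-point instructions than all other instruction types combined) can be made concrete with a small sketch. The data and function names below are hypothetical, invented purely to illustrate the kind of per-application classification the toolkit performs; they are not the paper's actual measurements or API.

```python
# Hedged sketch: classify applications by instruction mix.
# The counts below are invented example data, not measurements from the paper.
from typing import Dict

def fp_fraction(mix: Dict[str, int]) -> float:
    """Fraction of retired instructions that are floating-point."""
    total = sum(mix.values())
    return mix.get("fp", 0) / total if total else 0.0

def fp_dominant(mix: Dict[str, int]) -> bool:
    """True if FP instructions outnumber all other categories combined."""
    fp = mix.get("fp", 0)
    other = sum(count for kind, count in mix.items() if kind != "fp")
    return fp > other

# Hypothetical per-application instruction counts (fp, integer, memory, branch).
apps = {
    "app_a": {"fp": 600, "int": 200, "mem": 150, "branch": 50},
    "app_b": {"fp": 250, "int": 400, "mem": 250, "branch": 100},
}

for name, mix in apps.items():
    print(f"{name}: fp_fraction={fp_fraction(mix):.2f} fp_dominant={fp_dominant(mix)}")
```

Under this definition, an application is "FP-dominant" only when floating-point work exceeds everything else put together, which is the stricter reading suggested by the abstract's claim.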


Keywords: Message Passing Interface, Communication Behavior, Memory Bandwidth, Single Instruction Multiple Data, Collective Operation





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jeffrey S. Vetter (1, 2)
  • Seyong Lee (1)
  • Dong Li (1)
  • Gabriel Marin (3)
  • Collin McCurdy (1)
  • Jeremy Meredith (1)
  • Philip C. Roth (1)
  • Kyle Spafford (1)

  1. Oak Ridge National Laboratory, Oak Ridge, USA
  2. Georgia Institute of Technology, Atlanta, USA
  3. University of Tennessee–Knoxville, Knoxville, USA
