International Journal of Parallel Programming, Volume 44, Issue 5, pp 975–1002

Scaling Properties of Parallel Applications to Exascale

  • Giovanni Mariani
  • Andreea Anghel
  • Rik Jongerius
  • Gero Dittmann


A detailed profile of exascale applications helps to understand the computation, communication and memory requirements for exascale systems and provides the insight necessary for fine-tuning the computing architecture. Obtaining such a profile is challenging as exascale systems will process unprecedented amounts of data. Profiling applications at the target scale would require the exascale machine itself. In this work we propose a methodology to extrapolate the exascale profile from experimental observations over datasets feasible for today’s machines. Extrapolation models are carefully selected by means of statistical techniques, and a high-level complexity analysis is included in the selection process to speed up the learning phase and to improve the accuracy of the final model. We extrapolate run-time properties of the target applications, including information about the instruction mix, memory access pattern, instruction-level parallelism, and communication requirements. Compared to state-of-the-art techniques, the proposed methodology reduces the prediction error on the instruction count by an order of magnitude and improves the accuracy by up to 1.3× for the memory access pattern and by more than 2× for the communication requirements.
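The core idea described above can be illustrated with a minimal, hypothetical sketch: fit several candidate complexity models to profiles gathered at small problem sizes, use a held-out measurement to select among them (standing in for the paper's statistical selection and complexity analysis), and extrapolate the chosen model to a larger, unmeasured scale. The candidate set, hold-out rule, and all function names here are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch only: select a scaling model for an application
# property (e.g. instruction count) from small-scale profiles, then
# extrapolate it to a larger problem size.
import math

def fit_least_squares(xs, ys):
    """Fit y = a*x + b by ordinary least squares; return (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

# Candidate complexity terms g(n); each model has the form y ~ a*g(n) + b.
CANDIDATES = {
    "linear":    lambda n: n,
    "nlogn":     lambda n: n * math.log(n),
    "quadratic": lambda n: n * n,
}

def select_and_extrapolate(sizes, counts, target):
    """Hold out the largest measured size, pick the candidate with the
    smallest held-out error, refit on all data, predict at `target`."""
    train_s, train_c = sizes[:-1], counts[:-1]
    best, best_err = None, float("inf")
    for name, g in CANDIDATES.items():
        a, b = fit_least_squares([g(s) for s in train_s], train_c)
        err = abs(a * g(sizes[-1]) + b - counts[-1])
        if err < best_err:
            best, best_err = name, err
    g = CANDIDATES[best]
    a, b = fit_least_squares([g(s) for s in sizes], counts)
    return best, a * g(target) + b

# Synthetic profile: an instruction count growing as ~5 * n * log(n).
sizes = [64, 128, 256, 512, 1024]
counts = [5 * n * math.log(n) for n in sizes]
model, pred = select_and_extrapolate(sizes, counts, 1 << 20)
```

On this synthetic data the hold-out step rejects the linear and quadratic candidates, and the refitted n·log(n) model extrapolates the property to the target size; the paper's contribution lies in doing this selection robustly for real, noisy profiles.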


Exascale architectures · Profiling · MPI · OpenMP · Square Kilometre Array · Radio astronomy · Design space exploration · Supercomputing



This work is conducted in the context of the joint ASTRON and IBM DOME project and is funded by the Netherlands Organisation for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Giovanni Mariani (1)
  • Andreea Anghel (2)
  • Rik Jongerius (1)
  • Gero Dittmann (2)
  1. IBM Research, Dwingeloo, The Netherlands
  2. IBM Research – Zurich, Rüschlikon, Switzerland
