Scaling Properties of Parallel Applications to Exascale

International Journal of Parallel Programming

Abstract

A detailed profile of exascale applications helps to understand the computation, communication and memory requirements for exascale systems and provides the insight necessary for fine-tuning the computing architecture. Obtaining such a profile is challenging, as exascale systems will process unprecedented amounts of data, and profiling applications at the target scale would require the exascale machine itself. In this work, we propose a methodology to extrapolate the exascale profile from experimental observations over datasets feasible for today’s machines. Extrapolation models are carefully selected by means of statistical techniques, and a high-level complexity analysis is included in the selection process to speed up the learning phase and to improve the accuracy of the final model. We extrapolate run-time properties of the target applications, including information about the instruction mix, memory access pattern, instruction-level parallelism, and communication requirements. Compared to state-of-the-art techniques, the proposed methodology reduces the prediction error by an order of magnitude on the instruction count and improves the accuracy by up to \(1.3\times\) for the memory access pattern and by more than \(2\times\) for the communication requirements.

Notes

  1. In theory, one could also generate predictions of worst-case and best-case executions, but this is beyond the scope of this paper.

  2. During cross-validation, the error for a given metric is measured relative to the difference between its maximum and minimum values in the training set. This approach avoids giving too much weight to runs with small values of \(\theta (\varvec{n})\).

  3. Decreasing trends are handled in a similar way, but are rarely found in practice.

  4. The relative error can be written as \(\hat{a}(\varvec{n})/a(\varvec{n})-1\); it is 900 % when \(\hat{a}\) is 10 times larger than \(a\) and \(-90\) % when \(\hat{a}\) is 10 times smaller than \(a\).

  5. We adopt the default configuration available in the Mathematica environment [29].

  6. For these metrics, the SOTA method is the extrapolation technique proposed by Calotoiu et al. [9], the same one used for the instruction count mix.

  7. There are 512 sub-bands, each partitioned into 512 channels.

  8. Construction of the SKA is planned to begin in 2018. At that point in time, different Xeon-like architectures may be available providing different computational power.
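As an illustration of the error measures described in Notes 2 and 4, the following is a minimal Python sketch, not the authors' implementation. The function names `relative_error` and `range_normalized_error` are hypothetical, and the use of an absolute difference in the range-normalized measure is an assumption: Note 2 states only that the error is measured relative to the spread of the metric in the training set.

```python
def relative_error(predicted, actual):
    """Signed relative error, predicted / actual - 1, as written in Note 4."""
    return predicted / actual - 1.0


def range_normalized_error(predicted, actual, training_values):
    """Error divided by (max - min) of the metric over the training set.

    Assumption: the numerator is the absolute difference; Note 2 only
    specifies the normalization by the training-set range.
    """
    span = max(training_values) - min(training_values)
    if span == 0:
        raise ValueError("training values must not all be equal")
    return abs(predicted - actual) / span


if __name__ == "__main__":
    # A 10x overestimate and a 10x underestimate, as discussed in Note 4.
    print(relative_error(10.0, 1.0))
    print(relative_error(0.1, 1.0))
    # Hypothetical training values, chosen only for illustration.
    print(range_normalized_error(12.0, 10.0, [0.0, 5.0, 20.0]))
```

The asymmetry of the signed relative error (bounded below by \(-100\) % but unbounded above) is visible in the first two calls, while normalizing by the training-set range keeps runs with small metric values from dominating the cross-validation error.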

References

  1. Agerwala, T.: Exascale computing: the challenges and opportunities in the next decade. In: 2010 IEEE 16th International Symposium on High Performance Computer Architecture (HPCA), p. 1 (2010)

  2. Almeida, A., Castel-Branco, M., Falcao, A.: Linear regression for calibration lines revisited: weighting schemes for bioanalytical methods. J. Chromatogr. B 774(2), 215–222 (2002)

  3. Anghel, A., Rodríguez, G., Prisacari, B., Minkenberg, C., Dittmann, G.: Quantifying communication in graph analytics. In: High Performance Computing—30th International Conference, ISC High Performance 2015, Frankfurt, Germany, July 12–16, 2015, Proceedings, pp. 472–487 (2015)

  4. Anghel, A., Vasilescu, L.M., Jongerius, R., Dittmann, G., Mariani, G.: An instrumentation approach for hardware-agnostic software characterization. In: Proceedings of the 12th ACM International Conference on Computing Frontiers, CF ’15, pp. 3:1–3:8, New York, NY, USA, ACM (2015)

  5. Bhattacharyya, A., Hoefler, T.: Pemogen: Automatic adaptive performance modeling during program runtime. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pp. 393–404, New York, NY, USA, ACM (2014)

  6. Breugh, M.B., Eyerman, S., Eeckhout, L.: Mechanistic analytical modeling of superscalar in-order processor performance. ACM Trans. Archit. Code Optim. 11(4), 50:1–50:26 (2015)

  7. Brief introduction | Graph 500. http://www.graph500.org

  8. Broekema, P., van Nieuwpoort, R., Bal, H.: The Square Kilometre Array science data processor. Preliminary compute platform design. J. Instrum. 10(07), C07004 (2015)

  9. Calotoiu, A., Hoefler, T., Poke, M., Wolf, F.: Using automated performance modeling to find scalability bugs in complex codes. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 45:1–45:12, New York, NY, USA, ACM (2013)

  10. Carlson, T., Heirman, W., Eeckhout, L.: Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation. In: High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for, pp. 1–12 (2011)

  11. Checconi, F., Petrini, F., Willcock, J., Lumsdaine, A., Choudhury, A.R., Sabharwal, Y.: Breaking the speed and scalability barriers for graph exploration on distributed-memory machines. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’12, pp. 13:1–13:12, Los Alamitos, CA, USA, IEEE Computer Society Press (2012)

  12. Cook, H., Skadron, K.: Predictive design space exploration using genetically programmed response surfaces. In: Proceedings of the 45th Annual Design Automation Conference, DAC ’08, pp. 960–965, New York, NY, USA, ACM (2008)

  13. Cornwell, T.J., Golap, K., Bhatnagar, S.: The noncoplanar baselines effect in radio interferometry: the w-projection algorithm. IEEE J. Sel. Top. Signal Process. 2(5), 647–657 (2008)

  14. Eyerman, S., Eeckhout, L., Karkhanis, T., Smith, J.E.: A mechanistic performance model for superscalar out-of-order processors. ACM Trans. Comput. Syst. 27(2), 3:1–3:37 (2009)

  15. Fiorin, L., Vermij, E., Van Lunteren, J., Jongerius, R., Hagleitner, C.: An energy-efficient custom architecture for the SKA1-Low central signal processor. In: Proceedings of the 12th ACM International Conference on Computing Frontiers, CF ’15, pp. 5:1–5:8, New York, NY, USA, ACM (2015)

  16. Gayawan, E., Ipinyomi, R.A.: A comparison of Akaike, Schwarz and R square criteria for model selection using some fertility models. Aust. J. Basic Appl. Sci. 3(4), 3524–3530 (2009)

  17. Gluhovsky, I.: Determining output uncertainty of computer system models. Perform. Eval. 64(2), 103–125 (2007)

  18. Gluhovsky, I., Vengerov, D., O’Krafka, B.: Comprehensive multivariate extrapolation modeling of multiprocessor cache miss rates. ACM Trans. Comput. Syst. (TOCS) 25(1), 1–32 (2007)

  19. Guo, Q., Chen, T., Chen, Y., Li, L., Hu, W.: Microarchitectural design space exploration made fast. Microprocess. Microsyst. 37(1), 41–51 (2013)

  20. Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco (2003)

  21. Hutter, F., Xu, L., Hoos, H.H., Leyton-Brown, K.: Algorithm runtime prediction: methods & evaluation. Artif. Intell. 206, 79–111 (2014)

  22. Jongerius, R., Mariani, G., Anghel, A., Dittmann, G., Vermij, E., Corporaal, H.: Analytic processor model for fast design-space exploration. In: 2015 33rd IEEE International Conference on Computer Design (ICCD), pp. 440–443 (2015)

  23. Jongerius, R., Wijnholds, S., Nijboer, R., Corporaal, H.: An end-to-end computing model for the Square Kilometre Array. Computer 47(9), 48–54 (2014)

  24. Kushilevitz, E., Nisan, N.: Communication Complexity. Cambridge University Press, New York (1997)

  25. Li, B., Peng, L., Ramadass, B.: Accurate and efficient processor performance prediction via regression tree based modeling. J. Syst. Archit. 55(10–12), 457–467 (2009)

  26. Mariani, G., Anghel, A., Jongerius, R., Dittmann, G.: Scaling application properties to exascale. In: Proceedings of the 12th ACM International Conference on Computing Frontiers, CF ’15, pp. 31:1–31:8, New York, NY, USA, ACM (2015)

  27. Mariani, G., Palermo, G., Zaccaria, V., Silvano, C.: OSCAR: an optimization methodology exploiting spatial correlation in multicore design spaces. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 31(5), 740–753 (2012)

  28. Marin, G., Mellor-Crummey, J.: Cross-architecture performance predictions for scientific applications using parameterized models. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’04/Performance ’04, pp. 2–13, New York, NY, USA, ACM (2004)

  29. Mathematica 10, 2014. http://www.wolfram.com/mathematica/

  30. Montgomery, D.: Design and Analysis of Experiments, 8th edn. Wiley, Hoboken (2012)

  31. Sipser, M.: Introduction to the Theory of Computation. Thomson Course Technology, Boston (2006)

  32. SPEC CPU benchmarks. http://www.spec.org/benchmarks.html

  33. The LLVM compiler infrastructure project. http://www.llvm.org/

  34. Ueno, K., Suzumura, T.: Highly scalable graph search for the Graph500 benchmark. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’12, pp. 149–160, New York, NY, USA, ACM (2012)

  35. White, H.: A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48(4), 817–838 (1980)

  36. Wong, A., Rexachs, D., Luque, E.: Parallel application signature for performance analysis and prediction. Parallel Distrib. Syst. IEEE Trans. 26(7), 2009–2019 (2015)

  37. Zhang, Z., Xiaofeng, B.: Comparison about the three central composite designs with simulation. In: International Conference on Advanced Computer Control. ICACC ’09, pp. 163–167 (2009)

Acknowledgments

This work is conducted in the context of the joint ASTRON and IBM DOME project and is funded by the Netherlands Organisation for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe.

Author information

Corresponding author

Correspondence to Giovanni Mariani.

About this article

Cite this article

Mariani, G., Anghel, A., Jongerius, R. et al. Scaling Properties of Parallel Applications to Exascale. Int J Parallel Prog 44, 975–1002 (2016). https://doi.org/10.1007/s10766-016-0412-y

  • DOI: https://doi.org/10.1007/s10766-016-0412-y
