Scaling Properties of Parallel Applications to Exascale

Mariani, Giovanni; Anghel, Andreea; Jongerius, Rik; Dittmann, Gero

doi:10.1007/s10766-016-0412-y

Published: 05 April 2016

Volume 44, pages 975–1002, (2016)

International Journal of Parallel Programming

Abstract

A detailed profile of exascale applications helps to understand the computation, communication and memory requirements for exascale systems and provides the insight necessary for fine-tuning the computing architecture. Obtaining such a profile is challenging as exascale systems will process unprecedented amounts of data. Profiling applications at the target scale would require the exascale machine itself. In this work we propose a methodology to extrapolate the exascale profile from experimental observations over datasets feasible for today’s machines. Extrapolation models are carefully selected by means of statistical techniques and a high-level complexity analysis is included in the selection process to speed up the learning phase and to improve the accuracy of the final model. We extrapolate run-time properties of the target applications including information about the instruction mix, memory access pattern, instruction-level parallelism, and communication requirements. Compared to state-of-the-art techniques, the proposed methodology reduces the prediction error by an order of magnitude on the instruction count and improves the accuracy by up to 1.3\(\times\) for the memory access pattern, and by more than 2\(\times\) for the communication requirements.
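The extrapolation idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three candidate scaling shapes, the one-coefficient least-squares fit, the leave-one-out selection, the function names, and the synthetic observations. It only shows the general pattern of fitting several candidate models on small inputs, keeping the one that cross-validates best, and evaluating it at a much larger input size.

```python
import math

# Hypothetical candidate scaling shapes (assumption: the paper's real model
# family is not described in this text).
CANDIDATES = {
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n * n,
}

def fit_scale(shape, ns, ys):
    """Least-squares coefficient c for the one-parameter model y ~ c * shape(n)."""
    fs = [shape(n) for n in ns]
    return sum(f * y for f, y in zip(fs, ys)) / sum(f * f for f in fs)

def loo_error(shape, ns, ys):
    """Mean leave-one-out error, normalized by the range of the training values."""
    total = 0.0
    for i in range(len(ns)):
        train_n = ns[:i] + ns[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        c = fit_scale(shape, train_n, train_y)
        spread = max(train_y) - min(train_y)
        total += abs(c * shape(ns[i]) - ys[i]) / spread
    return total / len(ns)

def select_and_extrapolate(ns, ys, target_n):
    """Pick the candidate with the lowest leave-one-out error, then extrapolate."""
    best = min(CANDIDATES, key=lambda name: loo_error(CANDIDATES[name], ns, ys))
    c = fit_scale(CANDIDATES[best], ns, ys)
    return best, c * CANDIDATES[best](target_n)

# Synthetic observations, made up for illustration: 3 * n * log2(n) instructions.
sizes = [2 ** k for k in range(4, 10)]
counts = [3 * n * math.log2(n) for n in sizes]
name, predicted = select_and_extrapolate(sizes, counts, 2 ** 20)
```

On this synthetic data the selection step picks the "n log n" shape, because it is the only candidate that reproduces the held-out points. Real measurements are noisy, so a practical version would need more candidate shapes and a more careful selection criterion.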

Notes

In theory, one may also generate predictions of worst-case and best-case executions, but this exceeds the scope of this paper.
During cross-validation, the error for a given metric is measured relative to the difference between its maximum and minimum values in the training set. This approach avoids giving too much weight to runs with small values of \(\theta (\varvec{n})\).
Decreasing trends are handled in a similar way, but are rarely found in practice.
The relative error can be written as \(\hat{a}(\varvec{n})/a(\varvec{n})-1\); it measures 900 % when \(\hat{a}\) is 10 times larger than \(a\) and \(-90\) % when \(\hat{a}\) is 10 times smaller than \(a\).
We adopt the default configuration available in the Mathematica environment [29].
For these metrics the SOTA method refers to the same extrapolation technique used for the instruction count mix proposed by Calotoiu et al. [9].
There are 512 sub-bands, each partitioned into 512 channels.
Construction of the SKA is planned to begin in 2018. At that point in time, different Xeon-like architectures may be available, providing different computational power.
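The two error measures described in the notes above can be checked with a short sketch. The function names are hypothetical. Taking the absolute value in the range-normalized error is an assumption, since the note only says the error is measured relative to the difference between the maximum and minimum training values.

```python
def relative_error(predicted, actual):
    """Signed relative error, predicted / actual - 1, as written in the notes."""
    return predicted / actual - 1.0

def range_normalized_error(predicted, actual, training_values):
    """Error divided by (max - min) of the training-set values.

    Assumption: the absolute difference is used; the note does not say
    whether the cross-validation error is signed.
    """
    spread = max(training_values) - min(training_values)
    return abs(predicted - actual) / spread

over = relative_error(10.0, 1.0)   # prediction 10 times too large: 9.0, i.e. 900 %
under = relative_error(1.0, 10.0)  # prediction 10 times too small: about -0.9, i.e. -90 %
```

The asymmetry is visible here: an overestimate by a factor of 10 yields 900 %, while an underestimate by the same factor yields only -90 %, so the signed relative error is bounded below by -100 % but unbounded above.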

References

Agerwala, T.: Exascale computing: the challenges and opportunities in the next decade. In: 2010 IEEE 16th International Symposium on High Performance Computer Architecture (HPCA), p. 1 (2010)
Almeida, A., Castel-Branco, M., Falcao, A.: Linear regression for calibration lines revisited: weighting schemes for bioanalytical methods. J. Chromatogr. B 774(2), 215–222 (2002)
Anghel, A., Rodríguez, G., Prisacari, B., Minkenberg, C., Dittmann, G.: Quantifying communication in graph analytics. In: High Performance Computing—30th International Conference, ISC High Performance 2015, Frankfurt, Germany, July 12–16, 2015, Proceedings, pp. 472–487 (2015)
Anghel, A., Vasilescu, L.M., Jongerius, R., Dittmann, G., Mariani, G.: An instrumentation approach for hardware-agnostic software characterization. In: Proceedings of the 12th ACM International Conference on Computing Frontiers, CF ’15, pp. 3:1–3:8, New York, NY, USA, ACM (2015)
Bhattacharyya, A., Hoefler, T.: Pemogen: Automatic adaptive performance modeling during program runtime. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pp. 393–404, New York, NY, USA, ACM (2014)
Breugh, M.B., Eyerman, S., Eeckhout, L.: Mechanistic analytical modeling of superscalar in-order processor performance. ACM Trans. Archit. Code Optim. 11(4), 50:1–50:26 (2015)
Brief introduction | Graph 500. http://www.graph500.org
Broekema, P., van Nieuwpoort, R., Bal, H.: The Square Kilometre Array science data processor. Preliminary compute platform design. J. Instrum. 10(07), C07004 (2015)
Calotoiu, A., Hoefler, T., Poke, M., Wolf, F.: Using automated performance modeling to find scalability bugs in complex codes. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 45:1–45:12, New York, NY, USA, ACM (2013)
Carlson, T., Heirman, W., Eeckhout, L.: Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation. In: High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for, pp. 1–12 (2011)
Checconi, F., Petrini, F., Willcock, J., Lumsdaine, A., Choudhury, A.R., Sabharwal, Y.: Breaking the speed and scalability barriers for graph exploration on distributed-memory machines. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’12, pp. 13:1–13:12, Los Alamitos, CA, USA, IEEE Computer Society Press (2012)
Cook, H., Skadron, K.: Predictive design space exploration using genetically programmed response surfaces. In: Proceedings of the 45th Annual Design Automation Conference, DAC ’08, pp. 960–965, New York, NY, USA, ACM (2008)
Cornwell, T.J., Golap, K., Bhatnagar, S.: The noncoplanar baselines effect in radio interferometry: the w-projection algorithm. IEEE J. Sel. Top. Signal Process. 2(5), 647–657 (2008)
Eyerman, S., Eeckhout, L., Karkhanis, T., Smith, J.E.: A mechanistic performance model for superscalar out-of-order processors. ACM Trans. Comput. Syst. 27(2), 3:1–3:37 (2009)
Fiorin, L., Vermij, E., Van Lunteren, J., Jongerius, R., Hagleitner, C.: An energy-efficient custom architecture for the SKA1-Low central signal processor. In: Proceedings of the 12th ACM International Conference on Computing Frontiers, CF ’15, pp. 5:1–5:8, New York, NY, USA, ACM (2015)
Gayawan, E., Ipinyomi, R.A.: A comparison of Akaike, Schwarz and R square criteria for model selection using some fertility models. Aust. J. Basic Appl. Sci. 3(4), 3524–3530 (2009)
Gluhovsky, I.: Determining output uncertainty of computer system models. Perform. Eval. 64(2), 103–125 (2007)
Gluhovsky, I., Vengerov, D., O’Krafka, B.: Comprehensive multivariate extrapolation modeling of multiprocessor cache miss rates. ACM Trans. Comput. Syst. (TOCS) 25(1), 1–32 (2007)
Guo, Q., Chen, T., Chen, Y., Li, L., Hu, W.: Microarchitectural design space exploration made fast. Microprocess. Microsyst. 37(1), 41–51 (2013)
Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco (2003)
Hutter, F., Xu, L., Hoos, H.H., Leyton-Brown, K.: Algorithm runtime prediction: methods & evaluation. Artif. Intell. 206, 79–111 (2014)
Jongerius, R., Mariani, G., Anghel, A., Dittmann, G., Vermij, E., Corporaal, H.: Analytic processor model for fast design-space exploration. In: 2015 33rd IEEE International Conference on Computer Design (ICCD), pp. 440–443 (2015)
Jongerius, R., Wijnholds, S., Nijboer, R., Corporaal, H.: An end-to-end computing model for the Square Kilometre Array. Computer 47(9), 48–54 (2014)
Kushilevitz, E., Nisan, N.: Communication Complexity. Cambridge University Press, New York (1997)
Li, B., Peng, L., Ramadass, B.: Accurate and efficient processor performance prediction via regression tree based modeling. J. Syst. Archit. 55(10–12), 457–467 (2009)
Mariani, G., Anghel, A., Jongerius, R., Dittmann, G.: Scaling application properties to exascale. In: Proceedings of the 12th ACM International Conference on Computing Frontiers, CF ’15, pp. 31:1–31:8, New York, NY, USA, ACM (2015)
Mariani, G., Palermo, G., Zaccaria, V., Silvano, C.: OSCAR: an optimization methodology exploiting spatial correlation in multicore design spaces. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 31(5), 740–753 (2012)
Marin, G., Mellor-Crummey, J.: Cross-architecture performance predictions for scientific applications using parameterized models. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’04/Performance ’04, pp. 2–13, New York, NY, USA, ACM (2004)
Mathematica 10, 2014. http://www.wolfram.com/mathematica/
Montgomery, D.: Design and Analysis of Experiments, 8th edn. Wiley, Hoboken (2012)
Sipser, M.: Introduction to the Theory of Computation. Thomson Course Technology, Boston (2006)
SPEC CPU benchmarks. http://www.spec.org/benchmarks.html
The LLVM compiler infrastructure project. http://www.llvm.org/
Ueno, K., Suzumura, T.: Highly scalable graph search for the Graph500 benchmark. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’12, pp. 149–160, New York, NY, USA, ACM (2012)
White, H.: A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48(4), 817–838 (1980)
Wong, A., Rexachs, D., Luque, E.: Parallel application signature for performance analysis and prediction. Parallel Distrib. Syst. IEEE Trans. 26(7), 2009–2019 (2015)
Zhang, Z., Xiaofeng, B.: Comparison about the three central composite designs with simulation. In: International Conference on Advanced Computer Control. ICACC ’09, pp. 163–167 (2009)

Acknowledgments

This work is conducted in the context of the joint ASTRON and IBM DOME project and is funded by the Netherlands Organisation for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe.

Author information

Authors and Affiliations

IBM Research, Dwingeloo, The Netherlands
Giovanni Mariani & Rik Jongerius
IBM Research – Zurich, Rüschlikon, Switzerland
Andreea Anghel & Gero Dittmann

Corresponding author

Correspondence to Giovanni Mariani.

About this article

Cite this article

Mariani, G., Anghel, A., Jongerius, R. et al. Scaling Properties of Parallel Applications to Exascale. Int J Parallel Prog 44, 975–1002 (2016). https://doi.org/10.1007/s10766-016-0412-y

Received: 21 October 2015
Accepted: 16 March 2016
Published: 05 April 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s10766-016-0412-y
