Machine-Learning-Based Load Balancing for Community Ice Code Component in CESM

  • Conference paper
High Performance Computing for Computational Science -- VECPAR 2014 (VECPAR 2014)

Abstract

Load balancing scientific codes on massively parallel architectures is an increasingly challenging task. In this paper, we focus on the Community Earth System Model (CESM), a widely used climate modeling code. CESM comprises six components, each of which exhibits a different scalability pattern. Previously, an analytical performance model was used to find optimal load-balancing parameter configurations for each component. For the Community Ice Code component, however, the analytical performance model is too restrictive to capture its scalability patterns. We therefore developed a machine-learning-based load-balancing algorithm. It fits a surrogate model to a small number of load-balancing configurations and their measured runtimes, and then uses this model to find high-quality parameter configurations. Compared with the current practice of enumerating feasible configurations guided by expert knowledge, the machine-learning-based algorithm requires one-sixth as many evaluations to find the optimal configuration.
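The abstract describes the search only at a high level: fit a surrogate to a few measured configurations, then let the surrogate decide which configuration to run next. The Python sketch below illustrates that loop under explicit assumptions. The (tasks, block_size) parameter pair, the k-nearest-neighbour surrogate, the function names `knn_predict` and `surrogate_search`, and the synthetic `fake_runtime` cost are all hypothetical stand-ins, not the paper's actual CICE parameters, model, or measurements.

```python
import math
import random
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical configuration: (number of tasks, block size).
Config = Tuple[int, int]


def knn_predict(train: Dict[Config, float], x: Config, k: int = 3) -> float:
    """Surrogate prediction: mean runtime of the k nearest evaluated configs.

    Distances use log2-scaled coordinates, so doubling the task count or
    block size counts the same regardless of magnitude. For k-NN, "fitting"
    the surrogate amounts to storing the observed (config, runtime) pairs.
    """
    def dist(a: Config, b: Config) -> float:
        return math.hypot(math.log2(a[0]) - math.log2(b[0]),
                          math.log2(a[1]) - math.log2(b[1]))

    nearest = sorted(train, key=lambda c: dist(c, x))[:k]
    return sum(train[c] for c in nearest) / len(nearest)


def surrogate_search(candidates: Sequence[Config],
                     run: Callable[[Config], float],
                     n_init: int = 5,
                     budget: int = 12,
                     seed: int = 0) -> Tuple[Config, float, int]:
    """Return (best config, its runtime, number of expensive runs used)."""
    rng = random.Random(seed)
    observed: Dict[Config, float] = {}
    # Step 1: run a small random sample of feasible configurations.
    for c in rng.sample(list(candidates), n_init):
        observed[c] = run(c)
    # Step 2: query the surrogate on unevaluated configurations and run
    # the one it predicts to be fastest, until the budget is spent.
    while len(observed) < budget:
        pending = [c for c in candidates if c not in observed]
        if not pending:
            break
        nxt = min(pending, key=lambda c: knn_predict(observed, c))
        observed[nxt] = run(nxt)
    best = min(observed, key=observed.get)
    return best, observed[best], len(observed)


def fake_runtime(cfg: Config) -> float:
    """Hypothetical synthetic cost standing in for a real CICE run."""
    tasks, block = cfg
    # Compute time shrinks with tasks; overhead grows with tasks and with
    # block sizes far from 16.
    return 1000.0 / tasks + 0.05 * tasks + abs(math.log2(block) - 4)


grid = [(tk, b) for tk in (16, 32, 64, 128, 256, 512)
        for b in (4, 8, 16, 32, 64)]
best_cfg, best_time, n_runs = surrogate_search(grid, fake_runtime)
print(f"best={best_cfg} runtime={best_time:.2f} "
      f"after {n_runs} of {len(grid)} runs")
```

Because a nearest-neighbour average never predicts below the best runtime already observed, this greedy loop tends to concentrate runs near configurations that already performed well. A surrogate that also reports uncertainty, such as a random forest or a Gaussian process, would allow a more deliberate trade-off between exploring new regions and exploiting known good ones.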

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (Argonne) under Contracts No. DE-AC02-06CH11357 and DE-FG02-05ER25694 with the U.S. Department of Energy. The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. NCAR (the National Center for Atmospheric Research) is sponsored by the National Science Foundation.

Acknowledgments

This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. An award of computer time was provided by the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.

Author information

Corresponding author

Correspondence to Prasanna Balaprakash.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Balaprakash, P., Alexeev, Y., Mickelson, S.A., Leyffer, S., Jacob, R., Craig, A. (2015). Machine-Learning-Based Load Balancing for Community Ice Code Component in CESM. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_7

  • DOI: https://doi.org/10.1007/978-3-319-17353-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17352-8

  • Online ISBN: 978-3-319-17353-5

  • eBook Packages: Computer Science (R0)
