Abstract
Load balancing scientific codes on massively parallel architectures is becoming an increasingly challenging task. In this paper, we focus on the Community Earth System Model, a widely used climate modeling code. It comprises six components, each of which exhibits a different scalability pattern. Previously, an analytical performance model has been used to find optimal load-balancing parameter configurations for each component. For the Community Ice Code component, however, the analytical performance model is too restrictive to capture the scalability patterns. We therefore developed a machine-learning-based load-balancing algorithm. It fits a surrogate model to a small number of load-balancing configurations and their corresponding runtimes, and then uses this model to find high-quality parameter configurations. Compared with the current practice of expert-knowledge-based enumeration over feasible configurations, the machine-learning-based load-balancing algorithm requires six times fewer evaluations to find the optimal configuration.
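The surrogate-based search described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-parameter configuration space, the synthetic_runtime cost function, the k-nearest-neighbor regressor used as a stand-in surrogate, and the iterative refit loop. The abstract does not say which learning method or search loop the authors used.

```python
import random


def synthetic_runtime(cfg):
    """Hypothetical stand-in for one expensive model run (not CICE data)."""
    bx, by = cfg
    # Made-up cost surface with its minimum at (16, 8).
    return 10.0 + 0.05 * (bx - 16) ** 2 + 0.1 * (by - 8) ** 2


def knn_predict(train, cfg, k=3):
    """Surrogate prediction: mean runtime of the k nearest evaluated configs."""
    def squared_distance(item):
        other, _ = item
        return sum((a - b) ** 2 for a, b in zip(other, cfg))

    nearest = sorted(train, key=squared_distance)[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)


def surrogate_search(feasible, evaluate, n_init=12, n_iter=6, batch=2, seed=0):
    """Evaluate a small random sample, then repeatedly evaluate the
    configurations that the surrogate predicts to be fastest."""
    rng = random.Random(seed)
    evaluated = {cfg: evaluate(cfg) for cfg in rng.sample(feasible, n_init)}
    for _ in range(n_iter):
        train = list(evaluated.items())
        candidates = [c for c in feasible if c not in evaluated]
        if not candidates:
            break
        # Rank unevaluated configurations by predicted runtime, lowest first.
        candidates.sort(key=lambda c: knn_predict(train, c))
        for cfg in candidates[:batch]:
            evaluated[cfg] = evaluate(cfg)
    best = min(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)


# Hypothetical feasible set: 16 x 8 = 128 configurations.
feasible = [(bx, by) for bx in range(2, 33, 2) for by in range(2, 17, 2)]
best_cfg, best_rt, n_evals = surrogate_search(feasible, synthetic_runtime)
print(best_cfg, best_rt, n_evals)
```

With these settings the search runs 24 of the 128 hypothetical configurations. That ratio is a property of the chosen parameters only and says nothing about the savings reported in the paper.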
The submitted manuscript has been created by the UChicago Argonne, LLC, Operator of Argonne National Laboratory (Argonne) under Contracts No. DE-AC02-06CH11357 and DE-FG02-05ER25694 with the U.S. Department of Energy. The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The NCAR is sponsored by the National Science Foundation.
Acknowledgments
This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. An award of computer time was provided by the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Balaprakash, P., Alexeev, Y., Mickelson, S.A., Leyffer, S., Jacob, R., Craig, A. (2015). Machine-Learning-Based Load Balancing for Community Ice Code Component in CESM. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_7
DOI: https://doi.org/10.1007/978-3-319-17353-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17352-8
Online ISBN: 978-3-319-17353-5
eBook Packages: Computer Science (R0)