Abstract
Selecting a close-to-optimal collective algorithm at run time, based on the parameters of the collective call, is an important step toward achieving good performance in MPI applications. In this paper, we explore the applicability of C4.5 decision trees to the MPI collective algorithm selection problem. We construct C4.5 decision trees from measured algorithm performance data and analyze both the properties of the resulting trees and the expected run-time performance penalty.
In the cases we considered, the results show that C4.5 decision trees can generate a reasonably small and very accurate decision function. For example, a broadcast decision tree with only 21 leaves achieved a mean performance penalty of 2.08%. Similarly, combining the experimental data for reduce and broadcast and generating a decision function from the combined decision trees resulted in a relative performance penalty of less than 2.5%. These results indicate that C4.5 decision trees are applicable to this problem and could be used more widely in this domain.
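The decision function derived from such a tree can be compiled into simple branching code that maps the collective call parameters (communicator size, message size) to an algorithm choice. A minimal sketch in Python, where the algorithm names and threshold values are purely illustrative (a real decision function would be generated from a C4.5 tree trained on the measured performance data):

```python
def select_bcast_algorithm(comm_size: int, msg_size: int) -> str:
    """Pick a broadcast algorithm from communicator and message size.

    The thresholds below are hypothetical; in practice each internal
    node of the trained C4.5 tree contributes one such comparison.
    """
    if msg_size <= 1024:          # small messages: latency-bound
        return "binomial_tree"
    if comm_size <= 8:            # few processes: simple linear fan-out
        return "linear"
    if msg_size <= 65536:         # medium messages
        return "split_binary_tree"
    return "pipeline"             # large messages on large communicators


# Example: a 64-process broadcast of a 4 KiB buffer
print(select_bcast_algorithm(64, 4096))   # -> split_binary_tree
```

Because the tree is small (e.g., 21 leaves for broadcast), the generated function amounts to a handful of integer comparisons, so the selection overhead at run time is negligible.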
© 2007 Springer-Verlag Berlin Heidelberg
Pješivac-Grbović, J., Bosilca, G., Fagg, G.E., Angskun, T., Dongarra, J.J. (2007). Decision Trees and MPI Collective Algorithm Selection Problem. In: Kermarrec, AM., Bougé, L., Priol, T. (eds) Euro-Par 2007 Parallel Processing. Euro-Par 2007. Lecture Notes in Computer Science, vol 4641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74466-5_13
Print ISBN: 978-3-540-74465-8
Online ISBN: 978-3-540-74466-5