Decision Trees and MPI Collective Algorithm Selection Problem

  • Jelena Pješivac-Grbović
  • George Bosilca
  • Graham E. Fagg
  • Thara Angskun
  • Jack J. Dongarra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4641)


Abstract

Selecting a close-to-optimal collective algorithm based on the parameters of the collective call at run time is an important step toward achieving good performance in MPI applications. In this paper, we explore the applicability of C4.5 decision trees to the MPI collective algorithm selection problem. We construct C4.5 decision trees from measured algorithm performance data and analyze both the decision tree properties and the expected run-time performance penalty.
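As a hedged sketch of the approach (an invented timing model and a single-attribute, entropy-based splitter standing in for C4.5; all numbers and algorithm names are illustrative, not the paper's measurements):

```python
import math
from collections import Counter

# Hypothetical timing model: per-algorithm broadcast time vs. message size.
# In the paper, such values come from measured algorithm performance data.
def fake_times(msg_size):
    return {
        "binomial_tree": 10 + 0.002 * msg_size,  # low latency, poor bandwidth
        "pipeline": 40 + 0.0005 * msg_size,      # high latency, good bandwidth
    }

# Label each message size with its fastest algorithm: the training set
# a C4.5-style learner would consume.
sizes = [1 << k for k in range(4, 21)]  # 16 B .. 1 MiB

def fastest(s):
    t = fake_times(s)
    return min(t, key=t.get)

labels = [fastest(s) for s in sizes]

def entropy(ys):
    n = len(ys)
    return -sum(c / n * math.log2(c / n) for c in Counter(ys).values())

# Choose the message-size threshold with maximal information gain --
# an entropy-based split in the spirit of C4.5's handling of numeric attributes.
def best_split(xs, ys):
    base, best_gain, best_t = entropy(ys), -1.0, None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(ys)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t

threshold = best_split(sizes, labels)  # crossover point of the two cost models
```

With the invented cost models above, the learned rule is a one-node tree ("binomial_tree below ~32 KiB, pipeline above"); the paper's trees refine this with communicator size, segment sizes, and a larger algorithm set.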

In the cases we considered, the results show that C4.5 decision trees can be used to generate a reasonably small and very accurate decision function. For example, a broadcast decision tree with only 21 leaves achieved a mean performance penalty of 2.08%. Similarly, combining the experimental data for reduce and broadcast and generating a decision function from the combined decision trees resulted in less than a 2.5% relative performance penalty. The results indicate that C4.5 decision trees are applicable to this problem and should be more widely used in this domain.
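A decision function distilled from such a tree is just nested branching on the collective call's parameters; a hypothetical sketch (thresholds and the algorithm set are invented for illustration, not the paper's measured results):

```python
# Hypothetical broadcast decision function, shaped like code emitted from a
# pruned decision tree. All thresholds and algorithm choices are illustrative.
def bcast_algorithm(comm_size: int, msg_size: int) -> str:
    if msg_size <= 1024:
        return "binomial_tree"      # small messages: latency-bound
    if comm_size <= 8:
        return "binary_tree"        # few processes: a simple tree suffices
    if msg_size <= 65536:
        return "split_binary_tree"  # medium messages, larger communicators
    return "pipeline"               # large messages: bandwidth-bound
```

At run time, the MPI library would evaluate such a function on every collective invocation; because it is only a handful of comparisons, its overhead is negligible next to the communication itself.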


Keywords: Decision Tree · Message Passing Interface · Decision Function · Message Size · Performance Penalty



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Jelena Pješivac-Grbović (1)
  • George Bosilca (1)
  • Graham E. Fagg (1)
  • Thara Angskun (1)
  • Jack J. Dongarra (1)

  1. Innovative Computing Laboratory, The University of Tennessee Computer Science Department, 1122 Volunteer Blvd., Knoxville, TN 37996-3450, USA
