Toward Better Simulation of MPI Applications on Ethernet/TCP Networks

  • Paul Bédaride
  • Augustin Degomme
  • Stéphane Genaud
  • Arnaud Legrand
  • George S. Markomanolis
  • Martin Quinson
  • Mark Stillwell
  • Frédéric Suter
  • Brice Videau
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8551)


Simulation and modeling for performance prediction and profiling are essential for developing and maintaining HPC code that is expected to scale to next-generation exascale systems, and correctly modeling network behavior is key to creating realistic simulations. In this article we describe an implementation of a flow-based hybrid network model that accounts for factors such as network topology and contention, which are commonly ignored by other approaches. We focus on large-scale, Ethernet-connected systems, as these currently compose 37.8% of the TOP500 index, and this share is expected to increase as higher-speed 10 and 100 GbE become more available. The European Mont-Blanc project, which studies exascale computing by developing prototype systems with low-power embedded devices, uses an Ethernet-based interconnect. Our model is implemented within SMPI, an open-source MPI implementation that connects real applications to the SimGrid simulation framework. SMPI provides implementations of collective communications based on current versions of both OpenMPI and MPICH. SMPI and SimGrid also provide methods for easing the simulation of large-scale systems, including shadow execution, memory folding, and support for both online and offline (i.e., post-mortem) simulation. We validate our proposed model by comparing traces produced by SMPI with those from real-world experiments, as well as with those obtained using other established network models. Our study shows that SMPI has consistently better predictive power than classical LogP-based models for a wide range of scenarios, including both established HPC benchmarks and real applications.







Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Paul Bédaride (1)
  • Augustin Degomme (2)
  • Stéphane Genaud (3)
  • Arnaud Legrand (2)
  • George S. Markomanolis (4)
  • Martin Quinson (1)
  • Mark Stillwell (5, corresponding author)
  • Frédéric Suter (6)
  • Brice Videau (2)

  1. Loria/INRIA/University of Nancy, Nancy, France
  2. CNRS/INRIA/University of Grenoble, Grenoble, France
  3. University of Strasbourg, Strasbourg, France
  4. INRIA, LIP, ENS Lyon, Lyon, France
  5. School of Engineering, Cranfield University, Bedford, UK
  6. IN2P3 Computing Center, CNRS, Lyon-Villeurbanne, France
