
Towards Elastic Resource Management

  • Isaías A. Comprés Ureña
  • Michael Gerndt
Conference paper

Abstract

A new paradigm for HPC resource management, called Elastic Computing, is under development at the Invasive Computing Transregional Collaborative Research Center. An extension to MPI for programming elastic applications and a matching resource manager, implemented as an extension of the SLURM batch scheduler, have been developed. Resource elasticity allows the resource manager to dictate changes to the resource allocations of running applications; the scheduler decides these changes based on performance feedback collected from the applications. Collecting this feedback from running applications poses unique challenges for the runtime system. In this document, our current performance feedback system is presented.
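
To make the described interaction concrete, the sketch below shows one possible shape of an elastic MPI application loop: the application reports per-phase timings as performance feedback and checks at iteration boundaries whether the resource manager has ordered an allocation change. The abstract does not name the actual API of the MPI extension, so the rm_* helpers are hypothetical placeholders, stubbed out so the sketch compiles with plain MPI.

```c
/* Hedged sketch of an elastic MPI application loop.
 * The rm_* functions are HYPOTHETICAL placeholders standing in for the
 * paper's MPI extension (its real function names are not given in the
 * abstract); they are stubbed here so the file builds with plain MPI. */
#include <mpi.h>

/* Hypothetical stub: has the resource manager dictated a change
 * (grow or shrink) to this job's allocation? */
static int rm_adaptation_pending(void) { return 0; }

/* Hypothetical stub: send per-phase performance feedback
 * (here, the measured iteration time) to the scheduler. */
static void rm_report_phase_time(double seconds) { (void)seconds; }

/* Hypothetical stub: redistribute data and rebuild communicators
 * after an allocation change. */
static void rm_apply_adaptation(MPI_Comm *comm) { (void)comm; }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm work_comm = MPI_COMM_WORLD;

    for (int iter = 0; iter < 1000; ++iter) {
        double t0 = MPI_Wtime();

        /* ... one iteration of the application's compute phase ... */

        double t1 = MPI_Wtime();
        rm_report_phase_time(t1 - t0);   /* performance feedback */

        /* Adaptation windows sit at iteration boundaries, where the
         * application state is easiest to redistribute. */
        if (rm_adaptation_pending())
            rm_apply_adaptation(&work_comm);
    }

    MPI_Finalize();
    return 0;
}
```

The design choice illustrated here, restricting adaptation to phase boundaries, matches the abstract's premise that the scheduler acts on per-phase performance feedback rather than interrupting applications at arbitrary points.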

Keywords

Resource management · MPI · Performance monitoring


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Technical University of Munich (TUM), München, Germany
