Co-scheduling algorithms for high-throughput workload execution
This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using the maximum degree of parallelism for each of them, we aim to schedule several applications concurrently. We partition the original application set into a series of packs, which are executed one by one. A pack comprises several applications, each with an assigned number of processors, under the constraint that the total number of processors assigned within a pack does not exceed the number of available processors. The objective is to determine a partition into packs, and an assignment of processors to applications, that minimize the sum of the execution times of the packs. We thoroughly study the complexity of this optimization problem, and propose several heuristics that exhibit very good performance on a variety of workloads, whose application execution times model profiles of parallel scientific codes. We show that co-scheduling leads to faster workload completion time (40% improvement on average over traditional scheduling) and to faster response times (50% improvement). Hence, co-scheduling increases system throughput and saves energy, leading to significant benefits from both the user and system perspectives.
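To make the problem concrete, the following is a minimal sketch (not one of the paper's heuristics): applications are placed into packs by a simple first-fit rule under a processor budget, a pack's execution time is that of its slowest application, and the objective sums the pack times. The function names, the first-fit strategy, and the perfect-speedup model t = seq_time / p are assumptions for illustration only.

```python
def pack_apps(seq_times, procs_per_app, P):
    """First-fit partition into packs whose processor totals stay within P.

    seq_times[i]     -- sequential time of application i (assumed model)
    procs_per_app[i] -- processors assigned to application i
    Returns a list of packs, each a list of application indices.
    """
    packs, loads = [], []
    for i, p in enumerate(procs_per_app):
        for k, load in enumerate(loads):
            if load + p <= P:          # application fits in an existing pack
                packs[k].append(i)
                loads[k] += p
                break
        else:                          # no pack has room: open a new pack
            packs.append([i])
            loads.append(p)
    return packs

def total_time(packs, seq_times, procs_per_app):
    """Objective: sum over packs of the slowest application in each pack,
    using the (assumed) perfect-speedup model t = seq_time / p."""
    return sum(max(seq_times[i] / procs_per_app[i] for i in pack)
               for pack in packs)
```

For example, with P = 4 processors and three applications of sequential times [8, 8, 4], each assigned 2 processors, the first two applications share a pack (time max(4, 4) = 4) and the third runs alone (time 2), for a total of 6, whereas running them one by one on 4 processors each would take 2 + 2 + 1 = 5; the heuristics in the paper jointly optimize both the packing and the processor assignment rather than fixing them in advance.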
Keywords: Execution time · Total execution time · Parallel task · Long execution time · Application execution time
Anne Benoit and Yves Robert are with the Institut Universitaire de France (IUF). This work was supported in part by the ANR RESCUE project. The research of Padma Raghavan and Manu Shantharam was supported in part by the U.S. National Science Foundation through grants CCF 0963839, 1018881 and 1319448.