Journal of Scheduling, Volume 19, Issue 6, pp 627–640

Co-scheduling algorithms for high-throughput workload execution

  • Guillaume Aupy
  • Manu Shantharam
  • Anne Benoit
  • Yves Robert
  • Padma Raghavan


This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several applications concurrently. We partition the original application set into a series of packs, which are executed one by one. A pack comprises several applications, each of them with an assigned number of processors, with the constraint that the total number of processors assigned within a pack does not exceed the maximum number of available processors. The objective is to determine a partition into packs, and an assignment of processors to applications, that minimize the sum of the execution times of the packs. We thoroughly study the complexity of this optimization problem, and propose several heuristics that exhibit very good performance on a variety of workloads, whose application execution times model profiles of parallel scientific codes. We show that co-scheduling leads to faster workload completion time (a 40% improvement on average over traditional scheduling) and to faster response times (a 50% improvement). Hence, co-scheduling increases system throughput and saves energy, leading to significant benefits from both the user and system perspectives.
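To make the pack-based formulation concrete, the following is an illustrative sketch, not the paper's actual heuristics: applications are greedily grouped into packs, processors are split within each pack in proportion to each application's work, and the objective is the sum over packs of each pack's longest application execution time. The speedup model (Amdahl's law with a hypothetical serial fraction `alpha`) and the greedy pack-building rule are assumptions introduced here for illustration only.

```python
def exec_time(work, procs, alpha=0.1):
    """Time to run an application of sequential `work` on `procs`
    processors, under an assumed Amdahl's-law speedup model."""
    return work * (alpha + (1.0 - alpha) / procs)

def split_procs(pack, works, total):
    """Split `total` processors among the applications in `pack` in
    proportion to their work, giving each at least one processor and
    using exactly `total` overall (largest-remainder rounding)."""
    alloc = {i: 1 for i in pack}
    remaining = total - len(pack)
    pack_work = sum(works[i] for i in pack)
    shares = {i: remaining * works[i] / pack_work for i in pack}
    for i in pack:
        alloc[i] += int(shares[i])
    leftover = remaining - sum(int(s) for s in shares.values())
    # Hand any leftover processors to the largest fractional shares.
    frac_order = sorted(pack, key=lambda i: shares[i] - int(shares[i]),
                        reverse=True)
    for i in frac_order[:leftover]:
        alloc[i] += 1
    return alloc

def greedy_coschedule(works, total_procs, pack_size):
    """Partition applications into packs of at most `pack_size`,
    largest work first. Packs execute one by one; each pack finishes
    when its slowest application does. Returns (packs, makespan)."""
    order = sorted(range(len(works)), key=lambda i: works[i], reverse=True)
    packs, makespan = [], 0.0
    for start in range(0, len(order), pack_size):
        pack = order[start:start + pack_size]
        alloc = split_procs(pack, works, total_procs)
        makespan += max(exec_time(works[i], alloc[i]) for i in pack)
        packs.append((pack, alloc))
    return packs, makespan
```

Under this assumed model, four identical applications of work 100 on 8 processors finish in one pack of duration 55, versus a total of 85 when each runs alone with all 8 processors, which mirrors the kind of gain reported in the abstract.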





Anne Benoit and Yves Robert are with the Institut Universitaire de France (IUF). This work was supported in part by the ANR RESCUE project. The research of Padma Raghavan and Manu Shantharam was supported in part by the U.S. National Science Foundation through grants CCF 0963839, 1018881 and 1319448.



Copyright information

© Springer Science+Business Media New York (outside the USA) 2015

Authors and Affiliations

  • Guillaume Aupy (1)
  • Manu Shantharam (3)
  • Anne Benoit (1)
  • Yves Robert (1, 2)
  • Padma Raghavan (4)

  1. LIP, ENS Lyon, Lyon, France
  2. University of Tennessee, Knoxville, USA
  3. San Diego Supercomputer Center, La Jolla, USA
  4. Pennsylvania State University, State College, USA
