The Journal of Supercomputing, Volume 75, Issue 3, pp 1717–1731

Variable intra-task threading for power-constrained performance and energy optimization in DAG scheduling

  • Antón Rey
  • Francisco D. Igual
  • Manuel Prieto-Matías


Task-parallel programming models have narrowed the gap between software and hardware complexity in high-performance computing. However, the developer is still in charge of complex decisions that have a significant impact on overall efficiency and affect application development. Specifically, when a set of heterogeneous and interdependent tasks share resources, there is a complex interplay between factors such as task granularity, task criticality, problem size, inter-task and intra-task parallelism in the application, and available hardware concurrency. In this paper, we explore the effects of this mix from a static scheduling perspective, by presenting a mixed-integer linear program in which the amount of inter- and intra-task parallelism can be adapted as the execution evolves. We solve a set of instances simulating a dense Cholesky factorization on a 20-core Xeon multiprocessor in a power-constrained scenario targeting makespan and energy minimization. The model reveals gains of up to 17.9% in performance and 4.1% in energy by discovering a set of high-quality scheduling solutions.
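The trade-off between inter- and intra-task parallelism under a power cap can be illustrated with a toy example. The sketch below is not the mixed-integer linear program of the paper: it replaces the solver with brute-force enumeration over per-task thread counts and a greedy list scheduler. The four-task DAG, the run times per thread count, the power figures and all names (`TASKS`, `P_STATIC`, `P_CORE`, `P_CAP`, `schedule`, `power_ok`) are assumptions made only for illustration.

```python
from itertools import product

# Hypothetical DAG A -> {B, C} -> D with assumed run times (s) per thread count.
TASKS = {
    "A": {"deps": [], "time": {1: 8.0, 2: 4.0, 4: 2.0}},
    "B": {"deps": ["A"], "time": {1: 6.0, 2: 4.0, 4: 3.0}},
    "C": {"deps": ["A"], "time": {1: 6.0, 2: 4.0, 4: 3.0}},
    "D": {"deps": ["B", "C"], "time": {1: 4.0, 2: 2.0, 4: 1.0}},
}
ORDER = ["A", "B", "C", "D"]  # a topological order of the DAG
# Assumed linear power model (watts): static power plus a cost per active core.
P_STATIC, P_CORE, P_CAP = 40.0, 5.0, 60.0


def power_ok(placed, start, end, threads):
    """True if running `threads` extra cores over [start, end) respects P_CAP."""
    # Core usage is piecewise constant, so it is enough to check `start`
    # and the start of every already placed task inside the window.
    points = [start] + [s for s, _, _ in placed if start < s < end]
    for t in points:
        active = threads + sum(n for s, e, n in placed if s <= t < e)
        if P_STATIC + P_CORE * active > P_CAP:
            return False
    return True


def schedule(threads):
    """Greedy list schedule for a thread assignment; returns (makespan, energy)."""
    placed, finish = [], {}
    for name in ORDER:
        n = threads[name]
        dur = TASKS[name]["time"][n]
        ready = max((finish[d] for d in TASKS[name]["deps"]), default=0.0)
        # Candidate starts: the ready time and later finish times of placed tasks.
        candidates = sorted({ready} | {e for _, e, _ in placed if e > ready})
        start = next(t for t in candidates if power_ok(placed, t, t + dur, n))
        placed.append((start, start + dur, n))
        finish[name] = start + dur
    makespan = max(finish.values())
    core_seconds = sum((e - s) * n for s, e, n in placed)
    return makespan, P_STATIC * makespan + P_CORE * core_seconds


# Enumerate every per-task thread assignment (3^4 = 81 combinations).
results = []
for combo in product((1, 2, 4), repeat=len(ORDER)):
    threads = dict(zip(ORDER, combo))
    results.append((*schedule(threads), threads))

best = min(results, key=lambda r: (r[0], r[1]))
fixed = min((r for r in results if len(set(r[2].values())) == 1),
            key=lambda r: r[0])
print("variable threading:", best)
print("best uniform threading:", fixed)
```

With these assumed numbers, choosing the thread count per task (4 threads for A and D, 2 each for B and C) reaches a makespan of 7 s, against 9 s for the best uniform thread count. The figures only illustrate the mechanism and are unrelated to the results reported in the paper.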


Task scheduling · Multiprocessor · Threading · Linear programming



This work has been supported by the EU (FEDER) and the Spanish MINECO, under Grants TIN 2015-65277-R and BES-2016-076806.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Departmento de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, Spain
