Cluster Computing, Volume 22, Supplement 1, pp 2407–2423

Whole procedure heterogeneous multiprocessors low-power optimization at algorithm-level

  • Zhuowei Wang
  • Naixue Xiong
  • Hao Wang
  • Lianglun Cheng
  • Wuqing Zhao


Reducing power consumption is a primary problem in the design and implementation of heterogeneous parallel systems. Because low-power optimization in the hardware layer is struggling to keep pace with the increasing need for power savings, more attention has been paid to low-power optimization in the software layer. This study analyses the relationship between the execution time and the dynamic power consumption of programs divided into homogeneous and heterogeneous computing sections. It also describes the communication power consumed by data transmission, as well as dynamic multi-task allocation. On this basis, it establishes a power model for the whole procedure of heterogeneous parallel systems. Using this model, three algorithm-level methods are designed: a selection algorithm that finds the processor frequency with optimal power consumption under time constraints, an optimal descent-based time allocation algorithm for multiple computing sections, and integer linear programming based on profiling dynamic analysis. Finally, the validity of the power optimization algorithms is verified using typical applications.
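As a rough illustration of what frequency selection under a time constraint can look like, the following minimal Python sketch picks, from a discrete set of frequencies, the one with the lowest estimated energy that still meets a deadline. It is not the authors' algorithm. The function name `pick_frequency`, the cubic dynamic-power model `k_dyn * f**3`, the constant static power, and all parameter names are assumptions made only for this example.

```python
from typing import Optional, Sequence


def pick_frequency(
    freqs_hz: Sequence[float],
    cycles: float,
    deadline_s: float,
    k_dyn: float,
    p_static_w: float,
) -> Optional[float]:
    """Return the feasible frequency with the lowest estimated energy.

    Assumed model (not taken from the paper):
      run time = cycles / f
      power    = k_dyn * f**3 + p_static_w
      energy   = power * run time
    Returns None when no frequency meets the deadline.
    """
    best_freq: Optional[float] = None
    best_energy = float("inf")
    for f in freqs_hz:
        run_time = cycles / f  # seconds needed at frequency f
        if run_time > deadline_s:
            continue  # this frequency violates the time constraint
        power = k_dyn * f ** 3 + p_static_w  # assumed power model
        energy = power * run_time
        if energy < best_energy:
            best_freq, best_energy = f, energy
    return best_freq


if __name__ == "__main__":
    # Illustrative values only: three frequencies, 4e9 cycles, 3 s deadline.
    choice = pick_frequency([1e9, 2e9, 3e9], 4e9, 3.0, 1e-27, 1.0)
    print(choice)
```

Under this assumed model, the lowest frequency is not always the best choice: when static power is large, running faster and finishing sooner can cost less energy, which is why the sketch compares energy rather than power alone.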


Whole procedure, Heterogeneous parallel systems, Algorithm-level, Low-power optimization



This work was sponsored by the National Natural Science Foundation of China (Grant Numbers 61300029, 61672168).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Zhuowei Wang (1)
  • Naixue Xiong (2), email author
  • Hao Wang (3)
  • Lianglun Cheng (1)
  • Wuqing Zhao (4)

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
  2. Department of Computer Science, Georgia State University, Atlanta, USA
  3. Department of ICT and Natural Sciences, Norwegian University of Science and Technology, Trondheim, Norway
  4. China Southern Power Grid, Guangzhou, China
