Multicore Power and Thermal Proxies Using Least-Angle Regression

  • Rupesh Raj Karn
  • Ibrahim (Abe) M. Elfadel


The use of performance counters (PCs) to develop per-core power and thermal proxies for multicore processors is now well established. These proxies are typically obtained using traditional linear regression techniques, which have the disadvantage of requiring the full PC set regardless of the workload run by the multicore processor. Moreover, a computationally expensive principal component analysis is often conducted to find the PCs most correlated with each workload. In this chapter, we use the more recent least-angle regression algorithm to efficiently develop power and thermal proxies that include only the PCs most relevant to the workload. Such PCs are considered workload signatures in the PC space and are used to categorize the workload and to trigger specific power and thermal management actions. In addition, the workload signatures at both the core and the thread level are used to decide thread migration policies that maximize per-core utilization and reduce the number of active cores. Our new power and thermal proxies are trained and tested on workloads from the PARSEC and SPEC CPU2006 benchmarks with an average error of less than 3%. Power-, thermal-, and performance-aware autoscaling policies are presented, and extensive numerical experiments are used to illustrate the advantages of our algorithm for real-time multicore power and performance management.
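The central idea, letting least-angle regression keep only the counters most relevant to a workload, can be illustrated with a minimal sketch. It assumes a NumPy implementation of the basic LARS procedure of Efron et al. (without the lasso modification), a linear power model, and a synthetic matrix of per-interval counter readings; the function names (`standardize`, `lars`, `to_raw`) and the data are hypothetical illustrations, not the chapter's code or measurements. Stopping after k steps yields a proxy that uses exactly k counters, and the counters that enter first play the role of a workload signature in the sense used above. The same routine could, under the same assumptions, be fitted with per-core temperature as the response instead of power.

```python
import numpy as np


def standardize(X, y):
    """Center X and y, and scale each column of X to unit L2 norm (LARS assumes this)."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    x_norm = np.linalg.norm(Xc, axis=0)
    y_mean = y.mean()
    return Xc / x_norm, y - y_mean, x_mean, x_norm, y_mean


def lars(X, y, n_steps):
    """Basic least-angle regression (no lasso modification).

    X: standardized counter matrix (n samples x p counters); y: centered response.
    Returns the coefficients after n_steps steps and the counters with nonzero
    coefficients, in the order in which they entered the active set.
    """
    n, p = X.shape
    beta = np.zeros(p)
    mu = np.zeros(n)                                # current fitted values X @ beta
    active = [int(np.argmax(np.abs(X.T @ y)))]      # most correlated counter enters first
    for _ in range(min(n_steps, p)):
        c = X.T @ (y - mu)                          # correlations with the current residual
        C = np.max(np.abs(c[active]))               # shared by all active counters
        s = np.sign(c[active])
        XA = X[:, active] * s                       # sign-adjusted active columns
        g1 = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        AA = 1.0 / np.sqrt(g1.sum())
        w = AA * g1
        u = XA @ w                                  # unit-norm equiangular direction
        a = X.T @ u
        gamma, j_new = C / AA, None                 # full step: least-squares fit on active set
        for j in range(p):
            if j in active:
                continue
            # Smallest positive step at which inactive counter j catches up.
            for num, den in ((C - c[j], AA - a[j]), (C + c[j], AA + a[j])):
                if den > 1e-12 and 1e-12 < num / den < gamma:
                    gamma, j_new = num / den, j
        mu = mu + gamma * u
        beta[active] += gamma * s * w
        if j_new is None:                           # no counter catches up: stop
            break
        active.append(j_new)
    return beta, active[:n_steps]


def to_raw(beta, x_mean, x_norm, y_mean):
    """Map standardized coefficients back to a proxy in raw counter units."""
    coef = beta / x_norm
    return coef, y_mean - x_mean @ coef


# Hypothetical synthetic data: 200 sampling intervals x 6 counter rates, where
# "power" depends only on counters 1 and 4 plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
power = 20.0 + 3.0 * X[:, 1] + 1.5 * X[:, 4] + 0.05 * rng.normal(size=200)

Xs, yc, x_mean, x_norm, y_mean = standardize(X, power)
beta2, selected = lars(Xs, yc, n_steps=2)           # two-counter proxy
coef2, b0 = to_raw(beta2, x_mean, x_norm, y_mean)
proxy = b0 + X @ coef2
mape = np.mean(np.abs(proxy - power) / power)
print("selected counters:", selected, "MAPE:", mape)
```

Running all p steps reproduces the ordinary least-squares fit on every counter, so the step count acts as a direct knob trading proxy size against accuracy; the early-stopped model is what avoids carrying the full counter set.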



The authors would like to acknowledge very helpful discussions with Andrew Henroid from Intel, and with Pradip Bose, Alper Buyuktosunoglu, Canturk Isci, Prabhakar Kudva, and Charles Lefurgy from IBM. This work was supported by SRC under Contract 2011-TJ-2192 with customized funding from Mubadala, Abu Dhabi, UAE.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
  2. Department of Electrical and Computer Engineering and Center for Cyber Physical Systems, Khalifa University, Abu Dhabi, UAE
