Thermal Management of a Many-Core Processor under Fine-Grained Parallelism

  • Fuat Keceli
  • Tali Moreshet
  • Uzi Vishkin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7155)


In this paper, we present work in progress that studies the run-time impact of various dynamic thermal management (DTM) techniques on a proposed 1024-core Explicit Multi-Threading (XMT) chip. XMT aims to improve single-task performance using fine-grained parallelism. Via simulations, we show that, relative to a general global scheme, speedups of up to 46% with a dedicated interconnection controller and 22% with distributed control of computing clusters are possible. Our findings lead to several high-level insights that can impact the design of a broader family of shared-memory many-core systems.


Keywords: Clock Frequency, Thermal Management, Memory Access Pattern, Shared Cache, Power Envelope
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Fuat Keceli (1)
  • Tali Moreshet (2)
  • Uzi Vishkin (1)
  1. University of Maryland, College Park, USA
  2. Swarthmore College, Swarthmore, USA
