International Journal of Parallel Programming, Volume 34, Issue 4, pp. 343–381

Supporting Microthread Scheduling and Synchronisation in CMPs



Chip multiprocessors (CMPs) hold great promise for achieving scalability in future systems. Microthreaded CMPs add a means of exploiting legacy code in such systems: using this model, compilers generate parametric concurrency from sequential source code, which, given a scalable implementation, can be used to optimise operational parameters such as power and performance over many orders of magnitude. This paper shows scalability in performance, in power and, most importantly, in silicon implementation, which is its main contribution. The microthread model requires dynamic register allocation and a hardware scheduler that can support hundreds of microthreads per processor. To support the model fully, the scheduler must be able to create a thread, switch context and reschedule a thread on every machine cycle, which is a significant challenge. Scalable implementations of these support structures are presented, and the feasibility of large-scale CMPs is investigated through detailed area estimates of these structures.
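The scheduling requirements above (threads created from a parametric family, registers allocated dynamically per thread, a new thread selected for issue on every cycle, and blocked threads rescheduled later) can be illustrated with a small software model. This is a minimal sketch under assumed parameters, not the paper's hardware scheduler: the class `ToyScheduler`, its single-issue FIFO ready queue, the fixed block of registers per thread, the exclusive `limit`, the full/empty bit used for synchronisation, and the fake memory that returns `2 * index` are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Microthread:
    index: int   # position in the family: start, start + step, ...
    base: int    # first register of this thread's dynamically allocated block
    pc: int = 0  # 0 = issue a load, 1 = consume the loaded value and exit


class ToyScheduler:
    """Hypothetical single-issue model; not the paper's hardware design."""

    def __init__(self, num_regs, regs_per_thread, mem_latency):
        self.mem_latency = mem_latency
        # Free list of register blocks; its size bounds the number of live threads.
        self.free_blocks = deque(
            range(0, num_regs - regs_per_thread + 1, regs_per_thread))
        self.full = [True] * num_regs  # assumed full/empty synchronisation bit
        self.value = [0] * num_regs
        self.ready = deque()           # threads eligible to issue
        self.waiting = {}              # register -> thread suspended on it
        self.inflight = {}             # register -> (completion cycle, value)
        self.pending = deque()         # family indices still awaiting registers
        self.results = {}
        self.cycles = 0

    def create_family(self, start, step, limit):
        # Parametric concurrency: one microthread per index (limit exclusive here).
        self.pending.extend(range(start, limit, step))
        self._allocate()

    def _allocate(self):
        # Dynamic register allocation: bind free blocks to pending indices.
        while self.pending and self.free_blocks:
            thread = Microthread(self.pending.popleft(), self.free_blocks.popleft())
            self.ready.append(thread)

    def run(self):
        while self.ready or self.waiting:
            self.cycles += 1
            # Completed loads fill their registers and reschedule suspended threads.
            for reg, (due, val) in list(self.inflight.items()):
                if due <= self.cycles:
                    del self.inflight[reg]
                    self.value[reg], self.full[reg] = val, True
                    if reg in self.waiting:
                        self.ready.append(self.waiting.pop(reg))
            if not self.ready:
                continue  # every live thread is suspended: idle cycle
            thread = self.ready.popleft()  # a context switch costs nothing here
            reg = thread.base
            if thread.pc == 0:
                # Long-latency load; the fake memory returns 2 * index.
                self.full[reg] = False
                self.inflight[reg] = (self.cycles + self.mem_latency, 2 * thread.index)
                thread.pc = 1
                self.ready.append(thread)  # another thread issues next cycle
            elif not self.full[reg]:
                self.waiting[reg] = thread  # suspend until the register is written
            else:
                self.results[thread.index] = self.value[reg] + 1
                self.free_blocks.append(reg)  # release registers on exit
                self._allocate()
        return self.results


sched = ToyScheduler(num_regs=8, regs_per_thread=2, mem_latency=3)
sched.create_family(0, 1, 6)
print(sched.run())   # {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11}
print(sched.cycles)  # 14
```

With four register blocks and a three-cycle load latency, the first four threads keep the pipeline busy while their loads are outstanding; the last two, running with fewer companions, must suspend and wait. This hints at why the model needs many live threads per processor, and hence why the register file and scheduler must scale.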


Keywords

Microgrids, microthreads, CMPs, schedulers, register files




Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  1. Department of Electronic Engineering, University of Hull, Hull, UK
  2. Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
