The Coming Wave of Multithreaded Chip Multiprocessors

  • James Laudon
  • Lawrence Spracklen

The performance of microprocessors has increased exponentially for over 35 years. However, process technology challenges, chip power constraints, and the difficulty of extracting instruction-level parallelism are conspiring to limit the performance of future individual processors. To address these limits, the computer industry has embraced chip multiprocessing (CMP), predominantly in the form of multiple high-performance superscalar processors on the same die. We explore the trade-off between building CMPs from a few high-performance cores and building them from a large number of lower-performance cores, and argue that the latter can provide better performance and performance/Watt on many commercial workloads. We examine two multithreaded CMPs built using a large number of processor cores: Sun’s Niagara and Niagara 2 processors. We also explore the programming issues for CMPs with a large number of threads. The programming model for these CMPs is similar to the widely used programming model for symmetric multiprocessors (SMPs), but the greatly reduced cost of communicating data through the on-chip shared secondary cache allows the CMP to exploit more fine-grained parallelism effectively. Finally, we present performance comparisons between Sun’s Niagara and more conventional dual-core processors built from large superscalar processor cores. For several key server workloads, Niagara shows significant performance advantages, and even more significant performance/Watt advantages, over the CMPs built from traditional superscalar processors.


Chip multiprocessing, multithreading, performance, parallel programming





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Sun Microsystems, Inc., Santa Clara, USA
