Towards High-Performance and Energy-Efficient Multi-core Processors

CMOS Processors and Memories

Part of the book series: Analog Circuits and Signal Processing (ACSP)

Abstract

Traditional uni-core processors face tremendous challenges in improving their performance and energy efficiency and in adapting to deep-submicron fabrication technology. Meanwhile, traditional ASIC implementations are often impractical because of their inherent inflexibility and high design cost. Rapidly advancing fabrication technologies, on the other hand, have enabled the integration of many processors on a single chip. These multi-core processors promise a platform that combines high performance, high energy efficiency, and high flexibility.

This chapter discusses the motivations for shifting from traditional IC systems (including uni-core processors and ASIC implementations) to multi-core processors, examines design cases of multi-core processors and their key features, and looks ahead to future work.

Author information

Corresponding author

Correspondence to Zhiyi Yu.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Yu, Z. (2010). Towards High-Performance and Energy-Efficient Multi-core Processors. In: Iniewski, K. (ed.) CMOS Processors and Memories. Analog Circuits and Signal Processing. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9216-8_2

  • DOI: https://doi.org/10.1007/978-90-481-9216-8_2

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-9215-1

  • Online ISBN: 978-90-481-9216-8

  • eBook Packages: Engineering, Engineering (R0)
