
Pipelining to Reduce the Power

  • David Chinnery
  • Kurt Keutzer

Algorithmic and architectural choices can reduce power by an order of magnitude [56]. We assume that ASIC and custom designers make similar algorithmic and architectural choices to find a low-power implementation that meets the performance requirements of the target application.

Circuit designers explore trade-offs between different microarchitectural features that implement a given architecture, evaluating them on typical applications. This analysis may be detailed, using cycle-accurate instruction simulators, but low-level circuit optimizations are not usually examined until a much later design phase. Yet high-level microarchitectural choices have a substantial impact on performance and power consumption, and they shape the design constraints for the low-level optimizations.
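To illustrate how one microarchitectural choice, pipeline depth, sets the timing constraint that later low-level optimizations must meet, here is a minimal first-order sketch. It assumes that the combinational logic of an instruction path, `t_comb`, splits evenly across `n_stages`, that each stage adds a fixed register and clocking overhead `t_overhead`, and that CPI grows linearly with depth through a hypothetical `stall_per_stage` penalty. All function names and parameter values are illustrative assumptions, not the chapter's model; delays are in FO4 inverter delays.

```python
def clock_period(n_stages: int, t_comb: float = 100.0, t_overhead: float = 3.0) -> float:
    """Clock period in FO4 delays: logic split evenly over n_stages plus a
    fixed per-stage register/clocking overhead (hypothetical first-order model)."""
    return t_comb / n_stages + t_overhead


def time_per_instruction(n_stages: int, t_comb: float = 100.0,
                         t_overhead: float = 3.0,
                         stall_per_stage: float = 0.05) -> float:
    """Average time per instruction = clock period x CPI.

    Assumption: each extra stage adds stall_per_stage cycles to the average
    CPI (for example from branch mispredictions and data hazards)."""
    cpi = 1.0 + stall_per_stage * (n_stages - 1)
    return clock_period(n_stages, t_comb, t_overhead) * cpi


def best_depth(max_stages: int = 60, **params: float) -> int:
    """Pipeline depth in 1..max_stages that minimizes time per instruction."""
    return min(range(1, max_stages + 1),
               key=lambda n: time_per_instruction(n, **params))


if __name__ == "__main__":
    for n in (5, 10, 25, 40):
        print(n, round(clock_period(n), 2), round(time_per_instruction(n), 2))
    print("best depth:", best_depth())
```

With the default hypothetical parameters, `best_depth()` returns 25 stages; raising `stall_per_stage` to 0.2 pulls the optimum down to 12. The sketch ignores power, so it cannot show how any timing slack left at a given depth could be spent on voltage scaling or gate sizing.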

This chapter examines how pipelining and different architectural overheads contribute to the power gap between ASIC and custom designs. Other researchers have proposed high-level pipelining models that consider power consumption, but these do not consider gate sizing or voltage scaling. We augment a pipeline model with a model of the power savings that voltage scaling and gate sizing achieve as a function of timing slack. This enables simultaneous analysis of the power and performance trade-offs for both high-level and low-level circuit optimizations.
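One ingredient of such a model, the power saved by spending timing slack on supply voltage scaling, can be sketched as follows. This assumes the Sakurai-Newton alpha-power-law delay model, delay proportional to Vdd / (Vdd - Vth)^alpha, dynamic power proportional to Vdd^2 at a fixed clock frequency, and negligible leakage. The nominal supply, threshold voltage, and alpha are hypothetical values, gate sizing is not modeled, and the function names are illustrative only.

```python
def gate_delay(vdd: float, vth: float = 0.3, alpha: float = 1.3) -> float:
    """Relative gate delay from the alpha-power law: vdd / (vdd - vth)**alpha."""
    return vdd / (vdd - vth) ** alpha


def min_vdd_for_slack(slack: float, v_nom: float = 1.2, vth: float = 0.3,
                      alpha: float = 1.3, tol: float = 1e-6) -> float:
    """Lowest supply voltage that still meets timing (hypothetical model).

    slack is the fraction of the clock period left unused at v_nom, so the
    logic may slow down by a factor of 1 / (1 - slack). For alpha >= 1 the
    delay falls monotonically as vdd rises above vth, so bisection applies."""
    if not 0.0 <= slack < 1.0:
        raise ValueError("slack must be in [0, 1)")
    if v_nom <= vth:
        raise ValueError("v_nom must exceed vth")
    allowed = gate_delay(v_nom, vth, alpha) / (1.0 - slack)
    lo, hi = vth, v_nom  # invariant: hi always meets timing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, vth, alpha) <= allowed:
            hi = mid  # mid meets timing: try a lower voltage
        else:
            lo = mid  # too slow: the voltage must stay higher
    return hi


def dynamic_power_ratio(slack: float, **params: float) -> float:
    """Dynamic power at the scaled supply relative to v_nom (P ~ C * Vdd^2 * f, f fixed)."""
    v_nom = params.get("v_nom", 1.2)
    return (min_vdd_for_slack(slack, **params) / v_nom) ** 2


if __name__ == "__main__":
    for s in (0.0, 0.1, 0.2, 0.3):
        print(s, round(min_vdd_for_slack(s), 3), round(dynamic_power_ratio(s), 3))
```

With these hypothetical parameters, 20% slack lets the supply drop from 1.2 V to a little over 0.9 V, cutting dynamic power by roughly 40%. A fuller model would add leakage and a gate-sizing term, since both change how much of the slack is worth converting into lower voltage.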



References

  1. [1]
    Anderson, F., Wells, J., and Berta, E., “The Core Clock System on the Next Generation Itanium™ Microprocessor,” Digest of Technical Papers of the IEEE International Solid-State Circuits Conference, 2002, pp. 146-147, 453.
  2. [7]
    Benschneider, B. J., et al., “A 300-MHz 64-b Quad-Issue CMOS RISC Microprocessor,” IEEE Journal of Solid-State Circuits, vol. 30, no. 11, November 1995, pp. 1203-1214.
  3. [8]
    Bhavnagarwala, A., et al., “A Minimum Total Power Methodology for Projecting Limits on CMOS GSI,” IEEE Transactions on VLSI Systems, vol. 8, no. 3, June 2000, pp. 235-251.
  4. [9]
    Brooks, D., et al., “Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors,” IEEE Micro, vol. 20, no. 6, 2000, pp. 26-44.
  5. [10]
    Chandrakasan, A., and Brodersen, R., “Minimizing Power Consumption in Digital CMOS Circuits,” Proceedings of the IEEE, vol. 83, no. 4, April 1995, pp. 498-523.
  6. [11]
    Chinnery, D., Low Power Design Automation, Ph.D. dissertation, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, 2006.
  7. [12]
    Chinnery, D., et al., “Automatic Replacement of Flip-Flops by Latches in ASICs,” chapter 7 in Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design, Kluwer Academic Publishers, 2002, pp. 187-208.
  8. [13]
    Chinnery, D., and Keutzer, K., Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design, Kluwer Academic Publishers, 2002, 432 pp.
  9. Clark, L., “The XScale Experience: Combining High Performance with Low Power from 0.18um through 90nm Technologies,” presented at the Electrical Engineering and Computer Science Department of the University of Michigan, September 30, 2005. http://www.eecs.umich.edu/vlsi_seminar/f05/Slides/VLSI_LClark.pdf
  10. [15]
    Clark, L., et al., “An Embedded 32-b Microprocessor Core for Low-Power and High-Performance Applications,” IEEE Journal of Solid-State Circuits, vol. 36, no. 11, November 2001, pp. 1599-1608.
  11. [16]
    Contreras, G., et al., “XTREM: A Power Simulator for the Intel XScale Core,” in Proceedings of the ACM Conference on Languages, Compilers, and Tools for Embedded Systems, 2004, 11 pp.
  12. [17]
    Dai, W., and Staepelaere, D., “Useful-Skew Clock Synthesis Boosts ASIC Performance,” chapter 8 in Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design, Kluwer Academic Publishers, 2002, pp. 209-223.
  13. [18]
    Davies, B., et al., “iPART: An Automated Phase Analysis and Recognition Tool,” Intel Research Tech Report IR-TR-2004-1, 2004, 12 pp.
  14. De Gelas, J., AMD’s Roadmap, February 28, 2000. http://www.aceshardware.com/Spades/read.php?article_id=119
  15. [20]
    Fanucci, L., and Saponara, S., “Low-Power VLSI Architectures for 3D Discrete Cosine Transform (DCT),” in Proceedings of the International Midwest Symposium on Circuits and Systems, 2003, pp. 1567-1570.
  16. [21]
    Flynn, D., and Keating, M., “Creating Synthesizable ARM Processors with Near Custom Performance,” chapter 17 in Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design, Kluwer Academic Publishers, 2002, pp. 383-407.
  17. [22]
    Furber, S., ARM System-on-Chip Architecture, 2nd ed., Addison-Wesley, 2000.
  18. [23]
    Ghani, T., et al., “100 nm Gate Length High Performance/Low Power CMOS Transistor Structure,” Technical Digest of the International Electron Devices Meeting, 1999, pp. 415-418.
  19. [24]
    Golden, M., et al., “A Seventh-Generation x86 Microprocessor,” IEEE Journal of Solid-State Circuits, vol. 34, no. 11, November 1999, pp. 1466-1477.
  20. [25]
    Gonzalez, D., “Micro-RISC architecture for the wireless market,” IEEE Micro, vol. 19, no. 4, 1999, pp. 30-37.
  21. [26]
    Gowan, M., Biro, L., and Jackson, D., “Power Considerations in the Design of the Alpha 21264 Microprocessor,” in Proceedings of the Design Automation Conference, 1998, pp. 726-731.
  22. [27]
    Greenlaw, D., et al., “Taking SOI Substrates and Low-k Dielectrics into High-Volume Microprocessor Production,” Technical Digest of the International Electron Devices Meeting, 2003, 4 pp.
  23. [28]
    Grochowski, E., and Annavaram, M., “Energy per Instruction Trends in Intel Microprocessors,” Technology@Intel Magazine, March 2006, 8 pp.
  24. [29]
    Gronowski, P., et al., “High-Performance Microprocessor Design,” IEEE Journal of Solid-State Circuits, vol. 33, no. 5, May 1998, pp. 676-686.
  25. Hare, C., 586/686 Processors Chart. http://users.erols.com/chare/586.htm
  26. Hare, C., 786 Processors Chart. http://users.erols.com/chare/786.htm
  27. [32]
    Hartstein, A., and Puzak, T., “Optimum Power/Performance Pipeline Depth,” in Proceedings of the 36th International Symposium on Microarchitecture, 2003, pp. 117-126.
  28. [33]
    Hauck, C., and Cheng, C., “VLSI Implementation of a Portable 266MHz 32-Bit RISC Core,” Microprocessor Report, November 2001, 5 pp.
  29. [34]
    Hinton, G., et al., “A 0.18-um CMOS IA-32 Processor With a 4-GHz Integer Execution Unit,” IEEE Journal of Solid-State Circuits, vol. 36, no. 11, November 2001, pp. 1617-1627.
  30. [35]
    Hinton, G., et al., “The Microarchitecture of the Pentium 4 Processor,” Intel Technology Journal, Q1 2001, 13 pp.
  31. [36]
    Hofstee, H., “Power Efficient Processor Architecture and the Cell Processor,” in Proceedings of the Symposium on High-Performance Computer Architecture, 2005, pp. 258-262.
  32. Horan, B., “Intel Architecture Update,” presented at the IBM EMEA HPC Conference, May 17, 2006. www-5.ibm.com/fr/partenaires/forum/hpc/intel.pdf
  33. [38]
    Hrishikesh, M., et al., “The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays,” in Proceedings of the Annual International Symposium on Computer Architecture, May 2002, pp. 14-24.
  34. Intel, Intel Unveils World’s Best Processor, July 27, 2006. http://www.intel.com/pressroom/archive/releases/20060727comp.htm
  35. Intel, Inside the NetBurst Micro-Architecture of the Intel Pentium 4 Processor, Revision 1.0, 2000. http://developer.intel.com/pentium4/download/netburst.pdf
  36. [41]
    Keltcher, C., et al., “The AMD Opteron Processor for Multiprocessor Servers,” IEEE Micro, vol. 23, no. 2, 2003, pp. 66-76.
  37. [42]
    Kurd, N. A., et al., “A Multigigahertz Clocking Scheme for the Pentium® 4 Microprocessor,” IEEE Journal of Solid-State Circuits, vol. 36, no. 11, November 2001, pp. 1647-1653.
  38. Larri, G., “ARM810: Dancing to the Beat of a Different Drum,” presented at Hot Chips, 1996.
  39. [44]
    Leijten, J., Meerbergen, J., and Jess, J., “Analysis and Reduction of Glitches in Synchronous Networks,” in Proceedings of the European Design and Test Conference, 1995, pp. 398-403.
  40. Lexra, Lexra LX4380 Product Brief, 2002. http://www.lexra.com/LX4380_PB.pdf
  41. [46]
    Mahnke, T., “Low Power ASIC Design Using Voltage Scaling at the Logic Level,” Ph.D. dissertation, Department of Electronics and Information Technology, Technical University of Munich, May 2003, 204 pp.
  42. [47]
    Montanaro, J., et al., “A 160MHz, 32-b, 0.5W, CMOS RISC Microprocessor,” IEEE Journal of Solid-State Circuits, vol. 31, no. 11, 1996, pp. 1703-1714.
  43. MTEK Computer Consulting, AMD CPU Roster, January 2002. http://www.cpuscorecard.com/cpuprices/head_amd.htm
  44. MTEK Computer Consulting, Intel CPU Roster, January 2002. http://www.cpuscorecard.com/cpuprices/head_intel.htm
  45. [50]
    Nowka, K., and Galambos, T., “Circuit Design Techniques for a Gigahertz Integer Microprocessor,” in Proceedings of the International Conference on Computer Design, 1998, pp. 11-16.
  46. [51]
    Perera, A. H., et al., “A versatile 0.13um CMOS Platform Technology supporting High Performance and Low Power Applications,” Technical Digest of the International Electron Devices Meeting, 2000, pp. 571-574.
  47. [52]
    Richardson, N., et al., “The iCORE™ 520MHz Synthesizable CPU Core,” chapter 16 in Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design, Kluwer Academic Publishers, 2002, pp. 361-381.
  48. [53]
    Rollins, N., and Wirthlin, M., “Reducing Energy in FPGA Multipliers Through Glitch Reduction,” presented at the International Conference on Military and Aerospace Programmable Logic Devices, September 2005, 10 pp.
  49. [54]
    Segars, S., “The ARM9 Family - High Performance Microprocessors for Embedded Applications,” in Proceedings of the International Conference on Computer Design, 1998, pp. 230-235.
  50. [55]
    Silberman, J., et al., “A 1.0-GHz Single-Issue 64-Bit PowerPC Integer Processor,” IEEE Journal of Solid-State Circuits, vol. 33, no. 11, November 1998, pp. 1600-1608.
  51. [56]
    Singh, D., et al., “Power Conscious CAD Tools and Methodologies: a Perspective,” Proceedings of the IEEE, vol. 83, no. 4, April 1995, pp. 570-594.
  52. [57]
    Srinivasan, V., et al., “Optimizing pipelines for power and performance,” in Proceedings of the International Symposium on Microarchitecture, 2002, pp. 333-344.
  53. Standard Performance Evaluation Corporation, SPEC’s Benchmarks and Published Results, 2006. http://www.spec.org/benchmarks.html
  54. STMicroelectronics, “STMicroelectronics 0.25µ, 0.18µ & 0.12 CMOS,” slides presented at the annual Circuits Multi-Projets users meeting, January 9, 2002. http://cmp.imag.fr/Forms/Slides2002/061_STM_Process.pdf
  55. techPowerUp! CPU Database, August 2006. http://www.techpowerup.com/cpudb/
  56. Tensilica, Xtensa Microprocessor - Overview Handbook - A Summary of the Xtensa Data Sheet for Xtensa T1020 Processor Cores, August 2000.
  57. [62]
    Thompson, S., et al., “An Enhanced 130 nm Generation Logic Technology Featuring 60 nm Transistors Optimized for High Performance and Low Power at 0.7-1.4 V,” Technical Digest of the International Electron Devices Meeting, 2001, 4 pp.
  58. TSMC, 0.13 Micron CMOS Process Technology, March 2002.
  59. TSMC, 0.18 Micron CMOS Process Technology, March 2002.
  60. TSMC, TSMC Unveils Nexsys 90-Nanometer Process Technology, August 2006. http://www.tsmc.com/english/technology/t0113.htm
  61. [66]
    Tyagi, S., et al., “An advanced low power, high performance, strained channel 65nm technology,” Technical Digest of the International Electron Devices Meeting, 2005, pp. 245-247.
  62. [67]
    Weicker, R., “Dhrystone: A Synthetic Systems Programming Benchmark,” Communications of the ACM, vol. 27, no. 10, 1984, pp. 1013-1030.
  63. [68]
    Wilton, S., Ang, S., and Luk, W., “The Impact of Pipelining on Energy per Operation in Field-Programmable Gate Arrays,” in Proceedings of the International Conference on Field Programmable Logic and Applications, 2004, pp. 719-728.
  64. [69]
    Xanthopoulos, T., and Chandrakasan, A., “A Low-Power DCT Core Using Adaptive Bitwidth and Arithmetic Activity Exploiting Signal Correlations and Quantization,” IEEE Journal of Solid-State Circuits, vol. 35, no. 5, May 2000, pp. 740-750.
  65. [70]
    Xanthopoulos, T., and Chandrakasan, A., “A Low-Power IDCT Macrocell for MPEG-2 MP@ML Exploiting Data Distribution Properties for Minimal Activity,” IEEE Journal of Solid-State Circuits, vol. 34, May 1999, pp. 693-703.
  66. [71]
    Yang, F., et al., “A 65nm Node Strained SOI Technology with Slim Spacer,” Technical Digest of the International Electron Devices Meeting, 2003, pp. 627-630.
  67. [72]
    Zhuang, X., Zhang, T., and Pande, S., “Hardware-managed Register Allocation for Embedded Processors,” in Proceedings of the ACM Conference on Languages, Compilers, and Tools for Embedded Systems, 2004, 10 pp.

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • David Chinnery (1)
  • Kurt Keutzer (2)
  1. AMD, Sunnyvale, USA
  2. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
