Journal of Computer Science and Technology, Volume 23, Issue 4, pp. 620–632

Making Effective Decisions in Computer Architects’ Real-World: Lessons and Experiences with Godson-2 Processor Designs

  • Wei-Wu Hu
  • Jian Wang
Regular Paper


Although many kinds of microprocessors have been under development for several decades, the computer architecture R&D community lacks well-documented lessons and experiences about design decisions in the research literature. In this paper, we systematically present the design decisions we made while designing and prototyping the Godson-2 series processors. The 250MHz Godson-2B, 450MHz Godson-2C, and 1GHz Godson-2E processors, which implement a 64-bit, four-issue, out-of-order architecture, were taped out in 2003, 2004, and 2005, respectively. Each processor triples the SPEC CPU2000 rates of its predecessor. The first-hand experiences and lessons we gained from these designs provide perspectives and insights that are not available in existing textbooks or published papers. We summarize 10 critical lessons and experiences drawn from hundreds of attempts at architectural and design optimizations to improve the performance of the Godson-2 series processors. The issues include silicon-simulation correlation, design balancing, performance optimization, and pico-architecture tuning. We conclude that persistent improvement, a work-on-silicon design attitude, and an insightful understanding of software and the fabrication process are the three most important factors in designing a high-performance processor with low energy consumption.
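As a rough reading of the figures stated above, the sketch below splits each generation's claimed 3x SPEC CPU2000 gain into a clock-frequency factor and a residual per-clock factor. It assumes that "triples" means exactly 3x and that SPEC rates would scale linearly with clock frequency, which memory-bound workloads generally do not; the helper `decompose` and the constants are hypothetical illustrations, not figures or methods from the paper.

```python
# Clock frequencies (MHz) as stated in the abstract.
CLOCK_MHZ = {"Godson-2B": 250, "Godson-2C": 450, "Godson-2E": 1000}

# Assumption: "triples" is read as an exact 3x SPEC CPU2000 rate gain per generation.
GEN_SPEEDUP = 3.0


def decompose(prev: str, curr: str) -> tuple[float, float]:
    """Hypothetical helper: split one generation's speedup into a clock factor
    and a residual per-clock factor, assuming rate scales linearly with clock."""
    freq_gain = CLOCK_MHZ[curr] / CLOCK_MHZ[prev]
    per_clock_gain = GEN_SPEEDUP / freq_gain
    return freq_gain, per_clock_gain


for prev, curr in [("Godson-2B", "Godson-2C"), ("Godson-2C", "Godson-2E")]:
    f, p = decompose(prev, curr)
    print(f"{prev} -> {curr}: clock x{f:.2f}, per-clock x{p:.2f}")

# Two successive 3x generations compound multiplicatively.
cumulative = GEN_SPEEDUP ** 2
print(f"Godson-2B -> Godson-2E: x{cumulative:.0f} overall")
```

Under these assumptions, clock frequency alone accounts for only part of each 3x step, so the remainder would have to come from microarchitectural and design improvements such as those the paper's lessons address.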


Keywords: superscalar architecture; correlation design; balanced design; optimized design; pico-architecture design; work-on-silicon


Supplementary material: 11390_2008_9158_MOESM1_ESM.pdf (PDF, 70.1 kb)



Copyright information

© Springer 2008

Authors and Affiliations

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. Graduate University of Chinese Academy of Sciences, Beijing, China
