PARROT: Power Awareness Through Selective Dynamically Optimized Traces

  • Roni Rosner
  • Yoav Almog
  • Micha Moffie
  • Naftali Schwartz
  • Avi Mendelson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3164)


We present the PARROT concept, aimed at both higher performance and power awareness. The PARROT microarchitectural framework integrates trace caching, dynamic optimizations and pipeline decoupling. We employ a gradual and selective approach, applying complex mechanisms only to the most frequently used traces, to maximize the performance gain at any given power constraint and thus attain finer control of the tradeoffs between performance and power awareness.
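The gradual, selective treatment of frequently used traces can be illustrated with a small counter-based filter. This is only a sketch under stated assumptions, not the mechanism evaluated in the paper: the class name `SelectiveTraceFilter`, the stage names, and both threshold values are hypothetical, since the abstract gives no concrete parameters.

```python
from collections import Counter

# Hypothetical thresholds; the abstract does not give concrete values.
HOT_THRESHOLD = 4        # executions before a trace is worth caching
OPTIMIZE_THRESHOLD = 16  # executions before a cached trace is worth optimizing


class SelectiveTraceFilter:
    """Counts trace executions and escalates treatment gradually."""

    def __init__(self, hot=HOT_THRESHOLD, optimize=OPTIMIZE_THRESHOLD):
        self.hot = hot
        self.optimize = optimize
        self.counts = Counter()

    def record(self, trace_id):
        """Record one execution and return the stage the trace is now in."""
        self.counts[trace_id] += 1
        n = self.counts[trace_id]
        if n >= self.optimize:
            return "optimized"  # frequent enough to justify costly optimization
        if n >= self.hot:
            return "cached"     # frequent enough to keep in the trace cache
        return "cold"           # rare: no extra mechanism is applied


def classify(stream, hot=HOT_THRESHOLD, optimize=OPTIMIZE_THRESHOLD):
    """Return the final stage of every trace seen in the stream."""
    trace_filter = SelectiveTraceFilter(hot, optimize)
    stages = {}
    for trace_id in stream:
        stages[trace_id] = trace_filter.record(trace_id)
    return stages
```

In this sketch, raising either threshold spends the costlier mechanism on fewer traces, which is one way to picture trading performance against a power constraint.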

We show that the PARROT microarchitecture delivers performance increases comparable to those available through conventional doubling of execution resources (average 16% IPC improvement). This improvement comes from better utilization of all available resources through the combination of a trace cache and selective trace optimization. By contrast, the performance advantage of a trace cache alone is limited to wide-machine configurations. No less critical, however, is power awareness. The PARROT microarchitecture delivers the performance increase at a comparable energy level, whereas the conventional path to higher performance consumes an average of 70% more energy. Meanwhile, for designs that can tolerate a higher power budget, PARROT scales up gracefully to use additional execution resources in a uniformly efficient manner. In particular, a PARROT-style doubly-wide machine delivers an average 45% IPC improvement while actually improving the Cubic-MIPS-per-WATT power awareness metric by over 50%.
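To make the last figure concrete, the sketch below works through the arithmetic of the Cubic-MIPS-per-WATT metric. It assumes the metric is MIPS cubed divided by power in watts, and that clock frequency and instruction count are unchanged so that MIPS scales directly with IPC; neither assumption is stated in the abstract, and the function names are illustrative.

```python
def cubic_mips_per_watt(mips, watts):
    """Assumed form of the metric: performance cubed per unit of power."""
    return mips ** 3 / watts


def max_power_ratio(perf_ratio, metric_ratio):
    """Largest power increase that still yields the given metric improvement.

    From metric_ratio = perf_ratio**3 / power_ratio, solved for power_ratio.
    """
    return perf_ratio ** 3 / metric_ratio


# A 45% IPC gain with a 50% better metric leaves room for roughly
# a 2.03x increase in power (1.45**3 / 1.5).
power_budget = max_power_ratio(perf_ratio=1.45, metric_ratio=1.5)
```

Under these assumptions, an improvement of over 50% in the metric means the doubly-wide machine draws less than about twice the baseline power.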


Keywords: Dynamic Optimization, Instruction Cache, Branch Predictor, Atomic Trace, Execution Core





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Roni Rosner (1)
  • Yoav Almog (1)
  • Micha Moffie (1)
  • Naftali Schwartz (1)
  • Avi Mendelson (1)

  1. Microprocessor Research, Intel Labs, Haifa, Israel
