RIMP: Runtime Implicit Predication

  • YuXing Tang
  • Kun Deng
  • XiaoDong Wang
  • Yong Dou
  • XingMing Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3756)


If-conversion and predicated execution are widely adopted to eliminate the branch misprediction penalty. Previous predicated-execution schemes depend on the compiler to generate explicitly predicated instructions. This paper discusses RIMP (Runtime IMplicit Predication), a trace-based predication mechanism that identifies if-conversion candidates during dynamic execution. A conventional trace cache is modified to store RIMP traces, which include instructions from both the fall-through block and the target block following a conditional branch. A hardware extension adds predication to RIMP traces automatically. With RIMP, legacy applications can benefit from predication without recompiling their source code. The paper presents simulations of a RIMP implementation under diverse microarchitecture configurations, and the results show promising performance improvement: in general, RIMP with 64 kB of trace storage delivers an average 10.3% IPC improvement and speeds up execution by over 7%.
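The general idea of if-conversion can be illustrated with a small software model. The sketch below is not the RIMP hardware design, which the abstract does not detail. It is a toy, assuming a hypothetical `Instr` record and hypothetical helpers `run_branching`, `build_predicated_trace`, and `run_predicated`. It merges the fall-through and target blocks of one conditional branch into a single trace, tags each instruction with the predicate value under which it may commit, and nullifies the rest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Regs = Dict[str, int]


@dataclass
class Instr:
    """Toy instruction: writes op(regs) into register `dest` (hypothetical model)."""
    dest: str
    op: Callable[[Regs], int]
    # None: always commits; True/False: commits only if the branch
    # condition evaluated to that value.
    pred: Optional[bool] = None


def run_branching(regs: Regs, cond, fall_through: List[Instr], target: List[Instr]) -> Regs:
    """Baseline: resolve the branch, then execute only the chosen block."""
    regs = dict(regs)
    block = target if cond(regs) else fall_through
    for ins in block:
        regs[ins.dest] = ins.op(regs)
    return regs


def build_predicated_trace(fall_through: List[Instr], target: List[Instr]) -> List[Instr]:
    """Merge both blocks into one trace, guarding each with a predicate value."""
    return ([Instr(i.dest, i.op, pred=False) for i in fall_through] +
            [Instr(i.dest, i.op, pred=True) for i in target])


def run_predicated(regs: Regs, cond, trace: List[Instr]) -> Regs:
    """Execute the merged trace; instructions on the wrong path are nullified."""
    regs = dict(regs)
    p = cond(regs)  # the predicate replaces the branch outcome
    for ins in trace:
        if ins.pred is None or ins.pred == p:
            regs[ins.dest] = ins.op(regs)
    return regs


# Example: if (a > b) max = a; else max = b;
cond = lambda r: r["a"] > r["b"]
fall_through = [Instr("max", lambda r: r["b"])]
target = [Instr("max", lambda r: r["a"])]
trace = build_predicated_trace(fall_through, target)
```

For any input registers, `run_predicated(regs, cond, trace)` yields the same register state as `run_branching(regs, cond, fall_through, target)`, but without a control-flow decision between the two blocks. Removing that decision is what takes the branch, and its misprediction penalty, out of the picture.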


predication, trace cache, runtime execution, RIMP





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • YuXing Tang (1)
  • Kun Deng (1)
  • XiaoDong Wang (1)
  • Yong Dou (1)
  • XingMing Zhou (1)

  1. National Lab for Parallel and Distributed Processing, China
