Abstract
If-conversion and predicated execution are widely adopted to eliminate the branch misprediction penalty. Previous predicated-execution schemes depend on the compiler to generate explicitly predicated instructions. This paper discusses a trace-based predication mechanism named RIMP (Runtime IMplicit Predication), in which candidates for if-conversion are identified during dynamic execution. A conventional trace cache is modified to store RIMP traces, which include instructions from both the fall-through block and the target block following a conditional branch. A hardware extension adds predication to each RIMP trace automatically. With RIMP, legacy applications can benefit from predication without recompiling their source code. The paper presents simulations of a RIMP implementation under diverse microarchitecture configurations, and the results show promising performance improvement. In general, RIMP with 64 kB of trace storage delivers an average IPC improvement of 10.3% and an actual execution speedup of over 7%.
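As a rough illustration of the if-conversion idea the abstract refers to, the sketch below contrasts a branching routine with a predicated one in which the instructions of both successor blocks are executed and a predicate decides which result is committed. It is a software analogy only, not the RIMP hardware mechanism; the function names `branchy_abs_diff` and `predicated_abs_diff` are hypothetical and do not come from the paper.

```python
def branchy_abs_diff(a: int, b: int) -> int:
    """Control-flow form: a conditional branch selects exactly one block."""
    if a > b:            # conditional branch (may be mispredicted in hardware)
        r = a - b        # one successor block
    else:
        r = b - a        # the other successor block
    return r


def predicated_abs_diff(a: int, b: int) -> int:
    """If-converted form: both blocks execute, guarded by a predicate.

    Assumption for illustration: the predicate is modelled as 0 or 1 and
    the nullified result is discarded by an arithmetic select.
    """
    p = int(a > b)       # predicate-defining compare
    r_one = a - b        # instruction guarded by p
    r_other = b - a      # instruction guarded by (not p)
    # Commit only the result whose guard is true; no branch is needed.
    return p * r_one + (1 - p) * r_other


if __name__ == "__main__":
    for a, b in [(7, 3), (3, 7), (5, 5)]:
        print(a, b, branchy_abs_diff(a, b), predicated_abs_diff(a, b))
```

The predicated form always executes the instructions of both blocks, so it does more work per invocation than the branching form. That extra work is one plausible reason an IPC gain does not translate one-to-one into a reduction in execution time.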
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tang, Y., Deng, K., Wang, X., Dou, Y., Zhou, X. (2005). RIMP: Runtime Implicit Predication. In: Cao, J., Nejdl, W., Xu, M. (eds) Advanced Parallel Processing Technologies. APPT 2005. Lecture Notes in Computer Science, vol 3756. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573937_10
DOI: https://doi.org/10.1007/11573937_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29639-3
Online ISBN: 978-3-540-32107-1
eBook Packages: Computer Science (R0)