Control Flow: Branching and Control Hazards

  • Amos R. Omondi
Chapter

Abstract

For an instruction pipeline to attain its maximum performance, it must, at the very least, be supplied with instructions at a rate that matches its maximum processing rate. The main impediment to adequate instruction supply is usually the high access time (relative to the pipeline cycle time) of the memory from which instructions are fetched. At any given moment, the addresses of the next instructions required are easy to determine if no branch (control-transfer) instructions are involved: simply incrementing the program counter, or a similar addressing register, suffices. A branch instruction, on the other hand, presents a problem, since the addresses of the following instructions cannot be known with absolute certainty until after the branch has been executed; furthermore, the execution may depend on a condition yet to be determined by preceding instructions. Consequently, unless special measures are taken, a branch instruction will introduce a gap in the flow of instructions; we shall term the delay of this gap the branch latency. In this chapter we shall discuss a number of measures for dealing with this problem, which is arguably the hardest in the design of high-performance instruction pipelines [116].
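As a rough illustration of how branch latency erodes throughput, the following Python sketch models an idealized pipeline that otherwise completes one instruction per cycle and loses a fixed number of cycles after every branch. The function names, the default pipeline depth, and the assumption that every branch pays the full latency are illustrative choices made here, not a model taken from the chapter.

```python
def pipeline_cycles(num_instructions, num_branches, branch_latency, depth=5):
    """Total cycles on a hypothetical pipeline of the given depth.

    Assumptions (illustrative only): one instruction completes per cycle
    once the pipeline is full, and every branch leaves a gap of
    `branch_latency` cycles in the instruction flow.
    """
    if not 0 <= num_branches <= num_instructions:
        raise ValueError("num_branches must lie between 0 and num_instructions")
    fill_cycles = depth - 1  # cycles needed before the first instruction completes
    return fill_cycles + num_instructions + num_branches * branch_latency


def effective_cpi(branch_fraction, branch_latency):
    """Average cycles per instruction in steady state under the same assumptions."""
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must lie between 0 and 1")
    return 1.0 + branch_fraction * branch_latency


if __name__ == "__main__":
    # 100 instructions, 20 of them branches, 3 lost cycles per branch.
    print(pipeline_cycles(100, 20, 3))
    # One instruction in four is a branch, 2 lost cycles per branch.
    print(effective_cpi(0.25, 2))
```

Under these assumptions, a program in which one instruction in four is a branch with a two-cycle latency runs at 1.5 cycles per instruction instead of 1, which is why reducing or hiding the latency matters so much.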

Keywords

Program Counter; Instruction Cache; Target Address; Conditional Branch; Branch Prediction

Bibliography

  1. Alpert, D. and D. Avnon. 1993. Architecture of the Pentium microprocessor. IEEE Micro, June:11–21.
  2. Alsup, M. 1990. Motorola’s 88000 family architecture. IEEE Micro, 10(3):48–66.
  3. AMD. 1987. Am29000 Streamlined Instruction Processor: User’s Manual. Advanced Micro Devices, Sunnyvale, California, USA.
  4. AMD. 1997. AMD-K6 MMX Processor. Advanced Micro Devices, Sunnyvale, California.
  5. Asprey, T. et al. 1993. Performance features of the PA7100 microprocessor. IEEE Micro, 13(3):22–35.
  6. Ball, T. and J.R. Larus. 1993. Branch prediction for free. In: Proceedings, ACM SIGPLAN Conference on Programming Language Design and Implementation.
  7. Becker, M.C. et al. 1993. The PowerPC 601 microprocessor. IEEE Micro, 13(5):54–68.
  8. Blickstein, D.S. et al. 1992. The GEM optimizing compiler system. Digital Technical Journal, 4(4):121–136.
  9. Bray, B. and M.J. Flynn. 1991. Strategies for branch target buffers. In: Proceedings, 24th Workshop on Microprogramming and Microarchitecture, pp 42–49.
  10. Calder, B. and D. Grunwald. 1994a. Reducing branch costs via branch alignment. In: Proceedings, 6th International Symposium on Architectural Support for Programming Languages and Operating Systems, pp 242–251.
  11. Calder, B. and D. Grunwald. 1994b. Fast and accurate branch prediction. In: Proceedings, 21st Annual International Symposium on Computer Architecture, pp 2–11.
  12. Calder, B., D. Grunwald, and J. Elmer. 1995. A system level perspective on branch architecture performance. In: Proceedings, 28th International Symposium on Microarchitecture, pp 199–206.
  13. Calder, B. and D. Grunwald. 1995. Next cache line and set prediction. In: Proceedings, 22nd International Symposium on Computer Architecture, pp 287–296.
  14. CDC. 1975. Control Data 7600 Series and Cyber 70/Model 76 Computer Systems: Hardware Reference Manual. Control Data Corporation, Minneapolis, Minnesota, USA.
  15. Chang, P.-Y., E. Hao, T. Yeh, and Y.N. Patt. 1994. Branch classification: a new mechanism for improving branch prediction performance. In: Proceedings, 27th International Symposium on Microarchitecture.
  16. Chang, P.-Y., E. Hao, and Y.N. Patt. 1995. Alternative implementations of hybrid branch predictors. In: Proceedings, 28th International Symposium on Microarchitecture, pp 252–257.
  17. Chang, P.-Y., E. Hao, and Y.N. Patt. 1995. Target prediction for indirect jumps. In: Proceedings, 22nd International Symposium on Computer Architecture, pp 274–283.
  18. Chang, P.-Y., M. Evers, and Y. Patt. 1996. Improving branch prediction accuracy by reducing pattern history table interference. In: Proceedings, International Conference on Parallel Architectures and Compilation Techniques.
  19. Chow, P. and M. Horowitz. 1987. Architectural tradeoffs in the design of the MIPS-X. In: Proceedings, 14th International Symposium on Computer Architecture, pp 300–308.
  20. Christie, D. 1996. Developing the AMD K5 architecture. IEEE Micro, 16(2):16–26.
  21. Circello, J. et al. 1995. The superscalar architecture of the MC68060. IEEE Micro, 15(2):10–21.
  22. Cortadella, J. and T. Jove. 1988. Designing a branch target buffer for executing branches with zero times cost in a RISC processor. Microprocessing and Microprogramming, 24:573–580.
  23. Cragon, H.G. 1992. Branch Strategy Taxonomy and Performance Models. IEEE Computer Society Press, Los Alamitos, California.
  24. CRAY. 1989. Cray-1 Computer Systems: Mainframe Reference Manual. Cray Research, Inc., Mendota Heights, Minnesota.
  25. Davidson, J.W. and D.B. Whalley. 1990. Reducing the cost of branches by using registers. In: Proceedings, 17th Annual International Symposium on Computer Architecture, pp 182–191.
  26. DeRosa, J. and H. Levy. 1987. An evaluation of branch architectures. In: Proceedings, 14th Annual International Symposium on Computer Architecture, pp 10–16.
  27. Diefendorff, K. and M. Allen. 1992. Organization of the Motorola 88110 superscalar RISC microprocessor. IEEE Micro, 12(4):40–63.
  28. Diep, T.A., C. Nelson, and J.P. Shen. 1995. Performance evaluation of the PowerPC 620 microarchitecture. In: Proceedings, 22nd Annual International Symposium on Computer Architecture, pp 163–174.
  29. Ditzel, D.R. and H.R. McLellan. 1987. Branch folding in the CRISP microprocessor: reducing branch delay to zero. In: Proceedings, 14th Annual International Symposium on Computer Architecture, pp 2–9.
  30. Driesen, K. and U. Hölzle. 1998. Accurate indirect branch prediction. In: Proceedings, 25th Annual International Symposium on Computer Architecture, pp 167–178.
  31. Dubey, P.K. and M.J. Flynn. 1991. Branch strategies: modeling and optimization. IEEE Transactions on Computers, 40(10):1159–1167.
  32. Dutta, S. and M. Franklin. 1995. Control flow prediction with tree-like subgraph for superscalar processors. In: Proceedings, 28th International Symposium on Microarchitecture, pp 258–263.
  33. Eden, A.N. and T. Mudge. 1998. The YAGS branch prediction scheme. In: Proceedings, 31st International Symposium on Microarchitecture.
  34. Edmondson, J.H. et al. 1995. Internal organization of the Alpha 21164, a 300MHz 64-bit quad-issue CMOS RISC microprocessor. Digital Technical Journal, 7(1):119–135.
  35. Edmondson, J.H., P. Rubinfeld, R. Preston, and V. Rajagopalan. 1995. Superscalar instruction execution in the Alpha 21164 microprocessor. IEEE Micro, 15(2):33–43.
  36. Emma, P.G. and E.S. Davidson. 1987. Characterization of branch and data dependencies for evaluating pipeline performance. IEEE Transactions on Computers, 36(7):859–876.
  37. Emer, J.S. and D.W. Clark. 1984. A characterization of processor performance in the VAX-11/780. In: Proceedings, 11th Annual International Symposium on Computer Architecture, pp 301–309.
  38. Evers, M., S. Patel, R. Cappel, and Y. Patt. 1998. What makes two-level branch prediction work. In: Proceedings, 25th Annual International Symposium on Computer Architecture, pp 52–61.
  39. Fagin, B. and A. Mital. 1995. The performance of counter- and correlation-based schemes for branch target buffers. IEEE Transactions on Computers, 42(12):1383–1393.
  40. Fagin, B. and R. Russell. 1995. Partial resolution in branch target buffers. In: Proceedings, 28th International Symposium on Microarchitecture, pp 193–198.
  41. Farrens, M.K. and A.R. Pleszkun. 1994. Implementation of the PIPE processor. IEEE Computer, 24(1):65–71.
  42. Fisher, J.A. and S.M. Freudenberger. 1992. Predicting conditional branch directions from previous runs of a program. In: Proceedings, 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pp 85–95.
  43. Gerosa, G. et al. 1997. A 250MHz 5W PowerPC microprocessor with on-chip L2 cache controller. IEEE Journal of Solid-State Circuits, 32(11):1635–1649.
  44. Gloy, N., M.D. Smith, and C. Young. 1995. Performance issues in correlated branch prediction schemes. In: Proceedings, 28th International Symposium on Microarchitecture, pp 3–14.
  45. Gonzalez, A.M. and J.M. Llaberia. 1993. Reducing branch delay to zero in pipelined processors. IEEE Transactions on Computers, 42(3):363–371.
  46. Gonzalez, A.M., J.M. Llaberia, and J. Cortadella. 1988. A mechanism for reducing the cost of branches in RISC architectures. Microprocessing and Microprogramming, 24:565–572.
  47. Grohoski, G.F. 1990. Machine organization of the IBM RISC System/6000 processor. IBM Journal of Research and Development, 34(1):37–58.
  48. Gross, T.-L. and J.L. Hennessy. 1982. Optimizing delayed branches. In: Proceedings, 15th Annual Workshop on Microprogramming, pp 114–120.
  49. Gwennap, L. 1997. Centaur improves C6 with no extra cost. Microprocessor Report, 11(15).
  50. Gwennap, L. 1996. Digital 21264 sets new standard. Microprocessor Report, 10(14).
  51. Halfhill, T.R. 1997. Beyond Pentium II. Byte, December.
  52. Halfhill, T.R. 1996. AMD K6 takes on Intel P6. Byte, January:67–72.
  53. Halstead, R.H., G.R. Gao, R.A. Iannucci, and B. Smith, Editors. 1994. Multithreaded Computer Architecture: A Summary of the State of the Art. Kluwer Academic Publishers, Boston, Massachusetts.
  54. Hitachi. 1998. Series Overview: SuperH RISC Engine Embedded Processors. Hitachi, Japan.
  55. Hsu, P. Y.-T. 1994. Designing the TFP microprocessor. IEEE Micro, 14(2):23–33.
  56. Holgate and Ibbett. 1980. An analysis of instruction-fetching strategies in pipelined computers. IEEE Transactions on Computers, C-29(4):325–329.
  57. Hwu, W.M., T.M. Conte, and P.P. Chang. 1989. Comparing software and hardware schemes for reducing the cost of branches. In: Proceedings, 16th Annual International Symposium on Computer Architecture, pp 224–233.
  58. Iacobovici, S. 1988. A pipelined interface for high floating-point performance with precise exceptions. IEEE Micro, 8(3):77–87.
  59. Ibbett, R.N. and N.P. Topham. 1989. The Architecture of High Performance Computers (Springer-Verlag, New York), volume 1, Chapter 4.
  60. Inayoshi et al. 1988. Realization of the Gmicro/200. IEEE Micro, 8(2):12–21.
  61. Jouppi, N.P. and D.W. Wall. 1989. Available instruction-level parallelism for superscalar and superpipelined machines. In: Proceedings, 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, pp 272–282.
  62. Juan, T., S. Sanjeevan, and J. Navarro. 1998. Dynamic history-length fitting: a third level of adaptivity for branch prediction. In: Proceedings, 25th Annual International Symposium on Computer Architecture, pp 155–166.
  63. Kaeli, R. and P.G. Emma. 1991. Branch history table predictions of moving branch targets due to subroutine returns. In: Proceedings, 18th Annual International Symposium on Computer Architecture, pp 34–41.
  64. Kanenko, H. et al. 1990. Realizing the V80 and its system support functions. IEEE Micro, April:56–59.
  65. Katevenis, M. 1985. Reduced Instruction Set Architecture for VLSI. MIT Press, Boston, Massachusetts.
  66. Katevenis, M. and N. Tzartzanis. 1991. Reducing the branch penalty by rearranging instructions in a double-width memory. In: Proceedings, 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pp 15–27.
  67. Kuiran, L. et al. 1991. Classification and performance on instruction buffering techniques. In: Proceedings, 18th Annual International Symposium on Computer Architecture, pp 150–159.
  68. Lee, C.-C., I.-C. K. Chen, and T.N. Mudge. 1997. The bi-mode branch predictor. In: Proceedings, 30th International Symposium on Microarchitecture.
  69. Lee, J.F.K. and A.J. Smith. 1984. Branch prediction strategies and branch target buffer design. IEEE Computer, 17(1):6–22.
  70. Lee, R. 1989. Precision architecture. IEEE Computer, 22(1):78–91.
  71. Lewis, D.K., J.P. Costello, and D.M. O’Connor. 1988. Design tradeoffs for a 40 MIPS (peak) CMOS 32-bit microprocessor. In: Proceedings, International Conference on Computer Design, pp 110–113.
  72. Lilja, D.J. 1988. Reducing branch penalty in pipelined processors. IEEE Computer, 21(7):47–55.
  73. Mahlke, S.A. et al. 1995. A comparison of full and partial predicated execution support for ILP processors. In: Proceedings, 22nd Annual International Symposium on Computer Architecture, pp 138–150.
  74. Mahlke, S.A. et al. 1994. Characterizing the impact of predicated execution on branch prediction. In: Proceedings, 27th Annual International Symposium on Microarchitecture, pp 217–227.
  75. McFarling, S. 1993. Combining branch predictors. WRL Technical Note TN-36, Western Research Laboratory, Digital Equipment Corporation, Palo Alto, California.
  76. McFarling, S. and J. Hennessy. 1986. Reducing the cost of branches. In: Proceedings, 13th Annual International Symposium on Computer Architecture, pp 396–403.
  77. McMahan, S.C., M. Bluhm, and R.A. Garibay. 1995. 6x86: the Cyrix solution to executing x86 binaries on a high performance microprocessor. Proceedings of the IEEE, 83(12):1664–1672.
  78. McLellan, E. 1993. The Alpha AXP architecture and 21064 microprocessor. IEEE Micro, 13(3):36–47.
  79. Melear, C. 1989. The design of the 88000 RISC family. IEEE Micro, 9(2):26–38.
  80. MIPS. 1995. MIPS R10000 Microprocessor User’s Manual. MIPS Technologies, Mt. View, California.
  80a. Michaud, P., A. Seznec, and R. Uhlig. 1995. Trading conflict and capacity aliasing in conditional branch predictors. In: Proceedings, 22nd Annual International Symposium on Computer Architecture, pp 292–303.
  81. Mirapuri, S., M. Woodacre, and N. Vasseghi. 1992. The MIPS R4000 processor. IEEE Micro, April:10–22.
  82. Miyata, M. et al. 1988. The TX1 32-bit microprocessor: performance analysis and debugging support. IEEE Micro, 8(2):37–46.
  83. Morris, D. and R.N. Ibbett. 1979. The MU5 Computer System. Springer-Verlag, New York.
  84. Murray, J.E., R.C. Hetherington, and R.M. Salett. 1990. VAX instructions that illustrate the architectural features of the VAX 9000 CPU. Digital Technical Journal, 2(4):13–24.
  85. Nair, R. 1995. Optimal 2-bit branch predictors. IEEE Transactions on Computers, 44(5):698–702.
  86. Nair, R. 1995. Dynamic path-based branch correlation. In: Proceedings, 28th International Symposium on Microarchitecture, pp 15–23.
  87. Oehler, R.R. and R.D. Groves. 1990. IBM RISC System/6000. IBM Journal of Research and Development, 34(1):23–36.
  88. Okamoto et al. 1988. Design considerations for 32-bit microprocessor TX3. In: Digest of Papers, COMPCON, pp 25–29.
  89. Pan, S.T., K. So, and J.T. Rahmeh. 1996. Improving the accuracy of dynamic branch prediction using branch correlation. In: Proceedings, 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pp 76–84.
  90. Perleberg, C.H. and A.J. Smith. 1993. Branch target buffer design and optimization. IEEE Transactions on Computers, 42(4):396–412.
  91. Pnevmatikatos, D.N. and G.S. Sohi. 1994. Guarded execution and branch prediction in dynamic ILP processors. In: Proceedings, 21st International Symposium on Computer Architecture, pp 120–129.
  92. Potter, M., M. Vaden, J. Young, and N. Ullah. 1994. Resolution of control and data dependencies in the PowerPC 601. IEEE Micro, 14(5):18–29.
  93. Radin, G. 1983. The IBM 801 minicomputer. IBM Journal of Research and Development, 27(3):237–246.
  94. Ramamoorthy, C.V. and H.F. Li. 1977. Pipeline architecture. Computing Surveys, 9(1):61–102.
  95. Rau, B.R. and G.E. Rossman. 1977. The effect of instruction fetch strategies upon the performance of pipelined instruction units. In: Proceedings, 4th Annual International Symposium on Computer Architecture, pp 80–89.
  96. Rau, R., D.W.L. Yau, W. Yen, and R.A. Towle. 1989. The Cydra 5 departmental supercomputer. IEEE Computer, 22(1):12–35.
  97. Russell, R.D. 1978. The PDP-11: A case study of how not to design condition codes. In: Proceedings, 5th Annual International Symposium on Computer Architecture, pp 190–194.
  98. Sechrest, S., C.C. Lee, and T. Mudge. 1995. The role of adaptivity in two-level branch prediction. In: Proceedings, 28th International Symposium on Microarchitecture, pp 264–270.
  99. Sechrest, S., C.C. Lee, and T. Mudge. 1996. Correlation and aliasing in dynamic branch predictors. In: Proceedings, 23rd International Symposium on Computer Architecture, pp 22–31.
  100. Sequin, C.H. and D.A. Patterson. 1983. Design and implementation of RISC I. In: B. Randell and P.C. Treleaven, Eds., VLSI Architecture (Prentice-Hall International, U.K.), pp 276–298.
  101. Sites, R.L. 1993. Alpha AXP Architecture. Communications of the ACM, 36(2):33–44.
  102. Smith, J.E. 1981. A study of branch prediction strategies. In: Proceedings, 8th Annual International Symposium on Computer Architecture, pp 135–148.
  103. Sprangle, E. et al. 1995. The Agree predictor: a mechanism for reducing negative branch history interference. In: Proceedings, 22nd Annual International Symposium on Computer Architecture, pp 284–291.
  104. Sohie, G.R.L. and K.L. Kloker. 1988. A digital signal processor with IEEE floating-point arithmetic. IEEE Micro, 8(6):49–57.
  105. Song, S.P., M. Denman, and J. Chang. 1994. The PowerPC 604 microprocessor. IEEE Micro, 14(5):8–17.
  106. Srivastava, A. and A.M. Despain. 1993. Prophetic branches: a branch architecture for code compaction and efficient execution. In: Proceedings, 26th International Symposium on Microarchitecture.
  107. Stark, J., M. Evers, and Y.N. Patt. 1998. Variable length path prediction. In: Proceedings, 8th International Conference on Parallel Architectures and Compilation Techniques.
  108. Su, C.-L. and A.M. Despain. 1994. Branch with masked squashing in superpipelined processors. In: Proceedings, 21st Annual International Symposium on Computer Architecture, pp 130–140.
  109. Talcott, A.R., M. Nemirovsky, and R.C. Wood. 1995. The influence of branch prediction table interference on branch prediction performance. In: Proceedings, 3rd International Conference on Parallel Architectures and Compilation Techniques.
  110. Talcott, A.R. et al. 1994. The impact of unresolved branches on branch prediction performance. In: Proceedings, 22nd Annual International Symposium on Computer Architecture, pp 12–21.
  111. Thornton, J.E. 1970. Design of a Computer: the Control Data 6600. Scott, Foresman, and Co., Glenview, Illinois.
  112. Topham, N.P., A. Omondi, and R.N. Ibbett. 1988. On the design and performance of conventional pipelined architectures. Journal of Supercomputing, 1(4):353–393.
  113. Tremblay, M., D. Greenley, and K. Normoyle. 1995. The design of the microarchitecture of the UltraSPARC-I. Proceedings of the IEEE, 83(12):1653–1663.
  114. Tyson, G.S. 1994. The effects of predicated execution on branch prediction. In: Proceedings, 27th Annual International Symposium on Microarchitecture, pp 196–206.
  115. Uchiyama, K. et al. 1993. The Gmicro/500 superscalar microprocessor with branch buffers. IEEE Micro, 13(5):12–22.
  116. Uht, A.K., V. Sindagi, and S. Somanathan. 1997. Branch effect reduction techniques. IEEE Computer, May:71–80.
  117. Vogel, J.P. and B.K. Holmer. 1994. Analysis of skip instructions in the HP Precision Architecture. In: Proceedings, 27th Annual International Symposium on Microarchitecture, pp 207–216.
  118. Wilken, K.D. 1992. Toward zero-cost branches using instruction registers. In: Proceedings, 25th International Symposium on Microarchitecture, pp 214–217.
  119. Williams, T., N. Patkar, and G. Shen. 1995. SPARC64: A 64-b 64-active-instruction out-of-order-execution MCM processor. IEEE Journal of Solid-State Circuits, 30(11):1215–1226.
  120. Wu, Y. and J.R. Larus. 1994. Static branch frequency and program profile analysis. In: Proceedings, 27th International Symposium on Microarchitecture, pp 1–11.
  121. Yeager, K.C. 1996. The MIPS R10000 superscalar microprocessor. IEEE Micro, 16(2):28–40.
  122. Yeh, T.-Y. and Y.N. Patt. 1991. Two-level adaptive branch prediction. In: Proceedings, 24th International Symposium and Workshop on Microarchitecture, pp 51–61.
  123. Yeh, T.-Y. and Y.N. Patt. 1992. A comprehensive instruction fetch mechanism for a processor supporting speculative execution. In: Proceedings, 25th International Symposium and Workshop on Microarchitecture, pp 129–139.
  124. Yeh, T.-Y. and Y.N. Patt. 1992. Alternative implementations of two-level adaptive branch prediction. In: Proceedings, 19th Annual International Symposium on Computer Architecture, pp 124–134.
  125. Yeh, T.-Y. and Y.N. Patt. 1993. A comparison of dynamic branch predictors that use two levels of branch history. In: Proceedings, 20th Annual International Symposium on Computer Architecture, pp 257–266.
  126. Yeh, T.-Y. and Y.N. Patt. 1993. Branch history table indexing to prevent pipeline bubbles in wide-issue superscalar processors. In: Proceedings, 26th Annual International Symposium on Computer Architecture, pp 164–175.
  127. Yeh, T.-Y., D.T. Marr, and Y.N. Patt. 1993. Increasing instruction fetch rates via multiple branch predictions and a branch address cache. In: Proceedings, 7th Annual ACM International Conference on Supercomputing, pp 67–76.
  128. Yoshida, T. et al. 1992. The GMicro/100 32-bit microprocessor. IEEE Micro, August:62–72.
  129. Young, H.C. and J.R. Goodman. 1984. A simulation study of architectural data queues and prepare-to-branch instruction. In: Proceedings, International Conference on Computer Design, pp 544–549.
  130. Young, C. and M. Smith. 1994. Improving the accuracy of branch prediction using branch correlation. In: Proceedings, 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp 232–241.
  131. Young, C., N. Gloy, and M.D. Smith. 1995. A comparative analysis of schemes for correlated branch prediction. In: Proceedings, 22nd Annual International Symposium on Computer Architecture, pp 276–286.

Copyright information

© Springer Science+Business Media New York 1999

Authors and Affiliations

  • Amos R. Omondi, Department of Computer Science, Flinders University, Adelaide, Australia
