Impact of Software Bypassing on Instruction Level Parallelism and Register File Traffic

  • Vladimír Guzma
  • Pekka Jääskeläinen
  • Pertti Kellomäki
  • Jarmo Takala
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5114)


Software bypassing is a technique that allows programmer-controlled direct transfer of computation results to the operands of data-dependent operations. This can remove the need to store some values in general-purpose registers and reduces the number of reads from the register file. Software bypassing also improves instruction level parallelism by reducing the number of false dependencies between operations caused by the reuse of registers. In this work we show how software bypassing affects cycle count and reduces register file reads and writes. We analyze previous register file bypassing methods and compare them with our improved software bypassing implementation. In addition, we propose heuristics for deciding when not to apply software bypassing, in order to retain scheduling freedom when selecting function units for operations. The results show an improvement in cycle count of up to 27%, as well as up to 48% fewer register reads and 45% fewer register writes with the use of bypassing.
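The effect on register file traffic described above can be illustrated with a toy model. The following Python sketch is not the authors' scheduler; it only counts register file reads and writes for a straight-line block of operations. The names `Op`, `count_rf_traffic`, and `bypass_window` are hypothetical, and the window is an assumed stand-in for the real constraint that a result must still be available at the producing function unit when the consumer needs it.

```python
from collections import namedtuple

# Hypothetical toy operation: one destination register, a tuple of source registers.
Op = namedtuple("Op", "dest srcs")


def count_rf_traffic(ops, live_out=(), bypass_window=0):
    """Return (reads, writes) to the register file for a straight-line block.

    Assumption: an operand can be bypassed when its producer is at most
    `bypass_window` operations earlier. A window of 0 disables bypassing.
    """
    last_def = {}  # register name -> index of the op that last wrote it
    uses = {}      # producer index -> [total uses, bypassed uses]
    reads = 0
    for i, op in enumerate(ops):
        for src in op.srcs:
            producer = last_def.get(src)
            if producer is None:
                reads += 1  # live-in value, must be read from the register file
                continue
            uses[producer][0] += 1
            if i - producer <= bypass_window:
                uses[producer][1] += 1  # operand taken directly from the producer
            else:
                reads += 1
        last_def[op.dest] = i
        uses[i] = [0, 0]

    writes = 0
    for i, op in enumerate(ops):
        total, bypassed = uses[i]
        is_live_out = op.dest in live_out and last_def[op.dest] == i
        # The write can be dropped only if every use was bypassed and the
        # value is not needed after the block. Unused results are kept
        # conservatively.
        if is_live_out or total == 0 or bypassed < total:
            writes += 1
    return reads, writes


# r1 = a + b; r2 = r1 * c; r3 = r2 - r1 (a, b, c are live-in; r3 is live-out)
ops = [
    Op("r1", ("a", "b")),
    Op("r2", ("r1", "c")),
    Op("r3", ("r2", "r1")),
]
live_out = {"r3"}

no_bypass = count_rf_traffic(ops, live_out, bypass_window=0)
short_window = count_rf_traffic(ops, live_out, bypass_window=1)
long_window = count_rf_traffic(ops, live_out, bypass_window=2)
print(no_bypass, short_window, long_window)
```

In this model, widening the window removes reads first, and a write disappears only once every consumer of a value has been bypassed. The numbers it produces are specific to the toy block and say nothing about the benchmark results reported in the paper.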


  • Register File
  • Cycle Count
  • Execution Unit
  • Instruction Level Parallelism
  • Small Machine
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Vladimír Guzma (1)
  • Pekka Jääskeläinen (1)
  • Pertti Kellomäki (1)
  • Jarmo Takala (1)
  1. Department of Computer Systems, Tampere University of Technology, Tampere, Finland
