Instruction Buffering Exploration for Low Energy Embedded Processors

  • Tom Vander Aa
  • Murali Jayapala
  • Francisco Barat
  • Geert Deconinck
  • Rudy Lauwereins
  • Henk Corporaal
  • Francky Catthoor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2799)


For multimedia applications, loop buffering is an efficient mechanism to reduce the power consumed in the instruction memory of embedded processors. Software-controlled loop buffers are especially energy efficient. However, current compilers do not take full advantage of the possibilities of such loop buffers. This paper presents an algorithm that explores, for an application or a set of applications, the optimal loop buffer configuration and the optimal way to use that configuration. Results for the MediaBench application suite show an additional 35% reduction (on average) in energy in the instruction memory hierarchy compared with traditional approaches to the loop buffer, without any performance implications.


Nest Loop, Optimal Loop, Memory Hierarchy, Design Space Exploration, Local Controller
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Tom Vander Aa (1)
  • Murali Jayapala (1)
  • Francisco Barat (1)
  • Geert Deconinck (1)
  • Rudy Lauwereins (2)
  • Henk Corporaal (3)
  • Francky Catthoor (2)

  1. ESAT/ELECTA, K.U.Leuven, Heverlee, Belgium
  2. IMEC vzw, Heverlee, Belgium
  3. Electrical Engineering, TU Eindhoven, Eindhoven, Netherlands
