
Code Transformations for TLB Power Reduction

International Journal of Parallel Programming

Abstract

The Translation Look-aside Buffer (TLB) is an important part of the hardware support for virtual memory management in high-performance embedded systems. Although small, the TLB is accessed frequently. It therefore not only consumes significant energy but is also one of the important thermal hot spots in the processor. Several circuit-level and microarchitectural TLB implementations have recently been proposed to reduce TLB power. One simple yet effective design for power reduction is the Use-Last TLB architecture proposed by Haigh et al. (IEEE J. Solid-State Circuits 40(5), 1190–1199, 2005), which reduces power consumption whenever the most recently accessed page is accessed again. In this work, we develop code transformation techniques that reduce page switches in data cache accesses, and we propose an efficient page-aware code placement technique that enhances the energy reduction achieved by the Use-Last TLB architecture for instruction cache accesses. On benchmarks from the MiBench, Multimedia, DSPStone and BDTI suites, our comprehensive page switch reduction algorithm reduces data-TLB page switches by an average of 39%, and our code placement heuristic reduces instruction-TLB page switches by an average of 76%, both with negligible impact on performance. Because the Use-Last TLB saves power on accesses that stay on the last-used page, this reduction in page switches yields equivalent power savings, above and beyond those achieved by the Use-Last TLB architecture alone.
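The quantity the abstract optimizes is the page switch: an access whose page differs from that of the immediately preceding access, which a Use-Last TLB cannot serve from its latched last translation. The Python sketch below is illustrative only and is not the authors' algorithm. It assumes 4 KB pages, 4-byte array elements and byte-addressed traces, and all helper names (`count_page_switches`, `array_trace`, `loop_with_call_trace`) are hypothetical. It counts page switches for a trace and applies two generic, textbook transformations as stand-ins: loop interchange on the data side, and placing a hot callee on the same page as its calling loop on the instruction side.

```python
# Minimal page-switch model for a Use-Last TLB (illustrative sketch only).
# Assumptions (not from the paper): 4 KB pages, 4-byte array elements,
# byte-addressed traces, and hypothetical helper names throughout.

PAGE_SIZE = 4096  # assumed page size in bytes


def count_page_switches(addresses, page_size=PAGE_SIZE):
    """Count accesses that land on a different page than the previous access.

    Under a Use-Last TLB, accesses that stay on the last-used page can reuse
    the latched translation; only page switches pay for a full TLB lookup.
    The very first access is not counted as a switch.
    """
    switches = 0
    last_page = None
    for addr in addresses:
        page = addr // page_size
        if last_page is not None and page != last_page:
            switches += 1
        last_page = page
    return switches


def array_trace(rows, cols, elem_size=4, base=0, row_major_loop=True):
    """Byte addresses touched by a loop nest over a row-major 2-D array.

    row_major_loop=True  -> for i: for j: A[i][j]  (inner loop walks a row)
    row_major_loop=False -> for j: for i: A[i][j]  (inner loop walks a column)
    """
    def addr(i, j):
        return base + (i * cols + j) * elem_size

    if row_major_loop:
        return [addr(i, j) for i in range(rows) for j in range(cols)]
    return [addr(i, j) for j in range(cols) for i in range(rows)]


def loop_with_call_trace(loop_addr, callee_addr, iterations,
                         block_bytes=64, insn_bytes=4):
    """Instruction-fetch addresses for a hot loop that calls one function.

    Hypothetical, simplified model: each iteration fetches block_bytes of
    loop code, then block_bytes of callee code.
    """
    trace = []
    for _ in range(iterations):
        trace.extend(range(loop_addr, loop_addr + block_bytes, insn_bytes))
        trace.extend(range(callee_addr, callee_addr + block_bytes, insn_bytes))
    return trace


if __name__ == "__main__":
    # Data side: an 8 x 1024 array of 4-byte elements, one row per 4 KB page.
    column_order = count_page_switches(array_trace(8, 1024, row_major_loop=False))
    row_order = count_page_switches(array_trace(8, 1024, row_major_loop=True))
    print("column-order loop:", column_order)  # -> 8191 (every access switches)
    print("row-order loop:", row_order)        # -> 7 (one switch per row)

    # Instruction side: callee on another page vs. placed right after the loop.
    far = count_page_switches(loop_with_call_trace(0x0000, 0x3000, 100))
    near = count_page_switches(loop_with_call_trace(0x0000, 0x0040, 100))
    print("callee on another page:", far)      # -> 199
    print("callee on the same page:", near)    # -> 0
```

In this toy model, interchanging the loops turns 8191 page switches into 7, and moving the callee onto the loop's page removes all 199 instruction-side switches. Real workloads, and the heuristics evaluated in the paper, must balance such gains against code size, cache conflicts and the legality of each transformation, which this sketch ignores.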


References

  1. Ekman, M., Stenström, P., Dahlgren, F.: TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors. In: ISLPED ’02, pp. 243–246. ACM, New York (2002)

  2. Kadayif, I., Sivasubramaniam, A., Kandemir, M., Kandiraju, G., Chen, G.: Optimizing instruction TLB energy using software and hardware techniques. ACM Trans. Des. Autom. Electron. Syst. 10(2), 229–257 (2005)

  3. Zhou, X., Petrov, P.: Low-power cache organization through selective tag translation for embedded processors with virtual memory support. In: GLSVLSI ’06, pp. 398–403. ACM, New York (2006)

  4. Petrov, P., Tracy, D., Orailoglu, A.: Energy-efficient physically tagged caches for embedded processors with virtual memory. In: DAC ’05, pp. 17–22. ACM, New York (2005)

  5. Haigh, J.R., Wilkerson, M., Miller, J., Beatty, T., Strazdus, S., Clark, L.: A low-power 2.5 GHz 90 nm level 1 cache and memory management unit. IEEE J. Solid-State Circuits 40(5), 1190–1199 (2005)

  6. Clark, L.T., Choi, B., Wilkerson, M.: Reducing translation lookaside buffer active power. In: ISLPED ’03, pp. 10–13. ACM, New York (2003)

  7. Manne, S., Klauser, A., Grunwald, D., Somenzi, F.: Low power TLB design for high performance microprocessors [Online]. Available: http://citeseer.ist.psu.edu/manne97low.html (1997)

  8. Lee, J.-H., Park, G.-H., Park, S.-B., Kim, S.-D.: A selective filter-bank TLB system. In: ISLPED ’03, pp. 312–317. ACM, New York (2003)

  9. Choi, J.-H., Lee, J.-H., Jeong, S.-W., Kim, S.-D., Weems, C.: A low power TLB structure for embedded systems. IEEE Comput. Archit. Lett. 1(1), 3 (2006)

  10. Chang, Y.-J.: An ultra low-power TLB design. In: DATE ’06: Proceedings of the Conference on Design, Automation and Test in Europe, pp. 1122–1127. European Design and Automation Association, 3001 Leuven, Belgium (2006)

  11. Kadayif, I., Nath, P., Kandemir, M., Sivasubramaniam, A.: Compiler-directed physical address generation for reducing DTLB power. In: ISPASS ’04, pp. 161–168. IEEE Computer Society, Los Alamitos, CA (2004)

  12. Delaluz, V., Kandemir, M., Vijaykrishnan, N., Irwin, M., Sivasubramaniam, A., Kolcu, I.: Compiler-directed array interleaving for reducing energy in multi-bank memories. In: ASP-DAC ’02, pp. 288–293. IEEE Computer Society, Los Alamitos, CA (2002)

  13. Parikh, A., Kim, S., Kandemir, M., Vijaykrishnan, N., Irwin, M.: Instruction scheduling for low power. In: VLSI-SP ’04, pp. 129–149. Springer, Netherlands (2004)

  14. Chiyonobu, A., Sato, T.: Energy-efficient instruction scheduling utilizing cache miss information. In: MEDEA ’05: Proceedings of the 2005 Workshop on MEmory performance. IEEE Computer Society, Los Alamitos, CA (2005)

  15. Kandemir, M., Kadayif, I., Chen, G.: Compiler-directed code restructuring for reducing data TLB energy. In: CODES+ISSS ’04, pp. 98–103. IEEE Computer Society, Washington (2004)

  16. Intel Corporation: Intel XScale® Technology Overview [Online]. Available: http://intel.com/design/intelxscale

  17. Guthaus, M.R., Ringenberg, J.S., Ernst, D., Austin, T.M., Mudge, T., Brown, R.B.: MiBench: a free, commercially representative embedded benchmark suite. In: WWC ’01: Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop on, pp. 3–14. IEEE Computer Society, Washington, DC (2001)

  18. Balakrishnan, H., Garg, R.: Multimedia benchmarks: a performance comparison of multimedia programs on different architectures [Online]. Available: http://citeseer.ist.psu.edu/233784.html

  19. Zivojnovic, V., Velarde, J., Schlager, C., Meyr, H.: DSPstone: a DSP-oriented benchmarking methodology. In: Proceedings of Signal Processing Applications and Technology, Dallas (1994)

  20. Henning, J.L.: SPEC CPU2000: measuring CPU performance in the new millennium. Computer 33(7), 28–35 (2000)

  21. BDTI Suite: Berkeley Design Technology Inc, The BDTI benchmark suites [Online]. Available: http://bdti.com/products/benchmark_overview.html

  22. Austin, T.: SimpleScalar LLC

  23. Shrivastava, A., Earlie, E., Dutt, N., Nicolau, A.: Operation tables for scheduling in the presence of incomplete bypassing. In: CODES+ISSS, pp. 194–199 (2004)

  24. Issenin, I., Dutt, N.: Foray-gen: automatic generation of affine functions for memory optimizations. In: DATE ’05: Proceedings of the conference on Design, Automation and Test in Europe, pp. 808–813. IEEE Computer Society, Washington, DC (2005)

Author information

Corresponding author

Correspondence to Reiley Jeyapaul.

About this article

Cite this article

Jeyapaul, R., Shrivastava, A. Code Transformations for TLB Power Reduction. Int J Parallel Prog 38, 254–276 (2010). https://doi.org/10.1007/s10766-009-0123-8

  • DOI: https://doi.org/10.1007/s10766-009-0123-8
