Abstract
The Translation Look-aside Buffer (TLB) is a key part of the hardware support for virtual memory management in high-performance embedded systems. Although small, the TLB is accessed frequently; it therefore consumes significant energy and is also one of the major thermal hot-spots in the processor. Recently, several circuit and microarchitectural implementations of TLBs have been proposed to reduce TLB power. One simple yet effective TLB design for power reduction is the Use-Last TLB architecture proposed in IEEE J Solid State Circuits, 1190–1199, (2004). The Use-Last TLB architecture reduces power consumption whenever an access falls in the same page as the previous access. In this work, we develop code transformation techniques to reduce page switches in data cache accesses, and we propose an efficient page-aware code placement technique to enhance the energy reduction achieved by the Use-Last TLB architecture for instruction cache accesses. On benchmarks from the MiBench, Multimedia, DSPStone and BDTI suites, our comprehensive page switch reduction algorithm yields an average 39% reduction in data-TLB page switches, and our code placement heuristic yields an average 76% reduction in instruction-TLB page switches, with negligible impact on performance. The reduced page switch count translates into equivalent power savings, above and beyond the reduction achieved by the Use-Last TLB architecture implementation itself.
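To make the page-switch metric concrete, the following minimal sketch counts how often consecutive accesses in an address trace fall in different pages, which is the case a Use-Last TLB cannot shortcut. It is an illustration only and not the algorithm of this article: the 4 KB page size, the array base addresses, and the helper names `count_page_switches`, `interleaved_trace` and `split_trace` are all assumptions chosen for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical, not from the article)


def count_page_switches(addresses, page_size=PAGE_SIZE):
    """Count accesses whose page differs from the page of the previous access."""
    switches = 0
    last_page = None
    for addr in addresses:
        page = addr // page_size
        if last_page is not None and page != last_page:
            switches += 1
        last_page = page
    return switches


def interleaved_trace(base_a, base_b, n, elem_size=4):
    """Trace of a loop that touches a[i] and b[i] in every iteration."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_size)
        trace.append(base_b + i * elem_size)
    return trace


def split_trace(base_a, base_b, n, elem_size=4):
    """Trace after splitting the loop: all of a first, then all of b."""
    first = [base_a + i * elem_size for i in range(n)]
    second = [base_b + i * elem_size for i in range(n)]
    return first + second


if __name__ == "__main__":
    # Two hypothetical page-aligned arrays of 1024 four-byte elements each.
    a, b, n = 0x10000, 0x20000, 1024
    print("interleaved:", count_page_switches(interleaved_trace(a, b, n)))
    print("split:      ", count_page_switches(split_trace(a, b, n)))
```

In this toy setting each array fills exactly one page, so the interleaved loop changes page on every access while the split version changes page once. Splitting a loop like this is only legal when no dependence between the two halves is violated, so the sketch shows the effect on the metric rather than a general transformation.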
References
Ekman, M., Stenström, P., Dahlgren, F.: Tlb and snoop energy-reduction using virtual caches in low-power chip-multiprocessors. In: ISLPED ’02, pp. 243–246. ACM, New York (2002)
Kadayif, I., Sivasubramaniam, A., Kandemir, M., Kandiraju, G., Chen, G.: Optimizing instruction tlb energy using software and hardware techniques. ACM Trans. Des. Autom. Electron. Syst. 10(2), 229–257 (2005)
Zhou, X., Petrov, P.: Low-power cache organization through selective tag translation for embedded processors with virtual memory support. In: GLSVLSI ’06, pp. 398–403. ACM, New York (2006)
Petrov, P., Tracy, D., Orailoglu, A.: Energy-efficient physically tagged caches for embedded processors with virtual memory. In: DAC ’05, pp. 17–22. ACM, New York (2005)
Haigh, J.R., Wilkerson, M., Miller, J., Beatty, T., Strazdus, S., Clark, L.: A low-power 2.5 GHz 90 nm level 1 cache and memory management unit. IEEE J. Solid-State Circuits 40(5), 1190–1199 (2005)
Clark, L.T., Choi, B., Wilkerson, M.: Reducing translation lookaside buffer active power. In: ISLPED ’03, pp. 10–13. ACM, New York (2003)
Manne, S., Klauser, A., Grunwald, D., Somenzi, F.: Low power tlb design for high performance microprocessors [Online]. Available: http://citeseer.ist.psu.edu/manne97low.html (1997)
Lee, J.-H., Park, G.-H., Park, S.-B., Kim, S.-D.: A selective filter-bank tlb system. In: ISLPED ’03, pp. 312–317. ACM, New York (2003)
Choi, J.-H., Lee, J.-H., Jeong, S.-W., Kim, S.-D., Weems, C.: A low power tlb structure for embedded systems. IEEE Comput. Archit. Lett. 1(1), 3 (2006)
Chang, Y.-J.: An ultra low-power tlb design. In: DATE ’06: Proceedings of the Conference on Design, Automation and Test in Europe, pp. 1122–1127. European Design and Automation Association, 3001 Leuven, Belgium (2006)
Kadayif, I., Nath, P., Kandemir, M., Sivasubramaniam, A.: Compiler-directed physical address generation for reducing dtlb power. In: ISPASS ’04, pp. 161–168. IEEE Computer Society, Los Alamitos, CA (2004)
Delaluz, V., Kandemir, M., Vijaykrishnan, N., Irwin, M., Sivasubramaniam, A., Kolcu, I.: Compiler-directed array interleaving for reducing energy in multi-bank memories. In: ASP-DAC ’02, pp. 288–293. IEEE Computer Society, Los Alamitos, CA (2002)
Parikh, A., Kim, S., Kandemir, M., Vijaykrishnan, N., Irwin, M.: Instruction scheduling for low power. In: VLSI-SP ’04, pp. 129–149. Springer, Netherlands (2004)
Chiyonobu, A., Sato, T.: Energy-efficient instruction scheduling utilizing cache miss information. In: MEDEA ’05: Proceedings of the 2005 Workshop on MEmory performance. IEEE Computer Society, Los Alamitos, CA (2005)
Kandemir, M., Kadayif, I., Chen, G.: Compiler-directed code restructuring for reducing data tlb energy. In: CODES+ISSS ’04, pp. 98–103. IEEE Computer Society, Washington (2004)
Intel Corporation. Intel XScale® Technology Overview [Online]. Available: http://intel.com/design/intelxscale
Guthaus, M.R., Ringenberg, J.S., Ernst, D., Austin, T.M., Mudge, T., Brown, R.B.: Mibench: a free, commercially representative embedded benchmark suite. In: WWC ’01: Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop on, pp. 3–14. IEEE Computer Society, Washington, DC (2001)
Balakrishnan, H., Garg, R.: Multimedia benchmarks: a performance comparison of multimedia programs on different architectures [Online]. Available: http://citeseer.ist.psu.edu/233784.html
Zivojnovic, V., Velarde, J., Schlager, C., Meyr, H.: Dspstone: a dsp-oriented benchmarking methodology. In: Proceedings of Signal Processing Applications and Technology, Dallas (1994)
Henning, J.L.: Spec cpu2000: measuring cpu performance in the new millennium. Computer 33(7), 28–35 (2000)
BDTI Suite: Berkeley Design Technology Inc, The bdti benchmark suites [Online]. Available: http://bdti.com/products/benchmark_overview.html
Austin, T.: Simple Scalar LLC
Shrivastava, A., Earlie, E., Dutt, N., Nicolau, A.: Operation tables for scheduling in the presence of incomplete bypassing. In: CODES+ISSS, pp. 194–199 (2004)
Issenin, I., Dutt, N.: Foray-gen: automatic generation of affine functions for memory optimizations. In: DATE ’05: Proceedings of the conference on Design, Automation and Test in Europe, pp. 808–813. IEEE Computer Society, Washington, DC (2005)
Cite this article
Jeyapaul, R., Shrivastava, A. Code Transformations for TLB Power Reduction. Int J Parallel Prog 38, 254–276 (2010). https://doi.org/10.1007/s10766-009-0123-8
DOI: https://doi.org/10.1007/s10766-009-0123-8