L1 Data Cache Power Reduction Using a Forwarding Predictor

  • P. Carazo
  • R. Apolloni
  • F. Castro
  • D. Chaver
  • L. Pinuel
  • F. Tirado
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6448)


In most modern processor designs, the L1 data cache has become a major power consumer because of its growing size and high access frequency. To reduce this power consumption, this paper proposes a straightforward filtering technique. The mechanism relies on a highly accurate forwarding predictor that determines whether a load instruction will obtain its data via forwarding from the load-store structure (thus avoiding the data cache access) or must fetch it from the data cache. Simulation results show average data cache power savings of 36%, with a negligible performance penalty of 0.1%.
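The filtering idea can be illustrated with a minimal sketch. The abstract does not describe how the predictor is organized, so everything below is an assumption made for illustration: a table of 2-bit saturating counters indexed by the load's program counter, a hypothetical `ForwardingPredictor` class, and a hypothetical `simulate` helper that counts how many data cache accesses are filtered on a toy trace. It is not the authors' design.

```python
class ForwardingPredictor:
    """Assumed organization: PC-indexed table of 2-bit saturating counters."""

    def __init__(self, entries=1024, threshold=2):
        self.entries = entries        # table size (illustrative value)
        self.threshold = threshold    # counter value at which we predict "forwarded"
        self.counters = [0] * entries

    def _index(self, pc):
        # Simple modulo hash of the load's program counter (assumption).
        return pc % self.entries

    def predict_forwarded(self, pc):
        # True means: expect the data to come from the load-store structure,
        # so the data cache access can be skipped.
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc, was_forwarded):
        # Train the counter with the actual outcome of the load.
        i = self._index(pc)
        if was_forwarded:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def simulate(trace, predictor):
    """trace: iterable of (pc, was_forwarded) pairs for executed loads."""
    stats = {"cache_accesses": 0, "filtered": 0, "mispredicted_filters": 0}
    for pc, was_forwarded in trace:
        if predictor.predict_forwarded(pc):
            stats["filtered"] += 1
            if not was_forwarded:
                # Wrong prediction: the load still has to read the cache.
                stats["mispredicted_filters"] += 1
                stats["cache_accesses"] += 1
        else:
            stats["cache_accesses"] += 1
        predictor.update(pc, was_forwarded)
    return stats


if __name__ == "__main__":
    # Toy trace: one load that always forwards, one that never does.
    trace = [(0x40, True)] * 10 + [(0x80, False)] * 10
    print(simulate(trace, ForwardingPredictor()))
```

In this sketch a wrongly filtered load is simply charged a late cache access; a real pipeline would also pay a replay or recovery cost, which is where a performance penalty would come from.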


Keywords: Power Saving; Bloom Filter; Data Cache; Load Instruction; Cache Access
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • P. Carazo (1)
  • R. Apolloni (2)
  • F. Castro (3)
  • D. Chaver (3)
  • L. Pinuel (3)
  • F. Tirado (3)

  1. Universidad Politecnica de Madrid, Spain
  2. Universidad Nacional de San Luis, Argentina
  3. Universidad Complutense de Madrid, Spain
