Analytical Two-Level Near Threshold Cache Exploration for Low Power Biomedical Applications

  • Yun Liang
  • Shuo Wang
  • Tulika Mitra
  • Yajun Ha
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 908)


Emerging biomedical applications generally run at low to medium frequencies and require ultra-low energy consumption. Near threshold processors with near threshold caches have been proposed as computing platforms for these applications. Multi-level near threshold cache hierarchies span a large design space, which calls for a fast design space exploration framework. In this paper, we first propose three two-level near threshold cache architectures with different performance and energy tradeoffs. We then describe the design space of a two-level near threshold cache hierarchy and develop an accurate and fast analytical framework to explore it. Experiments indicate that our new near threshold cache architecture achieves significant energy savings (\(59\%\) on average). Moreover, our analytical framework is shown to be both accurate and efficient.



This work was supported by the National Natural Science Foundation of China (No. 61672048) and the Beijing Natural Science Foundation (No. L172004). We thank all the anonymous reviewers for their feedback.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Center for Energy-Efficient Computing and Applications (CECA), School of EECS, Peking University, Beijing, China
  2. School of Computing, National University of Singapore, Singapore, Singapore
  3. School of Information Science and Technology, ShanghaiTech University, Shanghai, China
