Abstract
In this chapter, we explore the application of approximate computing techniques to caches and the memory access portion of the processor pipeline. Because memory accesses contribute significantly to the latency and energy consumption of applications, they have long been a target of optimization. Large cache hierarchies are a mainstay of modern designs because they avoid the long latency and high energy of accessing DRAM on every load or store request. As data set sizes grow, however, building ever larger caches is not necessarily an effective use of silicon real estate. We present recent work that improves the effectiveness of cache storage and reduces the cost of memory accesses by exploiting the inherently noisy or imprecise data that error-tolerant applications operate on. First, we consider work that selectively forgoes loading data from the caches and memory when the processor can make a reasonable estimate of the value that is needed. Next, we explore work that selectively determines which values to store in the cache through approximate deduplication of data; reducing how much data must be stored increases the effective cache capacity.
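The two ideas summarized above can be illustrated with a small software sketch. It is only a toy model under stated assumptions, not the hardware designs the chapter presents: the class names `LoadValueApproximator` and `ApproxDedupCache`, the averaging estimator, and the quantization-based similarity test are all hypothetical choices made for illustration.

```python
from collections import deque


class LoadValueApproximator:
    """Toy estimator (assumed design): instead of performing a load,
    return the mean of the most recently observed values."""

    def __init__(self, history=4):
        self.recent = deque(maxlen=history)  # last few values actually loaded

    def train(self, value):
        self.recent.append(value)

    def estimate(self):
        # None means "no basis for an estimate; perform the real load".
        if not self.recent:
            return None
        return sum(self.recent) / len(self.recent)


class ApproxDedupCache:
    """Toy cache (assumed design): blocks whose values fall into the same
    quantization buckets share one stored copy. No eviction is modeled."""

    def __init__(self, step):
        self.step = step  # bucket width: larger means more sharing, more error
        self.tags = {}    # address -> bucket signature
        self.data = {}    # bucket signature -> the single stored block

    def _signature(self, block):
        return tuple(int(v // self.step) for v in block)

    def store(self, addr, block):
        sig = self._signature(block)
        self.tags[addr] = sig
        # Keep only the first block seen for each signature.
        self.data.setdefault(sig, list(block))

    def load(self, addr):
        return self.data[self.tags[addr]]


lva = LoadValueApproximator()
for v in (10.0, 12.0, 11.0):
    lva.train(v)
print(lva.estimate())      # 11.0

cache = ApproxDedupCache(step=8)
cache.store(0x100, [100, 201, 54])
cache.store(0x140, [103, 205, 50])  # similar values: same buckets as 0x100
print(len(cache.data))     # 1
print(cache.load(0x140))   # [100, 201, 54]
```

In this sketch, two addresses occupy a single data entry, which is the sense in which deduplication raises effective capacity; the cost is that a load from `0x140` returns the values stored for `0x100` rather than its own. The `step` parameter stands in for whatever error tolerance an application can accept.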
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jerger, N.E., Miguel, J.S. (2019). Approximate Cache Architectures. In: Reda, S., Shafique, M. (eds) Approximate Circuits. Springer, Cham. https://doi.org/10.1007/978-3-319-99322-5_20
DOI: https://doi.org/10.1007/978-3-319-99322-5_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99321-8
Online ISBN: 978-3-319-99322-5
eBook Packages: Engineering, Engineering (R0)