A Simulation-Based Study on Memory Design Issues for Embedded Systems
Because the gap between CPU and memory speed keeps widening, cache design has become an increasingly critical performance factor in microprocessor systems. Recent advances in microprocessor technology have delivered significant gains in processor speed, which has widened the gap between processor and main memory speed even further. Faster memory systems are therefore necessary. To narrow the processor-memory speed gap, a main concern must be the design of an effective memory hierarchy, including multilevel caches and the TLB (Translation Lookaside Buffer).
The aim of this chapter is to offer a comprehensive, simulation-based performance evaluation of cache and TLB design issues in embedded processors, such as two-level versus single-level TLB, split versus unified cache, cache size, cache associativity, and replacement policy.
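To make these design parameters concrete, the following is a minimal Python sketch of a set-associative cache model with LRU replacement that reports the miss rate for a trace of byte addresses. It is an illustration only and is not the chapter's experimental setup; the function name `miss_rate`, its parameters, and the synthetic trace are all assumptions introduced here.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_bytes=1024, line_bytes=32, ways=2):
    """Miss rate of a set-associative LRU cache on a trace of byte addresses.

    Hypothetical helper for illustration; all parameter names are assumptions.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    # One ordered dict per set: keys are tags, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    total = 0
    for addr in addresses:
        total += 1
        block = addr // line_bytes   # cache-line (block) number
        index = block % num_sets     # which set the block maps to
        tag = block // num_sets      # identifies the block within its set
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)   # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # set is full: evict the least recently used line
            lines[tag] = True
    return misses / total if total else 0.0


# Synthetic trace (an assumption): two sequential passes over a 2 KB array, 4-byte stride.
trace = [a for _ in range(2) for a in range(0, 2048, 4)]
print(miss_rate(trace, cache_bytes=1024, ways=1))  # 0.125: the array does not fit, so the second pass misses again
print(miss_rate(trace, cache_bytes=2048, ways=1))  # 0.0625: the array fits, so only the first pass misses
```

Varying `cache_bytes`, `ways`, and the eviction rule in a model like this is one simple way to explore how cache size, associativity, and replacement policy interact for a given access pattern.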
The rest of the chapter is organized as follows. Section 32.2 elaborates on the problem under study, related work on hierarchical TLBs, the specifications of the SPEC CPU2000 benchmarks, and the reasons for selecting the benchmarks used in our study. Section 32.3 describes the setup of our experiments. Section 32.4 reports the results of our experiments, and Section 32.5 concludes the chapter.
Keywords: Cache Size, Replacement Policy, Data Cache, Cache Line, Instruction Cache