A Simulation-Based Study on Memory Design Issues for Embedded Systems

  • Mohsen Sharifi
  • Mohsen Soryani
  • Mohammad Hossein Rezvani
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 6)

Due to the widening gap between CPU and memory speeds, cache design has become an increasingly critical performance factor in microprocessor systems. Recent improvements in microprocessor technology have yielded significant gains in processor speed, which have widened the gap between processor and main memory speeds still further. Faster memory systems are therefore needed. To narrow the processor-memory speed gap, a main concern must be the design of an effective memory hierarchy, including multilevel caches and the TLB (Translation Lookaside Buffer).

The aim of this chapter is to offer a comprehensive, simulation-based performance evaluation of cache and TLB design issues in embedded processors, such as two-level versus single TLB, split versus unified cache, cache size, cache associativity, and replacement policy.

The rest of the chapter is organized as follows. Section 32.2 elaborates on the problem under study, related work on hierarchical TLBs, the specifications of the SPEC CPU2000 benchmarks, and the reasons for selecting the benchmarks used in our study. Section 32.3 describes the setup of our experiments. Section 32.4 reports the results of our experiments, and Section 32.5 concludes the chapter.


Keywords: Cache Size, Replacement Policy, Data Cache, Cache Line, Instruction Cache



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Mohsen Sharifi (1)
  • Mohsen Soryani (1)
  • Mohammad Hossein Rezvani (2)
  1. Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran
  2. Computer Engineering Department, Iran University of Science and Technology, Narmak, Tehran, Iran
