Memory System Support for Dynamic Cache Line Assembly

  • Conference paper
Intelligent Memory Systems (IMS 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2107)

Abstract

The effectiveness of cache-based memory hierarchies depends on the presence of spatial and temporal locality in applications. The memory accesses of many important applications are predictable but have poor locality. As a result, the performance of these applications suffers from the increasing gap between processor and memory performance. In this paper, we describe a novel mechanism provided by the Impulse memory controller, called Dynamic Cache Line Assembly, that applications can use to improve memory performance. This mechanism allows applications to gather data spread through memory into contiguous cache lines on the fly, which creates spatial data locality where none exists naturally. We have used dynamic cache line assembly to optimize a random access loop and an implementation of the Fast Fourier Transform (FFTW). Detailed simulation results show that dynamic cache line assembly improves the performance of these benchmarks by up to a factor of 3.2 and 1.4, respectively.
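
The gathering idea described in the abstract can be illustrated with a small software sketch. This is not the Impulse interface and not the authors' implementation: the function names, the `LINE_WORDS` constant, and the choice of eight words per line are all assumptions made for illustration. The sketch only shows the data layout effect, namely that elements selected by an index vector end up packed into dense, line-sized groups; in the paper this gathering is performed by the memory controller rather than by the processor.

```python
from typing import List, Sequence

# Assumption: 8 words per cache line. The real line size depends on the machine.
LINE_WORDS = 8


def random_access_sum(data: Sequence[int], index: Sequence[int]) -> int:
    """Baseline random access loop: each data[index[i]] may fall in a
    different cache line, so consecutive accesses share little spatial locality."""
    total = 0
    for j in index:
        total += data[j]
    return total


def assemble_lines(data: Sequence[int], index: Sequence[int],
                   line_words: int = LINE_WORDS) -> List[List[int]]:
    """Hypothetical software stand-in for gathering: pack the elements
    data[index[i]] into dense groups of `line_words` values each, in index order.
    The last group may be shorter if len(index) is not a multiple of line_words."""
    lines = []
    for start in range(0, len(index), line_words):
        chunk = index[start:start + line_words]
        lines.append([data[j] for j in chunk])
    return lines


def assembled_sum(data: Sequence[int], index: Sequence[int],
                  line_words: int = LINE_WORDS) -> int:
    """Same reduction as random_access_sum, but reading the gathered values
    sequentially from the assembled groups."""
    return sum(x for line in assemble_lines(data, index, line_words) for x in line)
```

Doing the gather in software, as here, still pays for every scattered access; the sketch is meant only to make the "contiguous cache lines" layout concrete, and it says nothing about the speedups reported in the abstract.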

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, L., Pingali, V.K., Chandramouli, B., Carter, J.B. (2001). Memory System Support for Dynamic Cache Line Assembly. In: Chong, F.T., Kozyrakis, C., Oskin, M. (eds) Intelligent Memory Systems. IMS 2000. Lecture Notes in Computer Science, vol 2107. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44570-6_4

  • DOI: https://doi.org/10.1007/3-540-44570-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42328-7

  • Online ISBN: 978-3-540-44570-8

  • eBook Packages: Springer Book Archive
