
Increasing memory bandwidth for vector computations

  • Session Papers
  • Conference paper
Programming Languages and System Architectures

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 782)

Abstract

Memory bandwidth is rapidly becoming the performance bottleneck in the application of high-performance microprocessors to vector-like algorithms, including the “Grand Challenge” scientific problems. Caching alone does not solve the problem for these applications, because their data accesses exhibit poor temporal and spatial locality. Moreover, the nature of memories themselves has changed. Achieving greater bandwidth requires exploiting the characteristics of memory components “on the other side of the cache”: they should not be treated as uniform access-time RAM. This paper describes the use of hardware-assisted access ordering, a technique that combines compile-time detection of memory access patterns with a memory subsystem that decouples the order of requests generated by the processor from the order in which they are issued to the memory system. This decoupling permits the requests to be issued in an order that optimizes use of the memory system. Our simulations show significant speedup on important scientific kernels.
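To make the idea of access ordering concrete, the sketch below models a single-bank page-mode DRAM in which an access to the currently open page is cheap and any other access pays a page-miss penalty, so memory is deliberately not treated as uniform access-time RAM. It compares the processor's natural interleaved order for a dot product (one element of x, then one of y, per iteration) with an access-ordered schedule that drains a run of consecutive elements from each stream before switching. Everything here is an illustrative assumption rather than the paper's design or timing: the hypothetical `Stream` descriptor (base, stride, length) stands in for a compile-time detected access pattern, and the page size (256 words), hit cost (1 cycle), miss cost (5 cycles) and run length (32) are made-up values.

```python
from dataclasses import dataclass

# Hypothetical page-mode DRAM parameters (illustrative only, not from the paper).
PAGE_WORDS = 256      # words per DRAM page (row)
PAGE_HIT_COST = 1     # cycles for an access to the currently open page
PAGE_MISS_COST = 5    # cycles to close the old page, open a new one, and access it


@dataclass(frozen=True)
class Stream:
    """Hypothetical descriptor for a vector access pattern detected at compile time."""
    base: int
    stride: int
    length: int

    def addresses(self) -> list[int]:
        return [self.base + i * self.stride for i in range(self.length)]


def natural_order(streams: list[Stream]) -> list[int]:
    """Order generated by the processor: one element of each stream per loop
    iteration (assumes equal-length streams; zip stops at the shortest)."""
    per_stream = [s.addresses() for s in streams]
    return [addr for group in zip(*per_stream) for addr in group]


def reordered(streams: list[Stream], run: int) -> list[int]:
    """Access-ordered schedule: issue up to `run` consecutive elements of one
    stream before moving to the next stream (round-robin over the streams)."""
    per_stream = [s.addresses() for s in streams]
    longest = max(len(a) for a in per_stream)
    order: list[int] = []
    for start in range(0, longest, run):
        for addrs in per_stream:
            order.extend(addrs[start:start + run])
    return order


def cost(order: list[int]) -> int:
    """Total cycles on a single bank that keeps the most recently used page open."""
    open_page = None
    total = 0
    for addr in order:
        page = addr // PAGE_WORDS
        total += PAGE_HIT_COST if page == open_page else PAGE_MISS_COST
        open_page = page
    return total


if __name__ == "__main__":
    # Dot product x . y: two unit-stride read streams placed in distant pages.
    x = Stream(base=0, stride=1, length=1024)
    y = Stream(base=1 << 20, stride=1, length=1024)
    print(cost(natural_order([x, y])))   # 10240: every request is a page miss
    print(cost(reordered([x, y], 32)))   # 2304: one miss at the start of each run
```

Under these assumptions every request in the natural order lands on a different page from its predecessor, whereas the reordered schedule misses only at the start of each run of 32. Both schedules issue exactly the same set of requests; only their order differs. Real gains depend on actual DRAM timing, bank organization, and buffer depth, all of which this toy model ignores.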




Editor information

Jürg Gutknecht


Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McKee, S.A., Moyer, S.A., Wulf, W.A., Hitchcock, C. (1994). Increasing memory bandwidth for vector computations. In: Gutknecht, J. (eds) Programming Languages and System Architectures. Lecture Notes in Computer Science, vol 782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57840-4_26


  • DOI: https://doi.org/10.1007/3-540-57840-4_26


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57840-6

  • Online ISBN: 978-3-540-48356-4

  • eBook Packages: Springer Book Archive
