Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture

Gebis, Joseph; Oliker, Leonid; Shalf, John; Williams, Samuel; Yelick, Katherine

doi:10.1007/978-3-642-00454-4_16

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5455)

Abstract

The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software-controlled scratchpad memories, such as the Cell local store, attempt to ameliorate this discrepancy by enabling precise control over memory movement; however, scratchpad technology confronts the programmer and compiler with an unfamiliar and difficult programming model. In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective and practical approach to latency hiding. ViVA requires minimal changes to the core design and could thus be easily integrated with conventional processor cores. To validate our approach, we implemented ViVA on the Mambo cycle-accurate full system simulator, which was carefully calibrated to match the performance of our underlying PowerPC Apple G5 architecture. Results show that ViVA is able to deliver significant performance benefits over scalar techniques for a variety of memory access patterns as well as two important memory-bound compact kernels, corner turn and sparse matrix-vector multiplication, achieving a 2x–13x improvement compared to the scalar version. Overall, our preliminary ViVA exploration points to a promising approach for improving application performance on leading microprocessors with minimal design and complexity costs, in a power-efficient manner.
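As a rough illustration of the idea in the abstract, the sketch below contrasts a plain scalar sparse matrix-vector multiply with a staged version that first gathers operands in bulk into a small software-managed buffer and then computes from that buffer. This is only one reading of the approach, written in ordinary Python: it models the access pattern, not the hardware or its timing. The CSR layout, the function names, and the `BUFFER_LEN` value are assumptions made for the example and do not come from the paper.

```python
# Hypothetical illustration: plain Python standing in for hardware behavior.
# Nothing here is ViVA's actual instruction set or buffer size.

BUFFER_LEN = 4  # assumed buffer capacity in elements (not a figure from the paper)


def spmv_scalar(row_ptr, col_idx, vals, x):
    """Baseline CSR y = A*x: each x[col] is fetched by its own indexed load."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for j in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[j] * x[col_idx[j]]
        y.append(acc)
    return y


def spmv_staged(row_ptr, col_idx, vals, x, buffer_len=BUFFER_LEN):
    """Same result, but operands are gathered in chunks into a buffer first."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        start, end = row_ptr[r], row_ptr[r + 1]
        for base in range(start, end, buffer_len):
            stop = min(base + buffer_len, end)
            # Step 1: one bulk indexed transfer (a gather) fills the buffer.
            buf = [x[col_idx[j]] for j in range(base, stop)]
            # Step 2: arithmetic reads operands from the buffer only.
            for k, j in enumerate(range(base, stop)):
                acc += vals[j] * buf[k]
        y.append(acc)
    return y
```

Both functions accumulate in the same order, so they return identical values. The point of the staged form is that the irregular accesses are grouped into a few bulk transfers, which is where a vector-style memory operation could overlap many outstanding requests.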

Author information

Authors and Affiliations

Lawrence Berkeley National Laboratory, CRD/NERSC, Berkeley, CA 94720, USA
Joseph Gebis, Leonid Oliker, John Shalf, Samuel Williams & Katherine Yelick
CS Division, University of California at Berkeley, Berkeley, CA 94720, USA
Joseph Gebis, Leonid Oliker, Samuel Williams & Katherine Yelick

Editor information

Editors and Affiliations

Institut für Datentechnik und Kommunikationsnetze, Hans-Sommer-Str. 66, 38106, Braunschweig, Germany
Mladen Berekovic
Leibniz University, Appelstr. 4, 30167, Hannover, Germany
Christian Müller-Schloer
Technical University of Dresden, Nöthnitzer Str. 46, 01187, Dresden, Germany
Christian Hochberger
Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Stephan Wong

About this paper

Cite this paper

Gebis, J., Oliker, L., Shalf, J., Williams, S., Yelick, K. (2009). Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture. In: Berekovic, M., Müller-Schloer, C., Hochberger, C., Wong, S. (eds) Architecture of Computing Systems – ARCS 2009. ARCS 2009. Lecture Notes in Computer Science, vol 5455. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00454-4_16

DOI: https://doi.org/10.1007/978-3-642-00454-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00453-7
Online ISBN: 978-3-642-00454-4
eBook Packages: Computer Science, Computer Science (R0)
