Data I/O Minimization for Loops on Limited Onchip Memory Processors

Wang, Lei; Pande, Santosh

doi:10.1007/3-540-44905-1_34


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1863)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing


Abstract

Due to significant advances in VLSI technology, ‘mega-processors’ made with a large number of transistors have become a reality. These processors typically provide multiple functional units, which allow exploitation of parallelism. To cater to the data demands associated with parallelism, the processors provide a limited amount of on-chip memory. The amount of memory provided is quite limited because of the higher area and power requirements associated with it. Even though limited, such on-chip memory is a very valuable resource in the memory hierarchy. An important use of on-chip memory is to hold the instructions from short loops along with the associated data for very fast computation. Such schemes are very attractive on embedded processors where, due to the presence of dedicated on-chip hardware (such as very fast multipliers-shifters, etc.) and extremely fast accesses to on-chip data, the computation time of such loops is extremely small, meeting almost all real-time demands. The biggest bottleneck to performance in these cases is off-chip accesses, and thus compilers must carefully analyze references to identify good candidates for promotion to on-chip memory. In our earlier work [6], we formulated this problem in terms of 0/1 knapsack and proposed a heuristic solution that gives us good promotion candidates. Our analysis was limited to a single loop nest. When we attempted to extend this framework to multiple loop nests (intra-procedurally), we realized that it is important not only to identify good candidates for promotion but also to restructure loops carefully before performing promotion, since the data I/O of loading and storing values to on-chip memory poses a significant bottleneck.
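To make the 0/1 knapsack view of promotion concrete, the following is a minimal sketch and not the heuristic of [6]: it solves a small instance exactly by dynamic programming. The candidate names, their sizes in words, and the "off-chip accesses saved" figures are hypothetical values chosen only for illustration, as is the function name `select_for_promotion`.

```python
from typing import List, Tuple

# Each candidate is (name, size_in_words, off_chip_accesses_saved).
# All values below are illustrative assumptions, not data from the paper.
Candidate = Tuple[str, int, int]


def select_for_promotion(candidates: List[Candidate], capacity: int) -> Tuple[int, List[str]]:
    """Exact 0/1 knapsack: pick candidates that fit in `capacity` words of
    on-chip memory and maximize the number of off-chip accesses avoided."""
    n = len(candidates)
    # best[i][c] = max accesses saved using the first i candidates within c words
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, saved) in enumerate(candidates, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # candidate i stays off-chip
            if size <= c:                # candidate i is promoted on-chip
                best[i][c] = max(best[i][c], best[i - 1][c - size] + saved)

    # Walk the table backwards to recover which candidates were promoted.
    chosen: List[str] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = candidates[i - 1]
            chosen.append(name)
            c -= size
    chosen.reverse()
    return best[n][capacity], chosen


if __name__ == "__main__":
    refs = [("A", 4, 40), ("B", 3, 50), ("C", 2, 15), ("D", 5, 60)]
    total, promoted = select_for_promotion(refs, capacity=8)
    print(total, promoted)
```

The exact table costs time and space proportional to the number of candidates times the capacity, which is one reason a compiler might prefer a heuristic. Note also that this sketch treats each candidate independently and ignores the cost of loading and storing values to on-chip memory, which is the data I/O issue the abstract identifies for multiple loop nests.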

Supported in part by NSF through grant no. #EIA 9871345


References

[1] G. Gao, R. Olsen, V. Sarkar, and R. Thekkath. Collective loop fusion for array contraction. In Languages and Compilers for Parallel Computing (LCPC), 1992.
[2] K. Kennedy and K. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Languages and Compilers for Parallel Computing (LCPC), 1993.
[3] I. Kodukula, N. Ahmed, and K. Pingali. Data centric multi-level blocking. In ACM Programming Language Design and Implementation (PLDI), pages 346–357, 1997.
[4] N. Mitchell, K. Hogstedt, L. Carter, and J. Ferrante. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 26(6), pages 641–670, 1998.
[5] R. Schreiber and J. Dongarra. Automatic blocking of nested loops. Technical report, RIACS, NASA Ames Research Center, and Oak Ridge National Laboratory, May 1990.
[6] A. Sundaram and S. Pande. An efficient data partitioning method for limited memory embedded systems. In ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems (LCTES) (in conjunction with PLDI ’98), Montreal, Canada, Springer-Verlag, pages 205–218, 1998.
[7] M. Wolfe. Iteration space tiling for memory hierarchies. In Third SIAM Conference on Parallel Processing for Scientific Computing, December 1987.


Author information

Authors and Affiliations

Compiler Research Lab, PO Box 210030, USA
Lei Wang & Santosh Pande
Department of Electrical & Computer Engineering and Computer Science, University of Cincinnati, Cincinnati, OH, 45219, USA
Lei Wang & Santosh Pande

Editor information

Editors and Affiliations

Department of Computer Science, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0114, USA
Larry Carter & Jeanne Ferrante

About this paper

Cite this paper

Wang, L., Pande, S. (2000). Data I/O Minimization for Loops on Limited Onchip Memory Processors. In: Carter, L., Ferrante, J. (eds) Languages and Compilers for Parallel Computing. LCPC 1999. Lecture Notes in Computer Science, vol 1863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44905-1_34

DOI: https://doi.org/10.1007/3-540-44905-1_34
Published: 12 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67858-8
Online ISBN: 978-3-540-44905-8
eBook Packages: Springer Book Archive
