An iteration partition approach for cache or local memory thrashing on parallel processing

Fang, J.; Lu, M.

doi:10.1007/BFb0038673

J. Fang¹ &
M. Lu²

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 589))

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Parallel processing systems with caches or local memories in their memory hierarchies have become very common. These systems have a large cache or local memory in each processor and usually employ a copy-back protocol for cache coherence. In such systems, a problem called “cache or local memory thrashing” may arise during the execution of parallel programs, when data moves unnecessarily back and forth between the caches or local memories of different processors. The parallel compiler techniques needed to solve this problem are not yet fully developed.

In this paper we present an approach that eliminates unnecessary data movement between caches or local memories for nested parallel loops. The approach is based on the relations between array element accesses and the indexes of the enclosing loops in the nest. These relations can be used to assign each processor the iterations of a parallel loop that reference data already held in its cache or local memory. We present an algorithm that calculates the correct iteration of the parallel loop for a processor from the loop indexes of the iterations that processor executed previously. The algorithm applies even when the loop contains more than one subscript expression for the same array variable.
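The idea in the paragraph above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes a serial outer loop over `i`, a parallel inner loop over `j`, and a single subscript expression `A[i + j]`. The function names `count_transfers`, `naive`, and `partitioned` are hypothetical, as is the simplified cost model in which a "transfer" is counted whenever an element is touched by a processor other than the one that touched it last.

```python
def count_transfers(n_outer, n_inner, n_procs, assign):
    """Count accesses to an element last touched by a different processor."""
    owner = {}      # element index -> processor that accessed it last
    transfers = 0
    for i in range(n_outer):          # serial outer loop
        for j in range(n_inner):      # parallel inner loop (simulated in order)
            elem = i + j              # assumed subscript expression A[i + j]
            proc = assign(i, j, n_procs)
            if elem in owner and owner[elem] != proc:
                transfers += 1        # element must move between processors
            owner[elem] = proc
    return transfers


def naive(i, j, n_procs):
    # Processor is chosen from the parallel loop index only.
    return j % n_procs


def partitioned(i, j, n_procs):
    # Processor is chosen from the element accessed, so the processor that
    # ran iteration j at step i runs iteration j - 1 at step i + 1.
    return (i + j) % n_procs


naive_moves = count_transfers(8, 8, 4, naive)
partitioned_moves = count_transfers(8, 8, 4, partitioned)
print(naive_moves, partitioned_moves)
```

With these parameters the naive assignment incurs 49 transfers in this model, while the element-aware assignment incurs none, because each element stays with one processor across outer iterations. Real programs with several subscript expressions for the same array need the more general calculation that the paper describes.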

This method benefits parallel code with nested loop constructs across a wide range of applications in which array elements are referenced repeatedly in parallel loops. The experimental results show that the technique is extremely effective, capable of achieving double speedups on application programs such as the Linpack benchmarks.

Supported by the National Science Foundation under grant no. MIP 8809328.

Author information

Authors and Affiliations

Hewlett-Packard Laboratories, USA
J. Fang
Texas A&M University, USA
M. Lu

Editor information

Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

About this paper

Cite this paper

Fang, J., Lu, M. (1992). An iteration partition approach for cache or local memory thrashing on parallel processing. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1991. Lecture Notes in Computer Science, vol 589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0038673

DOI: https://doi.org/10.1007/BFb0038673
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55422-6
Online ISBN: 978-3-540-47063-2
eBook Packages: Springer Book Archive
