Abstract
Parallel processing systems with caches or local memories in their memory hierarchies have become very common. These systems have a large cache or local memory in each processor and usually employ a copy-back protocol for cache coherence. In such systems, a problem called “cache or local memory thrashing” may arise during the execution of parallel programs, when data moves unnecessarily back and forth between the caches or local memories of different processors. Parallel-compiler techniques for solving this problem are not yet fully developed.
In this paper we present an approach that eliminates unnecessary data movement between caches or local memories for nested parallel loops. The approach is based on the relations between array element accesses and the indexes of the enclosing loops in the nest. These relations can be used to assign each processor the iterations of the parallel loops whose data already resides in its cache or local memory. We present an algorithm that calculates the correct iteration of the parallel loop from the loop indexes of the iterations previously executed on the processor; the algorithm applies even when the loop contains more than one subscript expression of the same array variable.
This method benefits parallel code with nested loop constructs in a wide range of applications in which array elements are repeatedly referenced in the parallel loops. The experimental results show that the technique is extremely effective: it is capable of doubling the speedup of application programs such as the Linpack benchmarks.
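The idea of assigning iterations according to the data a processor already holds can be illustrated with a small model. The sketch below is not the paper's algorithm; it assumes a hypothetical loop nest with a serial outer index `i`, a parallel inner index `j`, and a single subscript expression `A[i + j]`, and it counts how often an element is touched by a different processor than the one that touched it last. The function names and the transfer-counting model are illustrative assumptions.

```python
def count_transfers(n_outer, n_inner, n_procs, assign):
    """Count accesses where an element was last touched by another processor.

    Each such access stands in for one cache-to-cache (or local-memory) move.
    """
    last_owner = {}  # element index -> processor that accessed it last
    transfers = 0
    for i in range(n_outer):          # serial outer loop
        for j in range(n_inner):      # parallel inner loop
            elem = i + j              # assumed subscript expression: A[i + j]
            proc = assign(i, j, n_procs)
            if elem in last_owner and last_owner[elem] != proc:
                transfers += 1
            last_owner[elem] = proc
    return transfers


def by_iteration(i, j, n_procs):
    """Naive assignment: iteration j always runs on processor j mod P."""
    return j % n_procs


def by_element(i, j, n_procs):
    """Data-aligned assignment: run (i, j) where element i + j already lives."""
    return (i + j) % n_procs


naive = count_transfers(8, 16, 4, by_iteration)
aligned = count_transfers(8, 16, 4, by_element)
print("transfers, assigned by iteration index:", naive)
print("transfers, assigned by element owner:  ", aligned)
```

In this model the data-aligned assignment reports no transfers, because the processor that executed iteration `(i, j)` goes on to execute `(i + 1, j - 1)`, which touches the same element. A real compiler would also have to handle several subscript expressions and load balance, which this sketch ignores.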
Supported by the National Science Foundation under grant no. MIP 8809328.
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fang, J., Lu, M. (1992). An iteration partition approach for cache or local memory thrashing on parallel processing. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1991. Lecture Notes in Computer Science, vol 589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0038673
DOI: https://doi.org/10.1007/BFb0038673
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55422-6
Online ISBN: 978-3-540-47063-2
eBook Packages: Springer Book Archive