
An iteration partition approach for cache or local memory thrashing on parallel processing

  • VIII. Cache Memory Issues
  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1991)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 589)

Abstract

Parallel processing systems whose memory hierarchies include caches or local memories have become very common. These systems provide each processor with a large cache or local memory and usually employ a copy-back protocol for cache coherence. In such systems, a problem called "cache or local memory thrashing" may arise during the execution of parallel programs, in which data moves back and forth unnecessarily between the caches or local memories of different processors. The compiler techniques needed to solve this problem are not yet fully developed.
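
As a concrete illustration (ours, not taken from the paper), consider two consecutive parallel loops that reference the same array. The C sketch below uses OpenMP-style worksharing purely for exposition; the array names, sizes, and the deliberately mismatched loop schedules are assumptions chosen only to show how an element written by one processor in the first loop can be touched by a different processor in the second loop, forcing a copy-back coherence protocol to move it between caches.

    /* Hypothetical example (ours, not the paper's): two consecutive parallel
     * loops that reference the same array A.  With the deliberately mismatched
     * schedules below, iteration i may run on different processors in the two
     * loops, so A[i] is written into one processor's cache and then read by
     * another, and a copy-back coherence protocol must move it between them. */
    #include <stdio.h>

    #define N 1024

    double A[N], B[N], C[N];

    int main(void) {
        /* First parallel loop: the processor that runs iteration i pulls
         * A[i] and B[i] into its cache and leaves a dirty copy of A[i]. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++)
            A[i] = 2.0 * B[i];

        /* Second parallel loop: with a different schedule, iteration i may
         * land on another processor, which must fetch A[i] from the first
         * processor's cache, i.e. the data "ping-pongs" between caches. */
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < N; i++)
            C[i] = A[i] + B[i];

        printf("C[0] = %f\n", C[0]);
        return 0;
    }

Assigning iteration i of both loops to the same processor, as the iteration partition approach aims to do, keeps A[i] in a single cache.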

In this paper we present an approach that eliminates unnecessary data movement between caches or local memories for nested parallel loops. The approach is based on the relations between array element accesses and the enclosing loop indices in the nested parallel loops. These relations can be used to assign to each processor the iterations of a parallel loop that are appropriate with respect to the data already in its cache or local memory. The paper presents an algorithm that computes the correct iteration of a parallel loop from the loop indices of the previous iterations executed on the same processor, even when the loop contains more than one subscript expression of the same array variable.
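
The following C sketch illustrates the flavor of this mapping under our own assumptions, not the paper's exact algorithm: one-dimensional arrays with affine subscripts of the form coef*i + offset, and hypothetical names such as matching_iteration. Given the iteration a processor executed in the previous parallel loop, it computes the iteration of the next parallel loop that references the same array element, so that the element can stay in that processor's cache or local memory.

    /* Minimal sketch of the iteration-partition idea, under our own
     * assumptions: one-dimensional arrays with affine subscripts of the
     * form coef*i + offset.  The names below are hypothetical, not the
     * paper's. */
    #include <stdio.h>

    struct subscript { int coef; int offset; };   /* subscript s(i) = coef*i + offset */

    /* Return the iteration of the next loop (subscript s2) that touches the
     * same element as iteration prev_iter of the previous loop (subscript s1),
     * or -1 if no integer iteration of the next loop references that element. */
    int matching_iteration(struct subscript s1, struct subscript s2, int prev_iter) {
        int element = s1.coef * prev_iter + s1.offset;  /* element already cached */
        if (s2.coef == 0 || (element - s2.offset) % s2.coef != 0)
            return -1;
        return (element - s2.offset) / s2.coef;
    }

    int main(void) {
        struct subscript loop1 = { 1,  0 };   /* loop 1 references A[i]     */
        struct subscript loop2 = { 1, -4 };   /* loop 2 references A[i - 4] */

        /* A processor that executed iteration 10 of loop 1 holds A[10], so it
         * should be assigned iteration 14 of loop 2, which also reads A[10]. */
        printf("assign iteration %d\n", matching_iteration(loop1, loop2, 10));
        return 0;
    }

With more than one subscript expression of the same array in a loop, the paper's algorithm must reconcile several such relations at once; the sketch shows only the single-subscript case.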

This method benefits parallel code with nested loop constructs in a wide range of applications in which array elements are referenced repeatedly across the parallel loops. The experimental results show that the technique is very effective, achieving roughly twofold speedups on application programs such as the Linpack benchmark.

Supported by the National Science Foundation under grant no. MIP 8809328.



Editor information

Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua


Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fang, J., Lu, M. (1992). An iteration partition approach for cache or local memory thrashing on parallel processing. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1991. Lecture Notes in Computer Science, vol 589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0038673


  • DOI: https://doi.org/10.1007/BFb0038673

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-55422-6

  • Online ISBN: 978-3-540-47063-2

  • eBook Packages: Springer Book Archive
