
An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors

  • Conference paper
Transactions on High-Performance Embedded Architectures and Compilers I

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 4050)

Abstract

The tighter integration of chip multiprocessors places greater pressure on off-chip accesses to the memory system, which makes minimizing the number of off-chip accesses a critical optimization goal. This paper discusses a compiler-based solution to this problem for embedded applications that perform stencil computations. An important characteristic of this solution is that it distinguishes between intra-processor and inter-processor data reuse. The former captures the reuse that occurs across loop iterations assigned to the same processor, whereas the latter captures the reuse that takes place across loop iterations assigned to different processors. The proposed approach optimizes inter-processor reuse by carefully reorganizing the loop iterations of each processor, taking into account how data elements are shared across processors. The goal is to ensure that different processors access shared data within a short period of time, so that the data are still held in on-chip memory when the reuse occurs. This paper also evaluates the proposed optimization and compares it against an alternative scheme that optimizes data locality for each processor in isolation. Applied to eight loop-intensive benchmark codes from the embedded computing domain, our implementation improves on that alternative scheme by 15.6% on average.
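To make the intra-/inter-processor distinction concrete, the Python sketch below models a toy case. Everything in it is an assumption for illustration, not the paper's algorithm: a one-dimensional three-point stencil in which iteration i reads A[i-1], A[i] and A[i+1]; iterations block-partitioned across processors; and all processors executing one iteration per time step in lock-step. Elements read by a single processor carry only intra-processor reuse, while elements read by two neighbouring processors carry inter-processor reuse. The hypothetical `alternating` ordering reverses every odd-numbered processor's block so that neighbours reach their shared boundary at about the same time step, shrinking the window during which a shared element must stay on chip.

```python
from collections import defaultdict


def reads(i):
    """Hypothetical 3-point stencil: iteration i reads A[i-1], A[i], A[i+1]."""
    return (i - 1, i, i + 1)


def partition(n, procs):
    """Block-partition iterations 1..n-2 into contiguous chunks, one per processor."""
    iters = list(range(1, n - 1))
    size = -(-len(iters) // procs)  # ceiling division
    return [iters[k * size:(k + 1) * size] for k in range(procs)]


def shared_elements(blocks):
    """Elements read by more than one processor, i.e. carriers of inter-processor reuse."""
    readers = defaultdict(set)
    for p, block in enumerate(blocks):
        for i in block:
            for e in reads(i):
                readers[e].add(p)
    return {e for e, ps in readers.items() if len(ps) > 1}


def worst_shared_window(schedules):
    """Largest span (in time steps) between the first and last access to any
    shared element, assuming every processor runs one iteration per step."""
    times = defaultdict(list)
    owners = defaultdict(set)
    for p, order in enumerate(schedules):
        for t, i in enumerate(order):
            for e in reads(i):
                times[e].append(t)
                owners[e].add(p)
    spans = [max(ts) - min(ts) for e, ts in times.items() if len(owners[e]) > 1]
    return max(spans, default=0)


def forward(blocks):
    """Every processor walks its block in ascending order (per-processor locality)."""
    return [list(b) for b in blocks]


def alternating(blocks):
    """Hypothetical reordering: odd-numbered processors walk their block in
    reverse, so neighbours touch their shared boundary at nearly the same step."""
    return [list(b) if p % 2 == 0 else list(reversed(b)) for p, b in enumerate(blocks)]


blocks = partition(18, 4)
print(sorted(shared_elements(blocks)))           # [4, 5, 8, 9, 12, 13]
print(worst_shared_window(forward(blocks)))      # 3
print(worst_shared_window(alternating(blocks)))  # 1
```

With 18 elements and 4 processors, the shared elements are the pairs at each block boundary. Ascending order keeps each shared element live for 3 time steps, whereas the alternating order cuts this to 1. Reversal works this neatly only for a 1-D block partition; it illustrates the goal stated above rather than the compiler transformation evaluated in the paper.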




Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, G., Kandemir, M. (2007). An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_14


  • DOI: https://doi.org/10.1007/978-3-540-71528-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71527-6

  • Online ISBN: 978-3-540-71528-3

  • eBook Packages: Computer Science, Computer Science (R0)
