An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors

Chen, Guilin; Kandemir, Mahmut

doi:10.1007/978-3-540-71528-3_14

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 4050)

Abstract

The tighter integration on chip multiprocessors puts greater pressure on off-chip accesses to the memory system, which makes minimizing the number of off-chip accesses a critical optimization goal. This paper discusses a compiler-based solution to this problem for embedded applications that perform stencil computations. An important characteristic of this solution is that it distinguishes between intra-processor data reuse and inter-processor data reuse. The former captures the data reuse that occurs across loop iterations assigned to the same processor, whereas the latter represents the data reuse that takes place across loop iterations assigned to different processors. The proposed approach optimizes inter-processor reuse by carefully reorganizing the loop iterations of each processor, taking into account how data elements are shared across processors. The goal is to ensure that different processors access shared data within a short period of time, so that the data is still held in on-chip memory when it is reused. This paper also presents an evaluation of the proposed optimization and compares it to an alternative scheme that optimizes data locality for each processor in isolation. The results obtained by applying our implementation to eight loop-intensive benchmark codes from the embedded computing domain show that our approach improves over this alternative scheme by 15.6% on average.
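
The inter-processor reuse idea can be illustrated with a small simulation. The sketch below is not the authors' compiler algorithm; it is a toy model built on several assumptions that do not come from the paper: a 1-D three-point stencil, block partitioning of iterations across processors, lock-step execution, and a single shared on-chip memory modeled as an LRU cache of individual elements. The function names and the "alternate the traversal direction on odd processors" reordering are hypothetical, chosen only so that neighbouring processors reach their shared boundary elements at about the same time.

```python
from collections import OrderedDict


def count_offchip_accesses(schedules, n_elems, capacity):
    """Run per-processor iteration lists in lock-step over one shared LRU
    cache and return the number of misses (modeled off-chip accesses)."""
    cache = OrderedDict()  # element index -> True, ordered oldest to newest
    misses = 0
    for step in zip(*schedules):  # one iteration per processor per time step
        for i in step:
            for elem in (i - 1, i, i + 1):  # three-point stencil footprint
                if not 0 <= elem < n_elems:
                    continue  # skip neighbours outside the array
                if elem in cache:
                    cache.move_to_end(elem)  # hit: refresh LRU position
                else:
                    misses += 1  # miss: fetch from off-chip memory
                    cache[elem] = True
                    if len(cache) > capacity:
                        cache.popitem(last=False)  # evict least recently used
    return misses


def block_schedules(n_elems, n_procs, alternate):
    """Give each processor a contiguous block of iterations. With
    alternate=True (an assumed reordering), odd processors walk their block
    backwards so neighbours meet at the shared boundary."""
    block = n_elems // n_procs
    schedules = []
    for p in range(n_procs):
        iters = list(range(p * block, (p + 1) * block))
        if alternate and p % 2 == 1:
            iters.reverse()
        schedules.append(iters)
    return schedules


# Assumed toy parameters: 256 elements, 4 processors, 16-element shared cache.
N, P, CAPACITY = 256, 4, 16
baseline = count_offchip_accesses(block_schedules(N, P, False), N, CAPACITY)
reordered = count_offchip_accesses(block_schedules(N, P, True), N, CAPACITY)
print("per-processor order:", baseline, "misses")
print("boundary-aligned order:", reordered, "misses")
```

In this model the baseline order fetches each boundary element twice, because one processor touches it at the start of its block and its neighbour only at the end, long after eviction. The alternating order leaves only the compulsory misses, one per element. The absolute gain here is tiny because a 1-D stencil shares only two elements per boundary; the model says nothing about the 15.6% figure reported in the abstract.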

Author information

Authors and Affiliations

The Pennsylvania State University, USA
Guilin Chen & Mahmut Kandemir

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
Per Stenström

About this paper

Cite this paper

Chen, G., Kandemir, M. (2007). An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_14

DOI: https://doi.org/10.1007/978-3-540-71528-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71527-6
Online ISBN: 978-3-540-71528-3
eBook Packages: Computer Science, Computer Science (R0)
