Skip to main content

On the Interaction of Tiling and Automatic Parallelization

  • Conference paper
OpenMP Shared Memory Parallel Programming (IWOMP 2005)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4315))

Included in the following conference series:

Abstract

Iteration space tiling is a well-explored programming and compiler technique to enhance program locality. Its performance benefit appears obvious, as the ratio of processor versus memory speed increases continuously. In an effort to include a tiling pass into an advanced parallelizing compiler, we have found that the interaction of tiling and parallelization raises unexplored issues. Applying existing, sequential tiling techniques, followed by parallelization, leads to performance degradation in many programs. Applying tiling after parallelization without considering parallel execution semantics may lead to incorrect programs. Doing so conservatively, also introduces overhead in some of the measured programs. In this paper, we present an algorithm that applies tiling in concert with parallelization. The algorithm avoids the above negative effects. Our paper also presents the first comprehensive evaluation of tiling techniques on compiler-parallelized programs. Our tiling algorithm improves the SPEC CPU95 floating-point programs by up to 21% over non-tiled versions (4.9% on average) and the SPEC CPU2000 Fortran 77 programs up to 49% (11% on average). Notably, in about half of the benchmarks, tiling does not have a significant effect.

This work was supported in part by the National Science Foundation under Grants 0103582-EIA, and 0429535-CCF.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Ahmed, N., Mateev, N., Pingali, K.: Tiling imperfectly-nested loop nests. In: Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), p. 31 (2000)

    Google Scholar 

  2. Allen, R., Kennedy, K.: Dependence: Theory and Practice. Optimizing compilers for modern architectures, pp. 45–55. Morgan Kaufman Publishers, San Francisco (2002)

    Google Scholar 

  3. Anderson, J.M., Lam, M.S.: Global optimizations for parallelism and locality on scalable parallel machines. In: Proceedings of the conference on Programming language design and implementation, pp. 112–125 (1993)

    Google Scholar 

  4. Banerjee, U., Eigenmann, R., Nicolau, A., Padua, D.A.: Automatic program parallelization. Proceedings of the IEEE 81(2), 211–243 (1993)

    Article  Google Scholar 

  5. Blume, W., Doallo, R., Eigenmann, R., Grout, J., Hoeflinger, J., Lawrence, T., Lee, J., Padua, D., Paek, Y., Pottenger, B., Rauchwerger, L., Tu, P.: Advanced program restructuring for high-performance computers with polaris (1996)

    Google Scholar 

  6. Blume, W., Doallo, R., Eigenmann, R., Grout, J., Hoeflinger, J., Lawrence, T., Lee, J., Padua, D., Paek, Y., Pottenger, B., Rauchwerger, L., Tu, P.: Parallel programming with Polaris. IEEE Computer 29(12), 78–82 (1996)

    Google Scholar 

  7. Chame, J., Moon, S.: A tile selection algorithm for data locality and cache interference. In: Proceedings of the 13th international conference on Supercomputing, pp. 492–499 (1999)

    Google Scholar 

  8. Coleman, S., McKinley, K.S.: Tile size selection using cache organization and data layout. In: Proceedings of the conference on Programming language design and implementation, pp. 279–290 (1995)

    Google Scholar 

  9. Kennedy, K., McKinley, K.S.: Optimizing for parallelism and data locality. In: Proceedings of the 6th international conference on Supercomputing, pp. 323–334 (1992)

    Google Scholar 

  10. Lam, M.D., Rothberg, E.E., Wolf, M.E.: The cache performance and optimizations of blocked algorithms. In: Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, pp. 63–74 (1991)

    Google Scholar 

  11. McKinley, K.S., Carr, S., Tseng, C.-W.: Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems 18(4), 424–453 (1996)

    Article  Google Scholar 

  12. Min, S.J., Kim, S.W., Voss, M., Lee, S.I., Eigenmann, R.: Portable compilers for OpenMP. In: Eigenmann, R., Voss, M.J. (eds.) WOMPAT 2001. LNCS, vol. 2104, pp. 11–19. Springer, Heidelberg (2001)

    Chapter  Google Scholar 

  13. Ramanujam, J., Sadayappan, P.: Tiling multidimensional iteration spaces for multicomputers. Journal of Parallel and Distributed Computing 16(2), 108–230 (1992)

    Article  Google Scholar 

  14. Song, Y., Li, Z.: A compiler framework for tiling imperfectly-nested loops. In: Languages and Compilers for Parallel Computing, pp. 185–200 (1999)

    Google Scholar 

  15. Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. In: Proceedings of the conference on Programming language design and implementation, pp. 30–44 (1991)

    Google Scholar 

  16. Wolf, M.E., Maydan, D.E., Chen, D.-K.: Combining loop transformations considering caches and scheduling. In: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, pp. 274–286 (1996)

    Google Scholar 

  17. Wolfe, M.J.: Optimizing supercompilers for supercomputers. PhD thesis (1982)

    Google Scholar 

  18. Wolfe, M., Banerjee, U.: Data dependence and its application to parallel processing. Int. J. Parallel Program. 16(2), 137–178 (1987)

    Article  MATH  MathSciNet  Google Scholar 

  19. Xue, J.: On tiling as a loop transformation. Parallel Processing Letters 7(4), 409–424 (1997)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Matthias S. Mueller Barbara M. Chapman Bronis R. de Supinski Allen D. Malony Michael Voss

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pan, Z., Armstrong, B., Bae, H., Eigenmann, R. (2008). On the Interaction of Tiling and Automatic Parallelization. In: Mueller, M.S., Chapman, B.M., de Supinski, B.R., Malony, A.D., Voss, M. (eds) OpenMP Shared Memory Parallel Programming. IWOMP 2005. Lecture Notes in Computer Science, vol 4315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68555-5_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-68555-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68554-8

  • Online ISBN: 978-3-540-68555-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics