Analytical Bounds for Optimal Tile Size Selection

Shirako, Jun; Sharma, Kamal; Fauzia, Naznin; Pouchet, Louis-Noël; Ramanujam, J.; Sadayappan, P.; Sarkar, Vivek

doi:10.1007/978-3-642-28652-0_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7210)

Included in the following conference series:

International Conference on Compiler Construction

Abstract

In this paper, we introduce a novel approach to guide tile size selection by employing analytical models to limit empirical search within a subspace of the full search space. Two analytical models are used together: 1) an existing conservative model, based on the data footprint of a tile, which ignores intra-tile cache block replacement, and 2) an aggressive new model that assumes optimal cache block replacement within a tile. Experimental results on multiple platforms demonstrate the practical effectiveness of the approach by reducing the search space for the optimal tile size by 1,307× to 11,879× for an Intel Core-2 Quad system; 358× to 1,978× for an Intel Nehalem system; and 45× to 1,142× for an IBM Power7 system. The execution of rectangularly tiled code tuned by a search of the subspace identified by our model achieves speed-ups of up to 1.40× (Intel Core-2 Quad), 1.28× (Nehalem) and 1.19× (Power7) relative to the best possible square tile sizes on these different processor architectures. We also demonstrate the integration of the analytical bounds with existing search optimization algorithms. Our approach not only reduces the total search time from Nelder-Mead Simplex and Parallel Rank Ordering methods by factors of up to 4.95× and 4.33×, respectively, but also finds better tile sizes that yield higher performance in tuned tiled code.
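
As a rough illustration of the conservative, footprint-based model mentioned in the abstract, the following Python sketch counts the distinct cache lines touched by one tile of a matrix-multiply kernel and keeps only the tile sizes whose footprint fits in cache. The kernel, the function names, the row-major line-aligned layout, and the cache parameters are assumptions made for this example and are not taken from the paper. The aggressive model is not sketched because the abstract does not define it.

```python
from itertools import product
from math import ceil


def footprint_lines(ti, tj, tk, line_elems=8):
    """Distinct cache lines touched by one (ti, tj, tk) tile of
    C[i][j] += A[i][k] * B[k][j].

    Assumption (hypothetical, for illustration): arrays are row-major and
    every tile row starts on a cache-line boundary.
    """
    a_lines = ti * ceil(tk / line_elems)  # A tile: ti rows of tk elements
    b_lines = tk * ceil(tj / line_elems)  # B tile: tk rows of tj elements
    c_lines = ti * ceil(tj / line_elems)  # C tile: ti rows of tj elements
    return a_lines + b_lines + c_lines


def fitting_tiles(sizes, cache_lines, line_elems=8):
    """Return the (ti, tj, tk) candidates whose footprint fits in cache.

    This is a conservative filter: it ignores any reuse that cache block
    replacement inside the tile could still allow.
    """
    return [
        tile
        for tile in product(sizes, repeat=3)
        if footprint_lines(*tile, line_elems=line_elems) <= cache_lines
    ]


if __name__ == "__main__":
    candidates = [8, 16, 32, 64]
    kept = fitting_tiles(candidates, cache_lines=512)
    print(f"{len(kept)} of {len(candidates) ** 3} candidate tiles fit")
```

With 8 elements per line and a 512-line cache (for example, 32 KiB of 64-byte lines), a 32×32×32 tile needs 384 lines and is kept, while a 64×64×64 tile needs 1,536 lines and would be pruned before any empirical run.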

Author information

Authors and Affiliations

Rice University, USA
Jun Shirako, Kamal Sharma & Vivek Sarkar
The Ohio State University, USA
Naznin Fauzia, Louis-Noël Pouchet & P. Sadayappan
Louisiana State University, USA
J. Ramanujam

Editor information

Editors and Affiliations

School for Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, Edinburgh, UK
Michael O’Boyle

About this paper

Cite this paper

Shirako, J. et al. (2012). Analytical Bounds for Optimal Tile Size Selection. In: O’Boyle, M. (eds) Compiler Construction. CC 2012. Lecture Notes in Computer Science, vol 7210. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28652-0_6

DOI: https://doi.org/10.1007/978-3-642-28652-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28651-3
Online ISBN: 978-3-642-28652-0
eBook Packages: Computer Science, Computer Science (R0)
