An Efficient Semi-Hierarchical Array Layout

  • N. P. Drakenberg
  • F. Lundevall
  • B. Lisper
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 613)


For high-level programming languages, linear array layouts have de facto been the sole form of mapping array elements to memory to see widespread use. The increasingly deep and complex memory hierarchies present in current computer systems expose several deficiencies of linear array layouts. One such deficiency is that linear array layouts strongly favor locality in one index dimension of multidimensional arrays. Secondly, the exact mapping of array elements to cache locations depends on the array’s size, which effectively renders linear array layouts non-analyzable with respect to cache behavior. We present and evaluate an alternative, semi-hierarchical, array layout which differs from linear array layouts by being neutral with respect to locality in different index dimensions and by enabling accurate and precise analysis of cache behaviors at compile-time.
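The contrast drawn above can be illustrated with Morton (Z-order) indexing, the classic hierarchical layout that schemes of this kind build on. This is a sketch for intuition, not the authors' exact semi-hierarchical layout: a row-major offset depends on the array's column count (so cache mapping shifts with array size), while a Morton offset bit-interleaves the two indices and treats both dimensions symmetrically, independent of the array's extents.

```python
def row_major_index(i, j, ncols):
    # Linear (row-major) layout: the offset depends on ncols, so the
    # cache set an element maps to changes with the array's size.
    return i * ncols + j

def morton_index(i, j, bits=16):
    # Hierarchical (Morton / Z-order) layout: interleave the bits of
    # i and j.  Elements close in EITHER index dimension land close
    # in memory, and the mapping is independent of the array extents.
    z = 0
    for b in range(bits):
        z |= ((i >> b) & 1) << (2 * b + 1)  # bits of i go to odd positions
        z |= ((j >> b) & 1) << (2 * b)      # bits of j go to even positions
    return z
```

For example, the 2x2 block with corners (0,0) and (1,1) occupies Morton offsets 0 through 3, one contiguous run, whereas in row-major order the same block straddles two rows whose distance grows with the array width.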







Copyright information

© Springer Science+Business Media New York 2001

Authors and Affiliations

  • N. P. Drakenberg (1)
  • F. Lundevall (1)
  • B. Lisper (2)
  1. Department of Teleinformatics, Royal Institute of Technology, Kista, Sweden
  2. Department of Computer Engineering, Mälardalen University, Västerås, Sweden
