Loop Tiling for Reconfigurable Accelerators

  • Steven Derrien
  • Sanjay Rajopadhye
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2147)


In this paper, we focus on system-level optimizations for the automatic parallelization of nested loops on Reconfigurable Accelerators (RAs). Because off-chip bandwidth plays a major role in the overall performance of such implementations, we propose partitioning techniques based on loop tiling that take advantage of the hierarchically structured memory systems of RAs.
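
As a minimal sketch of the general idea of loop tiling (not the authors' partitioning method), the Python example below rewrites a matrix-multiplication loop nest so that it works on small blocks at a time. The function names, the `tile` parameter, and the choice of matrix multiplication as the loop nest are assumptions made here for illustration only.

```python
def matmul_naive(a, b):
    """Reference triple loop: C = A * B, with no tiling."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation, with each loop split into a tile loop and an intra-tile loop.

    The three outer loops step from tile to tile; the three inner loops
    visit the points inside one tile. `tile` is a hypothetical parameter
    standing in for the block size that would fit in a fast local memory.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):          # tile origin along i
        for jj in range(0, p, tile):      # tile origin along j
            for kk in range(0, m, tile):  # tile origin along k
                # min() clips partial tiles at the matrix borders
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        for k in range(kk, min(kk + tile, m)):
                            c[i][j] += a[i][k] * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    print(matmul_tiled(a, b, tile=2) == matmul_naive(a, b))
```

Both functions compute the same result; only the iteration order differs. When the tile size is chosen so that the blocks touched by the inner loops fit in local memory, each block can be reused several times before new data has to be fetched, which is the usual reason tiling reduces traffic to slower memory.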


Loop Nest · Systolic Array · Memory Hierarchy · Processor Array · Area Overhead
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Steven Derrien (1)
  • Sanjay Rajopadhye (1)
  1. IRISA, Rennes Cedex, France
