Loop Tiling for Reconfigurable Accelerators
In this paper, we focus on system-level optimizations for the automatic parallelization of nested loops on Reconfigurable Accelerators (RAs). Specifically, because off-chip bandwidth plays a major role in the overall performance of such implementations, we propose partitioning techniques based on loop tiling that take advantage of the hierarchically structured memory systems of RAs.
Keywords: Loop Nest; Systolic Array; Memory Hierarchy; Processor Array; Area Overhead
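For readers unfamiliar with loop tiling, the following is a minimal, generic sketch of the transformation applied to a matrix-multiplication loop nest. It is not the partitioning scheme proposed in the paper: the function names, the rectangular tile shape, and the `tile` parameter (standing in here for the capacity of a fast on-chip memory level) are all assumptions made for illustration.

```python
def matmul_naive(a, b, n):
    """Reference triple loop: streams over whole rows and columns."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation, with each loop split into tile-sized blocks.

    The three outer loops walk over blocks; the three inner loops stay
    inside one block, so the working set of the inner loops is bounded
    by roughly 3 * tile * tile elements (hypothetical on-chip budget).
    """
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # min(...) handles partial tiles when tile does not divide n
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

Both functions compute the same result; only the iteration order changes. In a setting with a memory hierarchy, the intent of such a reordering is that the data touched within one block can be kept in the faster memory level and reused before being evicted, which reduces traffic to the slower level.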