Finding and Applying Loop Transformations for Generating Optimized FPGA Implementations

Devos, Harald; Beyls, Kristof; Christiaens, Mark; Van Campenhout, Jan; D’Hollander, Erik H.; Stroobandt, Dirk

doi:10.1007/978-3-540-71528-3_11

Harald Devos¹⁷,
Kristof Beyls¹⁷,
Mark Christiaens¹⁷,
Jan Van Campenhout¹⁷,
Erik H. D’Hollander¹⁷ &
…
Dirk Stroobandt¹⁷

Part of the book series: Lecture Notes in Computer Science ((THIPEAC,volume 4050))

628 Accesses
10 Citations

Abstract

When implementing multimedia applications, solutions in dedicated hardware are chosen only when the required performance or energy-efficiency cannot be met with a software solution. The performance of a hardware design critically depends upon having high levels of parallelism and data locality. Often a long sequence of high-level transformations is needed to sufficiently increase the locality and parallelism. The effect of the transformations is known only after translating the high-level code into a specific design at the circuit level. When the constraints are not met, hardware designers need to redo the high-level loop transformations, and repeat all subsequent translation steps, which leads to long design times.

We propose a method to reduce design time through the synergistic combination of techniques (a) to quickly pinpoint the loop transformations that increase locality; (b) to refactor loops in a polyhedral model and check whether a sequence of refactorings is legal; (c) to generate efficient structural VHDL from the optimized refactored algorithm.

The implementation of these techniques in a tool suite results in a far shorter design time of hours instead of days or weeks. A 2D-inverse discrete wavelet transform was taken as a case study. The results outperform those of a commercial C-to-VHDL compiler, and compare favorably with existing published approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

http://www.impulsec.com/
Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques and Tools. Addison-Wesley, Reading (1986)
MATH Google Scholar
Ashar, P., Devadas, S., Newton, A.R.: Sequential logic synthesis. Kluwer Academic Publishers, Dordrecht (1992)
Book Google Scholar
Augé, I., Pétrot, F., Donnet, F., Gomez, P.: Platform-based design from parallel C specifications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 24(12), 1811–1826 (2005)
Article Google Scholar
Bastoul, C.: Code generation in the polyhedral model is easier than you think. In: PACT, pp. 7–16 (2004)
Google Scholar
Bastoul, C., Cohen, A., Girbal, S., Sharma, S., Temam, O.: Putting polyhedral loop transformations to work. In: Rauchwerger, L. (ed.) LCPC 2003. LNCS, vol. 2958, pp. 209–225. Springer, Heidelberg (2004)
Chapter Google Scholar
Bastoul, C., Feautrier, P.: Improving data locality by chunking. In: Hedin, G. (ed.) CC 2003 and ETAPS 2003. LNCS, vol. 2622, pp. 320–335. Springer, Heidelberg (2003)
Chapter Google Scholar
Benini, L., Siegel, P., De Micheli, G.: Saving power by synthesizing gated clocks for sequential circuits. IEEE Design & Test of Computers 11(4), 32–41 (1994)
Article Google Scholar
Berg, E., Hagersten, E.: Fast data-locality profiling of native execution. In: SIGMETRICS, Banff, Alberta, Canada, pp. 169–180 (2005), doi:10.1145/1064212.1064232
Google Scholar
Beyls, K., D’Hollander, E.: Discovery of locality-improving refactorings by reuse path analysis. In: Gerndt, M., Kranzlmüller, D. (eds.) HPCC 2006. LNCS, vol. 4208, pp. 220–229. Springer, Heidelberg (2006)
Chapter Google Scholar
Beyls, K., D’Hollander, E.H.: Generating cache hints for improved program efficiency. Journal of Systems Architecture 51(4), 223–250 (2005)
Article Google Scholar
Beyls, K., D’Hollander, E.H.: Intermediately executed code is the key to find refactorings that improve temporal data locality. In: Proceedings of the 3rd conference on Computing Frontiers, pp. 373–382 (2006)
Google Scholar
Bohm, W., Hammes, J., Draper, B., Chawathe, M., Ross, C., Rinker, R., Najjar, W.: Mapping a single assignment programming language to reconfigurable systems. Journal of Supercomputing 21(2), 117–130 (2002)
Article MATH Google Scholar
Bormans, J., Denolf, K., Wuytack, S., Nachtergaele, L., Bolsens, I.: Integrating system-level low power methodologies into a real-life design flow. In: PATMOS, pp. 19–28 (1999)
Google Scholar
Cohen, A., Girbal, S., Parello, D., Sigler, M., Temam, O., Vasilache, N.: Facilitating the search for compositions of program transformations. In: ICS, June 2005, pp. 151–160 (2005)
Google Scholar
DeHon, A.: The density advantage of configurable computing. IEEE Computer 33(4), 41–49 (2000)
Article Google Scholar
Devos, H., Eeckhaut, H., Schrauwen, B., Christiaens, M., Stroobandt, D.: Ever considered SystemC? In: ProRISC Workshop, pp. 358–363 (2004)
Google Scholar
Fursin, G.: Iterative Compilation and Performance Prediction for Numerical Applications. PhD thesis, University of Edinburgh (2004)
Google Scholar
Girbal, S.: Optimisation d’applications - Composition de transformations de programme: modèle et outils. PhD thesis, l’Université de Paris XI Orsay (2005)
Google Scholar
Guillou, A.C., Quinton, P., Risset, T.: Hardware synthesis for systems of recurrence equations with multi-dimensional schedule. International Journal of Embedded Systems (to appear, 2005)
Google Scholar
Hannig, F., Dutta, H., Teich, J.: Mapping a class of dependence algorithms to coarse-grained reconfigurable arrays: Architectural parameters and methodology. International Journal of Embedded Systems 2(1/2), 114–127 (2006)
Article Google Scholar
Huang, C.-T., Tseng, P.-C., Chen, L.-G.: Analysis and VLSI architecture for 1-D and 2-D discrete wavelet transform. IEEE Transactions on Signal Processing 53(4), 1575–1586 (2005)
Article MathSciNet Google Scholar
Knijnenburg, P.M.W., Kisuki, T., O’Boyle, M.F.P.: Iterative compilation. In: Deprettere, F., Teich, J., Vassiliadis, S. (eds.) Embedded Processor Design Challenges. LNCS, vol. 2268, pp. 171–187. Springer, Heidelberg (2002)
Chapter Google Scholar
Mallat, S.G.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7), 674–693 (1989)
Article MATH Google Scholar
Martonosi, M., Gupta, A., Anderson, T.E.: Tuning memory performance of sequential and parallel programs. IEEE Computer (April 1995)
Google Scholar
McKinley, K.S., Carr, S., Tseng, C.-W.: Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems 18(4), 424–453 (1996), http://www.acm.org/pubs/toc/Abstracts/toplas/233564.html
Article Google Scholar
Rau, B.R., Schlansker, M.S.: Embedded computer architecture and automation. IEEE Computer 34(4), 75–83 (2001)
Article Google Scholar
Turjan, A., Kienhuis, B., Deprettere, E.: Translating affine nested-loop programs to process networks. In: CASES (2004), http://www.liacs.nl/~aturjan/publications.html
Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. In: PLDI. SIGPLAN Notices, vol. 26, pp. 30–44 (1991), http://suif.stanford.edu/papers/wolf91a.ps
Wolfe, M.: Experiences with data dependence abstractions. In: ICS, Cologne, West Germany, pp. 321–329 (1991), doi:10.1145/109025.109104
Google Scholar
Yu, Y., D’Hollander, E.H.: Loop parallelization using the 3D iteration space visualizer. Journal of Visual Languages and Computing 12, 163–181 (2001)
Article Google Scholar
Zervas, N.D., Anagnostopoulos, G.P., Spiliotopoulos, V., Andreopoulos, Y., Goutis, C.E.: Evaluation of design alternatives for the 2D-discrete wavelet transform. IEEE Transactions on Circuits and Systems for Video Technology 11(12), 1246–1262 (2001)
Article Google Scholar
Zissulescu, C., Kienhuis, B., Deprettere, E.: Expression synthesis in process networks generated by LAURA. In: ASAP, July 2005, pp. 15–21 (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

Parallel Information Systems, ELIS-Dept., Faculty of Engineering, Ghent University, Belgium
Harald Devos, Kristof Beyls, Mark Christiaens, Jan Van Campenhout, Erik H. D’Hollander & Dirk Stroobandt

Authors

Harald Devos
View author publications
You can also search for this author in PubMed Google Scholar
Kristof Beyls
View author publications
You can also search for this author in PubMed Google Scholar
Mark Christiaens
View author publications
You can also search for this author in PubMed Google Scholar
Jan Van Campenhout
View author publications
You can also search for this author in PubMed Google Scholar
Erik H. D’Hollander
View author publications
You can also search for this author in PubMed Google Scholar
Dirk Stroobandt
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
Per Stenström

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Devos, H., Beyls, K., Christiaens, M., Van Campenhout, J., D’Hollander, E.H., Stroobandt, D. (2007). Finding and Applying Loop Transformations for Generating Optimized FPGA Implementations. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_11

Download citation

DOI: https://doi.org/10.1007/978-3-540-71528-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71527-6
Online ISBN: 978-3-540-71528-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics