Skip to main content

Finding and Applying Loop Transformations for Generating Optimized FPGA Implementations

  • Conference paper
Book cover Transactions on High-Performance Embedded Architectures and Compilers I

Part of the book series: Lecture Notes in Computer Science ((THIPEAC,volume 4050))

Abstract

When implementing multimedia applications, solutions in dedicated hardware are chosen only when the required performance or energy-efficiency cannot be met with a software solution. The performance of a hardware design critically depends upon having high levels of parallelism and data locality. Often a long sequence of high-level transformations is needed to sufficiently increase the locality and parallelism. The effect of the transformations is known only after translating the high-level code into a specific design at the circuit level. When the constraints are not met, hardware designers need to redo the high-level loop transformations, and repeat all subsequent translation steps, which leads to long design times.

We propose a method to reduce design time through the synergistic combination of techniques (a) to quickly pinpoint the loop transformations that increase locality; (b) to refactor loops in a polyhedral model and check whether a sequence of refactorings is legal; (c) to generate efficient structural VHDL from the optimized refactored algorithm.

The implementation of these techniques in a tool suite results in a far shorter design time of hours instead of days or weeks. A 2D-inverse discrete wavelet transform was taken as a case study. The results outperform those of a commercial C-to-VHDL compiler, and compare favorably with existing published approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. http://www.impulsec.com/

  2. Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques and Tools. Addison-Wesley, Reading (1986)

    MATH  Google Scholar 

  3. Ashar, P., Devadas, S., Newton, A.R.: Sequential logic synthesis. Kluwer Academic Publishers, Dordrecht (1992)

    Book  Google Scholar 

  4. Augé, I., Pétrot, F., Donnet, F., Gomez, P.: Platform-based design from parallel C specifications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 24(12), 1811–1826 (2005)

    Article  Google Scholar 

  5. Bastoul, C.: Code generation in the polyhedral model is easier than you think. In: PACT, pp. 7–16 (2004)

    Google Scholar 

  6. Bastoul, C., Cohen, A., Girbal, S., Sharma, S., Temam, O.: Putting polyhedral loop transformations to work. In: Rauchwerger, L. (ed.) LCPC 2003. LNCS, vol. 2958, pp. 209–225. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  7. Bastoul, C., Feautrier, P.: Improving data locality by chunking. In: Hedin, G. (ed.) CC 2003 and ETAPS 2003. LNCS, vol. 2622, pp. 320–335. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  8. Benini, L., Siegel, P., De Micheli, G.: Saving power by synthesizing gated clocks for sequential circuits. IEEE Design & Test of Computers 11(4), 32–41 (1994)

    Article  Google Scholar 

  9. Berg, E., Hagersten, E.: Fast data-locality profiling of native execution. In: SIGMETRICS, Banff, Alberta, Canada, pp. 169–180 (2005), doi:10.1145/1064212.1064232

    Google Scholar 

  10. Beyls, K., D’Hollander, E.: Discovery of locality-improving refactorings by reuse path analysis. In: Gerndt, M., KranzlmĂ¼ller, D. (eds.) HPCC 2006. LNCS, vol. 4208, pp. 220–229. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  11. Beyls, K., D’Hollander, E.H.: Generating cache hints for improved program efficiency. Journal of Systems Architecture 51(4), 223–250 (2005)

    Article  Google Scholar 

  12. Beyls, K., D’Hollander, E.H.: Intermediately executed code is the key to find refactorings that improve temporal data locality. In: Proceedings of the 3rd conference on Computing Frontiers, pp. 373–382 (2006)

    Google Scholar 

  13. Bohm, W., Hammes, J., Draper, B., Chawathe, M., Ross, C., Rinker, R., Najjar, W.: Mapping a single assignment programming language to reconfigurable systems. Journal of Supercomputing 21(2), 117–130 (2002)

    Article  MATH  Google Scholar 

  14. Bormans, J., Denolf, K., Wuytack, S., Nachtergaele, L., Bolsens, I.: Integrating system-level low power methodologies into a real-life design flow. In: PATMOS, pp. 19–28 (1999)

    Google Scholar 

  15. Cohen, A., Girbal, S., Parello, D., Sigler, M., Temam, O., Vasilache, N.: Facilitating the search for compositions of program transformations. In: ICS, June 2005, pp. 151–160 (2005)

    Google Scholar 

  16. DeHon, A.: The density advantage of configurable computing. IEEE Computer 33(4), 41–49 (2000)

    Article  Google Scholar 

  17. Devos, H., Eeckhaut, H., Schrauwen, B., Christiaens, M., Stroobandt, D.: Ever considered SystemC? In: ProRISC Workshop, pp. 358–363 (2004)

    Google Scholar 

  18. Fursin, G.: Iterative Compilation and Performance Prediction for Numerical Applications. PhD thesis, University of Edinburgh (2004)

    Google Scholar 

  19. Girbal, S.: Optimisation d’applications - Composition de transformations de programme: modèle et outils. PhD thesis, l’Université de Paris XI Orsay (2005)

    Google Scholar 

  20. Guillou, A.C., Quinton, P., Risset, T.: Hardware synthesis for systems of recurrence equations with multi-dimensional schedule. International Journal of Embedded Systems (to appear, 2005)

    Google Scholar 

  21. Hannig, F., Dutta, H., Teich, J.: Mapping a class of dependence algorithms to coarse-grained reconfigurable arrays: Architectural parameters and methodology. International Journal of Embedded Systems 2(1/2), 114–127 (2006)

    Article  Google Scholar 

  22. Huang, C.-T., Tseng, P.-C., Chen, L.-G.: Analysis and VLSI architecture for 1-D and 2-D discrete wavelet transform. IEEE Transactions on Signal Processing 53(4), 1575–1586 (2005)

    Article  MathSciNet  Google Scholar 

  23. Knijnenburg, P.M.W., Kisuki, T., O’Boyle, M.F.P.: Iterative compilation. In: Deprettere, F., Teich, J., Vassiliadis, S. (eds.) Embedded Processor Design Challenges. LNCS, vol. 2268, pp. 171–187. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  24. Mallat, S.G.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7), 674–693 (1989)

    Article  MATH  Google Scholar 

  25. Martonosi, M., Gupta, A., Anderson, T.E.: Tuning memory performance of sequential and parallel programs. IEEE Computer (April 1995)

    Google Scholar 

  26. McKinley, K.S., Carr, S., Tseng, C.-W.: Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems 18(4), 424–453 (1996), http://www.acm.org/pubs/toc/Abstracts/toplas/233564.html

    Article  Google Scholar 

  27. Rau, B.R., Schlansker, M.S.: Embedded computer architecture and automation. IEEE Computer 34(4), 75–83 (2001)

    Article  Google Scholar 

  28. Turjan, A., Kienhuis, B., Deprettere, E.: Translating affine nested-loop programs to process networks. In: CASES (2004), http://www.liacs.nl/~aturjan/publications.html

  29. Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. In: PLDI. SIGPLAN Notices, vol. 26, pp. 30–44 (1991), http://suif.stanford.edu/papers/wolf91a.ps

  30. Wolfe, M.: Experiences with data dependence abstractions. In: ICS, Cologne, West Germany, pp. 321–329 (1991), doi:10.1145/109025.109104

    Google Scholar 

  31. Yu, Y., D’Hollander, E.H.: Loop parallelization using the 3D iteration space visualizer. Journal of Visual Languages and Computing 12, 163–181 (2001)

    Article  Google Scholar 

  32. Zervas, N.D., Anagnostopoulos, G.P., Spiliotopoulos, V., Andreopoulos, Y., Goutis, C.E.: Evaluation of design alternatives for the 2D-discrete wavelet transform. IEEE Transactions on Circuits and Systems for Video Technology 11(12), 1246–1262 (2001)

    Article  Google Scholar 

  33. Zissulescu, C., Kienhuis, B., Deprettere, E.: Expression synthesis in process networks generated by LAURA. In: ASAP, July 2005, pp. 15–21 (2005)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Devos, H., Beyls, K., Christiaens, M., Van Campenhout, J., D’Hollander, E.H., Stroobandt, D. (2007). Finding and Applying Loop Transformations for Generating Optimized FPGA Implementations. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-71528-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71527-6

  • Online ISBN: 978-3-540-71528-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics