Coarse-Grained Reconfigurable Array Architectures

  • Bjorn De Sutter
  • Praveen Raghavan
  • Andy Lambrechts
Chapter

Abstract

Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code.

Keywords

Finite Impulse Response, Register File, Single Instruction Multiple Data, Very Large Scale Integration, Loop Body
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Bjorn De Sutter (1, 2)
  • Praveen Raghavan (3)
  • Andy Lambrechts (3)

  1. Ghent University, Gent, Belgium
  2. Vrije Universiteit Brussel, Brussel, Belgium
  3. IMEC, Heverlee, Belgium