Lookahead Memory Prefetching for CGRAs Using Partial Loop Unrolling

  • Conference paper
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10824)

Abstract

Coarse-Grained Reconfigurable Arrays (CGRAs) have become an established approach to providing high computational performance in various environments. Several researchers have found that the achievable performance depends heavily on the interface between memory and the CGRA. In this contribution, we show that a smart prefetching mechanism can increase the performance of the CGRA while consuming fewer hardware resources and less energy than state-of-the-art prefetching mechanisms.

Notes

  1. Values are based on our FPGA implementation of the system (work in progress).

  2. Also called synthesis in previous publications.

  3. Simply setting \(f=p=0\) and increasing \(u\) results in worse performance, because high values of \(u\) degrade performance, as shown in [9].

  4. Note that the number of contexts does not directly correlate with the runtime, because some contexts are executed more often, as they are part of inner loops or even different kernels.

References

  1. Archibald, J., Baer, J.L.: Cache coherence protocols: evaluation using a multiprocessor simulation model. ACM Trans. Comput. Syst. 4(4), 273–298 (1986)

  2. Cong, J., Huang, H., Ma, C., Xiao, B., Zhou, P.: A fully pipelined and dynamically composable architecture of CGRA. In: 2014 FCCM, pp. 9–16, May 2014

  3. Dahlgren, F., Stenstrom, P.: Evaluation of hardware-based stride and sequential prefetching in shared-memory multiprocessors. TPDS 7(4), 385–398 (1996)

  4. Fuchs, A., Mannor, S., Weiser, U., Etsion, Y.: Loop-aware memory prefetching using code block working sets. In: 2014 MICRO, pp. 533–544, December 2014

  5. Gatzka, S., Hochberger, C.: The AMIDAR class of reconfigurable processors. J. Supercomput. 32(2), 163–181 (2005)

  6. Gatzka, S., Hochberger, C.: Hardware based online profiling in AMIDAR processors. In: IPDPS, p. 144b (2005)

  7. Hashemi, M., Mutlu, O., Patt, Y.N.: Continuous runahead: transparent hardware acceleration for memory intensive workloads. In: 2016 MICRO, pp. 1–12, October 2016

  8. Ho, C.H., Govindaraju, V., Nowatzki, T., Nagaraju, R., Marzec, Z., Agarwal, P., Frericks, C., Cofell, R., Sankaralingam, K.: Performance evaluation of a DySER FPGA prototype system spanning the compiler, microarchitecture, and hardware implementation. In: 2015 ISPASS, pp. 203–214, March 2015

  9. Jung, L.J., Hochberger, C.: Feasibility of high level compiler optimizations in online synthesis. In: 2015 ReConFig, pp. 1–7, December 2015

  10. Jung, L.J., Hochberger, C.: Optimal processor interface for CGRA-based accelerators implemented on FPGAs. In: 2016 ReConFig, pp. 1–7, November 2016

  11. Lee, H., Nguyen, D., Lee, J.: Optimizing stream program performance on CGRA-based systems. In: Proceedings of the 52nd DAC, DAC 2015, pp. 110:1–110:6. ACM, New York (2015)

  12. Prabhakar, R., Zhang, Y., Koeplinger, D., Feldman, M., Zhao, T., Hadjis, S., Pedram, A., Kozyrakis, C., Olukotun, K.: Plasticine: a reconfigurable architecture for parallel patterns. In: Proceedings of the 44th ISCA, ISCA 2017, pp. 389–402. ACM, New York (2017)

  13. Ruschke, T., Jung, L.J., Wolf, D., Hochberger, C.: Scheduler for inhomogeneous and irregular CGRAs with support for complex control flow. In: 2016 IPDPSW, pp. 198–207, May 2016

  14. Vahid, F., Stitt, G., Lysecky, R.: Warp processing: dynamic translation of binaries to FPGA circuits. Computer 41(7), 40–46 (2008)

  15. Veredas, F.J., Scheppler, M., Moffat, W., Mei, B.: Custom implementation of the coarse-grained reconfigurable ADRES architecture for multimedia purposes. In: FPL 2005, pp. 106–111, August 2005

  16. Yang, C., Liu, L., Yin, S., Wei, S.: Data cache prefetching via context directed pattern matching for coarse-grained reconfigurable arrays. In: 2016 53rd DAC, pp. 1–6, June 2016

Author information

Corresponding author

Correspondence to Lukas Johannes Jung.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Jung, L.J., Hochberger, C. (2018). Lookahead Memory Prefetching for CGRAs Using Partial Loop Unrolling. In: Voros, N., Huebner, M., Keramidas, G., Goehringer, D., Antonopoulos, C., Diniz, P. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2018. Lecture Notes in Computer Science, vol 10824. Springer, Cham. https://doi.org/10.1007/978-3-319-78890-6_8

  • DOI: https://doi.org/10.1007/978-3-319-78890-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78889-0

  • Online ISBN: 978-3-319-78890-6

  • eBook Packages: Computer Science, Computer Science (R0)
