
Domain-Specific Language and Compiler for Stencil Computation on FPGA-Based Systolic Computational-Memory Array

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7199)

Abstract

This paper presents a domain-specific language for stencil computation (DSLSC) and its compiler for our FPGA-based systolic computational-memory array (SCMA). In DSLSC, a stencil computation is programmed by describing its mathematical form rather than by explicitly writing an optimized procedure. The compiler automatically parallelizes stencil computations across the processing elements (PEs) of the SCMA and schedules multiply-and-add operations for the PEs, taking into account the data-reference delay through a local memory or through the communication FIFOs between PEs. For 2D Jacobi computations with 3x3 and 5x5 stencils on arbitrary grid sizes, the compiler achieves high PE utilization of 85.6 % and 92.18 %, respectively, close to the ideal values of 87.5 % and 93.75 %.
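To make the contrast between "mathematical form" and "explicit procedure" concrete, here is a minimal Python sketch. It is not DSLSC syntax and not the authors' compiler. A stencil is written declaratively as a map from grid offsets to coefficients, then lowered into an explicit sequence of multiply-and-add steps applied in a Jacobi sweep. All names (`lower_to_macs`, `jacobi_step`, `split_rows`, `LAPLACE_3X3`) are hypothetical. The row-block partition is a simple stand-in for the compiler's PE parallelization, not its actual scheme. The sketch also ignores the local-memory and FIFO delays that the real scheduler accounts for.

```python
# Hypothetical sketch: not DSLSC syntax and not the paper's compiler.
from typing import Dict, List, Tuple

Offset = Tuple[int, int]

def lower_to_macs(stencil: Dict[Offset, float]) -> List[Tuple[Offset, float]]:
    """Lower the declarative stencil (offset -> coefficient) into an ordered
    list of multiply-and-add (MAC) steps."""
    return sorted(stencil.items())

def jacobi_step(grid: List[List[float]],
                stencil: Dict[Offset, float]) -> List[List[float]]:
    """One Jacobi sweep: each interior point is computed from the OLD grid only.
    Points within the stencil radius of the border are copied unchanged."""
    macs = lower_to_macs(stencil)
    r = max(max(abs(di), abs(dj)) for (di, dj) in stencil)  # stencil radius
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(r, h - r):
        for j in range(r, w - r):
            acc = 0.0
            for (di, dj), c in macs:
                acc += c * grid[i + di][j + dj]  # one multiply-and-add
            new[i][j] = acc
    return new

def split_rows(h: int, num_pes: int) -> List[range]:
    """Hypothetical block partition of h grid rows over num_pes PEs
    (a stand-in for the compiler's real PE mapping)."""
    base, extra = divmod(h, num_pes)
    parts, start = [], 0
    for p in range(num_pes):
        size = base + (1 if p < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts

# Example 3x3-window stencil: 4-neighbour averaging (Laplace-equation Jacobi).
LAPLACE_3X3: Dict[Offset, float] = {
    (-1, 0): 0.25, (1, 0): 0.25, (0, -1): 0.25, (0, 1): 0.25,
}
```

For instance, `jacobi_step(grid, LAPLACE_3X3)` performs four MAC steps per interior point. `split_rows(10, 3)` assigns rows 0-3, 4-6 and 7-9 to three PEs. Separating the stencil description from its lowering means a compiler is free to reorder and distribute the MAC steps without changing what the user wrote.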



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luzhou, W., Sano, K., Yamamoto, S. (2012). Domain-Specific Language and Compiler for Stencil Computation on FPGA-Based Systolic Computational-Memory Array. In: Choy, O.C.S., Cheung, R.C.C., Athanas, P., Sano, K. (eds) Reconfigurable Computing: Architectures, Tools and Applications. ARC 2012. Lecture Notes in Computer Science, vol 7199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28365-9_3


  • DOI: https://doi.org/10.1007/978-3-642-28365-9_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28364-2

  • Online ISBN: 978-3-642-28365-9

  • eBook Packages: Computer Science, Computer Science (R0)
