Array Replication to Increase Parallelism in Applications Mapped to Configurable Architectures

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4339)

Abstract

Configurable architectures with multiple independent on-chip RAM modules offer a unique opportunity to exploit the parallel memory accesses inherent in a sequential program by tailoring not only the number and configuration of the modules in the resulting hardware design but also the accesses to them. In this paper we explore the possibility of array replication for loop computations that are beyond the reach of traditional privatization and parallelization analyses. We present a compiler analysis that identifies portions of array variables that can be temporarily replicated within the execution of a given loop iteration, enabling the concurrent execution of statements or even non-perfectly nested loops. For configurable architectures where array replication is essentially free in terms of execution time, this replication not only enables parallel execution but also reduces or even eliminates memory contention. We present preliminary experiments applying the proposed technique to hardware designs for commercially available FPGA devices.
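
The idea in the abstract can be illustrated with a small toy model. The sketch below is not taken from the paper: the function names (`run_original`, `run_replicated`, `read_cycles`), the loop body, and the cost model (each RAM module serves one read per cycle, and writing the replica costs no extra time) are all assumptions made for illustration. It shows two statements in the same loop iteration reading a temporary array, first from a single module and then from two replicas.

```python
# Toy model only; names and cost model are hypothetical, not the paper's.
# Assumption: each RAM module serves exactly one read per cycle.

def read_cycles(reads_per_module):
    """Cycles needed when modules work in parallel, one read per cycle each."""
    return max(reads_per_module.values())

def run_original(a):
    """S1 and S2 both read tmp from the same module, so their reads serialize."""
    out1, out2, cycles = [], [], 0
    for x in a:
        tmp = [x + k for k in range(4)]      # array section written in this iteration
        out1.append(sum(tmp))                # S1 reads every element of tmp
        out2.append(max(tmp) - min(tmp))     # S2 reads every element of tmp
        cycles += read_cycles({"tmp": 2 * len(tmp)})
    return out1, out2, cycles

def run_replicated(a):
    """tmp is replicated into a second module, so S1 and S2 can read concurrently."""
    out1, out2, cycles = [], [], 0
    for x in a:
        tmp = [x + k for k in range(4)]
        tmp_copy = list(tmp)                 # replica; assumed written alongside tmp at no extra cost
        out1.append(sum(tmp))                # S1 reads the original
        out2.append(max(tmp_copy) - min(tmp_copy))  # S2 reads the replica
        cycles += read_cycles({"tmp": len(tmp), "tmp_copy": len(tmp_copy)})
    return out1, out2, cycles

if __name__ == "__main__":
    print(run_original([1, 2, 3]))
    print(run_replicated([1, 2, 3]))
```

Under these assumptions both versions compute the same values, while the replicated version needs half the read cycles because the two statements no longer contend for one module. Real designs would also have to account for the cost and legality of creating the replica, which is what the compiler analysis described in the abstract addresses.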



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ziegler, H.E., Malusare, P.L., Diniz, P.C. (2006). Array Replication to Increase Parallelism in Applications Mapped to Configurable Architectures. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_5

  • DOI: https://doi.org/10.1007/978-3-540-69330-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69329-1

  • Online ISBN: 978-3-540-69330-7

  • eBook Packages: Computer Science, Computer Science (R0)
