Abstract
Configurable architectures, with multiple independent on-chip RAM modules, offer a unique opportunity to exploit the parallel memory accesses inherent in a sequential program by tailoring not only the number and configuration of the modules in the resulting hardware design but also the accesses to them. In this paper we explore the possibility of array replication for loop computations that is beyond the reach of traditional privatization and parallelization analyses. We present a compiler analysis that identifies portions of array variables that can be temporarily replicated within the execution of a given loop iteration, enabling the concurrent execution of statements or even non-perfectly nested loops. For configurable architectures, where array replication is essentially free in terms of execution time, this replication not only enables parallel execution but also reduces or even eliminates memory contention. We present preliminary experiments applying the proposed technique to hardware designs for commercially available FPGA devices.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Ziegler, H.E., Malusare, P.L., Diniz, P.C. (2006). Array Replication to Increase Parallelism in Applications Mapped to Configurable Architectures. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_5
Print ISBN: 978-3-540-69329-1
Online ISBN: 978-3-540-69330-7