Abstract
As Field Programmable Gate Arrays (FPGAs) have reached capacities beyond millions of equivalent gates, it has become possible to accelerate floating-point scientific computing applications. One type of calculation that is commonplace in scientific computation is the solution of systems of linear equations. A method that has proven very efficient and robust in software for finding such solutions is the Conjugate Gradient algorithm. In this paper we present a parallel hardware Conjugate Gradient implementation. The implementation is particularly suited to accelerating multiple small- to medium-sized dense systems of linear equations. Through parallelization, the computation time per iteration for an order-n matrix can be reduced from Θ(n²) cycles for a software implementation to Θ(n) cycles. I/O requirements are scalable and converge to a constant value as the matrix order increases. Results on a VirtexII-6000 demonstrate sustained performance of 5 GFLOPS, and projected results on a Virtex5-330 indicate sustained performance of 35 GFLOPS. The former result is comparable to high-end CPUs, whereas the latter represents a significant speedup.
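For orientation, the following is a minimal software sketch of the textbook Conjugate Gradient iteration for a dense symmetric positive-definite system; it is not the hardware design presented in the paper. The function names (`matvec`, `dot`, `conjugate_gradient`) and the parameters `tol` and `max_iter` are illustrative assumptions. It shows where the Θ(n²) per-iteration cost of a sequential implementation comes from: the single dense matrix-vector product.

```python
def dot(u, v):
    """Inner product of two equal-length vectors: Theta(n) work."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product: n row dot products, Theta(n^2) work."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A (list of rows).

    Textbook CG starting from x = 0. Returns (x, iterations).
    `tol` and `max_iter` are illustrative parameters, not from the paper.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)            # residual b - A x, with x = 0
    d = list(r)            # initial search direction
    rs_old = dot(r, r)
    iterations = 0
    while iterations < max_iter and rs_old > tol * tol:
        q = matvec(A, d)   # the only Theta(n^2) step in each iteration
        alpha = rs_old / dot(d, q)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        d = [ri + beta * di for ri, di in zip(r, d)]
        rs_old = rs_new
        iterations += 1
    return x, iterations


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    x, its = conjugate_gradient(A, b)
    print(x, its)
```

The row dot products inside `matvec` are independent of one another, so they are a natural candidate for parallel evaluation. How the paper's architecture actually schedules this work is not stated in the abstract, and nothing in this sketch should be read as describing it.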
The authors would like to acknowledge the support of the EPSRC (Grants EP/C549481/1 and EP/E00024X/1) and the support of Dr. Eric Kerrigan.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lopes, A.R., Constantinides, G.A. (2008). A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation. In: Woods, R., Compton, K., Bouganis, C., Diniz, P.C. (eds) Reconfigurable Computing: Architectures, Tools and Applications. ARC 2008. Lecture Notes in Computer Science, vol 4943. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78610-8_10
DOI: https://doi.org/10.1007/978-3-540-78610-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78609-2
Online ISBN: 978-3-540-78610-8
eBook Packages: Computer Science (R0)