A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation

  • Antonio Roldao Lopes
  • George A. Constantinides
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4943)


As Field Programmable Gate Arrays (FPGAs) have reached capacities beyond millions of equivalent gates, it becomes possible to accelerate floating-point scientific computing applications. One type of calculation that is commonplace in scientific computation is the solution of systems of linear equations. A method that has proven in software to be very efficient and robust for finding such solutions is the Conjugate Gradient algorithm. In this paper we present a parallel hardware Conjugate Gradient implementation. The implementation is particularly suited for accelerating multiple small to medium sized dense systems of linear equations. Through parallelization it is possible to convert the computation time per iteration for an order n matrix from Θ(n²) cycles for a software implementation to Θ(n). I/O requirements are scalable and converge to a constant value with the increase of matrix order. Results on a VirtexII-6000 demonstrate sustained performance of 5 GFLOPS and projected results on a Virtex5-330 indicate sustained performance of 35 GFLOPS. The former result is comparable to high-end CPUs, whereas the latter represents a significant speedup.
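
For orientation, the textbook Conjugate Gradient iteration for a dense symmetric positive-definite system can be sketched in software as follows. This is a minimal illustration of the standard algorithm, not the hardware design presented in the paper; the function names (`dot`, `matvec`, `conjugate_gradient`) and the tolerance and iteration defaults are assumptions chosen for the example. The dense matrix-vector product is the step that costs on the order of n² multiply-adds per iteration when executed sequentially, which is the cost the abstract refers to.

```python
def dot(u, v):
    """Inner product of two equal-length vectors (n multiply-adds)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product: n inner products, so about n*n multiply-adds."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite dense matrix A.

    Textbook formulation; tol and max_iter are illustrative defaults.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n          # initial guess x0 = 0
    r = list(b)            # residual r0 = b - A*x0 = b
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break          # residual norm is small enough
        Ap = matvec(A, p)  # dominant cost of each iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Small symmetric positive-definite example system.
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))
```

Each iteration performs one matrix-vector product, two inner products, and three vector updates. Only the matrix-vector product scales quadratically with n in a sequential implementation; the remaining operations are linear in n.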


Keywords: Conjugate Gradient; Clock Cycle; Conjugate Gradient Method; Conjugate Gradient Algorithm; Sustained Performance





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Antonio Roldao Lopes (1)
  • George A. Constantinides (1)
  1. Electrical & Electronic Engineering, Imperial College London, London, England
