Time and Energy Efficient Matrix Factorization Using FPGAs

  • Seonil Choi
  • Viktor K. Prasanna
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2778)


In this paper, new algorithms and architectures for matrix factorization are presented. Two fully parallel, block-based designs for LU decomposition on configurable devices are proposed. A linear array architecture is employed to minimize the usage of long interconnects, leading to lower energy dissipation. The designs are made scalable by using a fixed I/O bandwidth that is independent of the problem size. High-level models for energy profiling are built, and the energy performance of many possible designs is predicted. Through the analysis of design tradeoffs, the block size that minimizes the total energy dissipation is identified. A set of candidate designs was implemented on the Xilinx Virtex-II to verify the estimates. The performance of our designs is also compared with that of state-of-the-art DSP-based designs and with that of designs obtained using a state-of-the-art commercial compilation tool, Celoxica DK1. In both cases, our FPGA designs are significantly more time and energy efficient.
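To make the notion of a block-based LU decomposition concrete, the following is a minimal software sketch of a generic right-looking blocked LU factorization. It is not the authors' FPGA architecture, which the abstract does not describe in detail. The function names (`lu_unblocked`, `block_lu`) and the parameter `b` (block size) are illustrative assumptions, and the sketch assumes a square matrix that can be factored without pivoting, for example a diagonally dominant one.

```python
def lu_unblocked(a, lo, hi):
    """In-place LU (no pivoting) of the diagonal block a[lo:hi][lo:hi]."""
    for k in range(lo, hi):
        for i in range(k + 1, hi):
            a[i][k] /= a[k][k]                    # multiplier, stored as L[i][k]
            for j in range(k + 1, hi):
                a[i][j] -= a[i][k] * a[k][j]


def block_lu(matrix, b):
    """Return (L, U) with L unit lower triangular, using b x b blocks.

    Assumption: no pivoting is needed (e.g. diagonally dominant input).
    """
    n = len(matrix)
    a = [row[:] for row in matrix]                # work on a copy
    for lo in range(0, n, b):
        hi = min(lo + b, n)
        # 1. Factor the diagonal block: A11 = L11 * U11.
        lu_unblocked(a, lo, hi)
        # 2. Row panel: solve L11 * U12 = A12 by forward substitution.
        for j in range(hi, n):
            for i in range(lo, hi):
                for k in range(lo, i):
                    a[i][j] -= a[i][k] * a[k][j]
        # 3. Column panel: solve L21 * U11 = A21.
        for i in range(hi, n):
            for j in range(lo, hi):
                for k in range(lo, j):
                    a[i][j] -= a[i][k] * a[k][j]
                a[i][j] /= a[j][j]
        # 4. Trailing update: A22 <- A22 - L21 * U12.
        for i in range(hi, n):
            for j in range(hi, n):
                for k in range(lo, hi):
                    a[i][j] -= a[i][k] * a[k][j]
    # Split the packed result into explicit L and U factors.
    lower = [[a[i][j] if j < i else (1.0 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    upper = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper
```

Calling `block_lu(A, b)` with different values of `b` yields the same factors up to rounding; only the order of the arithmetic and the data movement change. That grouping of work is the property that makes block size a design parameter when trading latency against energy.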


Block Size, Matrix Factorization, Field Programmable Gate Array, Candidate Design, Total Latency
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Seonil Choi (1)
  • Viktor K. Prasanna (1)
  1. Electrical Engineering-Systems, University of Southern California, Los Angeles, USA
