Performance Modeling of Pipelined Linear Algebra Architectures on FPGAs

  • Sam Skalicky
  • Sonia López
  • Marcin Łukowiak
  • James Letendre
  • Matthew Ryan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7806)


The potential design space of FPGA accelerators is very large. The factors that define the performance of a particular implementation include the architecture design, the number of pipelines, and the available memory bandwidth. In this paper we present a mathematical model that, based on these factors, predicts the computation time of pipelined FPGA accelerators. This model can be used to explore the design space quickly, without any implementation or simulation. We evaluate the model, its usefulness, and its ability to identify bottlenecks and improve performance. Linear algebra computations form the core of many compute-intensive applications and are the main contributors to their total execution time. Hence, five relevant linear algebra computations are selected and analyzed, and the accuracy of the model is validated against implemented designs.
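To illustrate the kind of model the abstract describes, the sketch below shows a generic roofline-style estimate in which an accelerator is either compute-bound or memory-bound. This is an assumption-laden simplification for illustration only: the function name, parameters, and the max-of-two-terms form are not taken from the paper, whose actual model is more detailed.

```python
def predicted_time(n_ops, n_pipelines, freq_hz, bytes_moved, bandwidth_bps,
                   pipeline_depth=0):
    """Illustrative time model for a pipelined accelerator (hypothetical,
    not the paper's exact equations).

    Compute-bound term: n_ops operations shared across n_pipelines pipelines,
    producing one result per pipeline per cycle once the pipelines are full
    (pipeline_depth cycles of fill latency).
    Memory-bound term: bytes_moved transferred at bandwidth_bps bytes/second.
    The slower of the two dominates the total computation time.
    """
    compute_s = (n_ops / n_pipelines + pipeline_depth) / freq_hz
    memory_s = bytes_moved / bandwidth_bps
    return max(compute_s, memory_s)

# Example: a 1024x1024 matrix-vector multiply on 8 pipelines at 200 MHz,
# streaming the single-precision matrix once over a 3.2 GB/s interface.
n = 1024
t = predicted_time(n_ops=n * n, n_pipelines=8, freq_hz=200e6,
                   bytes_moved=4 * n * n, bandwidth_bps=3.2e9)
```

In this example the memory term exceeds the compute term, so the model flags the design as memory-bandwidth-bound; adding more pipelines would not help, which is exactly the kind of bottleneck identification the paper's model is meant to provide without implementing the design.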


Keywords: Linear Algebra · Matrix Inverse · Systolic Array · Memory Bandwidth · Total Execution Time





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sam Skalicky (1)
  • Sonia López (1)
  • Marcin Łukowiak (1)
  • James Letendre (1)
  • Matthew Ryan (1)

  1. Rochester Institute of Technology, Rochester, USA
