Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction

  • Conference paper
Computational Science and Its Applications – ICCSA 2014 (ICCSA 2014)

Abstract

In this paper we present new hybrid CPU-GPU routines that accelerate the solution of linear systems with a band coefficient matrix by off-loading most of the computation to the GPU and leveraging highly tuned BLAS implementations for the graphics processor. Our experiments with an NVIDIA S2070 GPU show speed-ups of up to 6× for the hybrid LU-based band solver over the analogous CPU-only routines in Intel’s MKL. As a practical demonstration of these benefits, we plug the new CPU-GPU codes into a sparse Lyapunov equation solver, obtaining a 3× acceleration in the solution of a large-scale benchmark arising in model reduction.
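To illustrate the kind of computation the abstract refers to, the following is a minimal CPU-only sketch of a band LU solve. It is not the authors' hybrid routine: it is unblocked, uses dense storage for readability, and omits pivoting, so it assumes a matrix (for example, a diagonally dominant one) for which no row exchanges are needed. The function name `band_lu_solve` and the parameters `kl` and `ku` (numbers of sub- and super-diagonals) are hypothetical names chosen for this example.

```python
def band_lu_solve(A, b, kl, ku):
    """Solve A x = b for a band matrix A with kl sub-diagonals and ku super-diagonals.

    Assumption: no pivoting is required (e.g. A is diagonally dominant).
    A is a dense list of lists here purely for clarity; a production code
    would use compact band storage instead.
    """
    n = len(b)
    U = [row[:] for row in A]  # working copy, overwritten with the U factor
    y = list(b)                # right-hand side, updated during elimination

    # Forward elimination restricted to the band: below the pivot only kl rows
    # are nonzero, and to its right only ku columns are nonzero.
    for k in range(n - 1):
        if U[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        last_row = min(n, k + kl + 1)
        last_col = min(n, k + ku + 1)
        for i in range(k + 1, last_row):
            m = U[i][k] / U[k][k]
            U[i][k] = 0.0
            for j in range(k + 1, last_col):
                U[i][j] -= m * U[k][j]
            y[i] -= m * y[k]

    # Back substitution: without pivoting, U keeps upper bandwidth ku.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i]
        for j in range(i + 1, min(n, i + ku + 1)):
            s -= U[i][j] * x[j]
        x[i] = s / U[i][i]
    return x
```

Because each elimination step touches at most `kl` rows and `ku` columns, the cost grows roughly as `n * kl * ku` rather than `n**3`. That band structure is what a blocked, BLAS-based implementation would exploit.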



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Benner, P., Dufrechou, E., Ezzatti, P., Igounet, P., Quintana-Ortí, E.S., Remón, A. (2014). Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8584. Springer, Cham. https://doi.org/10.1007/978-3-319-09153-2_29

  • DOI: https://doi.org/10.1007/978-3-319-09153-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09152-5

  • Online ISBN: 978-3-319-09153-2

  • eBook Packages: Computer Science, Computer Science (R0)
