A Parallel Solving Algorithm on GPU for the Time-Domain Linear System with Diagonal Sparse Matrices

  • Yifei Xia
  • Jiaquan Gao
  • Guixia He
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 911)


For time-domain linear systems with diagonal sparse matrices, we propose T-GMRES, an efficient solving algorithm on the graphics processing unit (GPU) based on the popular preconditioned generalized minimum residual method (GMRES). T-GMRES has three novelties: (1) a new sparse storage format, BRCSD, is presented to alleviate a drawback of the diagonal format (DIA), namely that a large number of zeros must be filled in to maintain the diagonal structure when many diagonals are far from the main diagonal; (2) an efficient sparse matrix-vector multiplication on GPU for BRCSD is proposed; and (3) a new kernel is suggested for efficiently assembling the BRCSD sparse matrix and the vector on GPU. The experimental results validate the high efficiency and good performance of the proposed algorithm.
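The DIA drawback named in item (1) can be made concrete with a small sketch. The code below is an illustration only: it shows the standard DIA layout and a row-wise sparse matrix-vector product, not the BRCSD format or the GPU kernels of the paper, whose details are not given here. The function names (`to_dia`, `dia_spmv`, `out_of_matrix_slots`) are hypothetical, and plain Python is used for readability in place of CUDA.

```python
def to_dia(dense):
    """Convert a square dense matrix (list of lists) to DIA storage.

    Returns (offsets, data), where data[k][i] holds A[i][i + offsets[k]].
    Slots whose column index falls outside the matrix are padded with 0.0.
    """
    n = len(dense)
    # A diagonal is stored as soon as it contains one nonzero entry.
    offsets = sorted({j - i for i in range(n) for j in range(n) if dense[i][j] != 0})
    data = [
        [dense[i][i + off] if 0 <= i + off < n else 0.0 for i in range(n)]
        for off in offsets
    ]
    return offsets, data


def dia_spmv(offsets, data, x):
    """Compute y = A x from DIA storage.

    The body of the outer loop is the work a single GPU thread would do
    in a one-thread-per-row kernel (an assumption made for illustration).
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for off, diag in zip(offsets, data):
            j = i + off
            if 0 <= j < n:  # skip padded slots
                acc += diag[i] * x[j]
        y[i] = acc
    return y


def out_of_matrix_slots(offsets):
    """Padded slots outside the matrix: a diagonal at offset k wastes |k| of them."""
    return sum(abs(off) for off in offsets)
```

For a 4x4 matrix with a main diagonal and single entries in the two corners, `to_dia` stores three diagonals of length 4, that is 12 slots for 6 nonzeros, and 6 of those slots are padding that lies outside the matrix. The waste grows with the distance of each diagonal from the main diagonal, which is the situation the abstract describes.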


Linear system · Diagonal sparse matrices · Parallel solving algorithm · CUDA · GPU



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing, China
  2. Zhijiang College, Zhejiang University of Technology, Hangzhou, China
