A massively parallel algorithm for Bordered Almost Block Diagonal Systems on GPUs


In this paper, we present PARASOF, an algorithm for solving linear systems with BABD matrices on massively parallel computing systems such as graphics processing units (GPUs). The algorithm is compared with state-of-the-art algorithms, in particular SOF, which inspired it and from which it inherits its stability properties. We detail its design and implementation, and report the main figures for its theoretical and experimental performance.
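
For orientation, a BABD coefficient matrix is commonly written in a block form like the one sketched below. This is an illustrative layout under assumed notation, not one taken from the paper: the block names S_i, R_i, B_a, B_b and the number of block rows N are assumptions, and some references place the border row first rather than last.

```latex
% Illustrative BABD structure (assumed notation; requires amsmath for pmatrix).
% S_i, R_i: square blocks of a block-bidiagonal part; B_a, B_b: border blocks.
\[
A =
\begin{pmatrix}
S_1 & R_1 &        &        &     \\
    & S_2 & R_2    &        &     \\
    &     & \ddots & \ddots &     \\
    &     &        & S_N    & R_N \\
B_a &     &        &        & B_b
\end{pmatrix}
\]
```

In this sketch all blocks are square of the same size n, so A has order (N+1)n. The corner block B_a, which couples the first and last block unknowns, is what distinguishes a bordered system from a plain almost block diagonal one.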






The authors received a doctoral grant funded by BeanTech s.r.l. “GPU computing for modeling, nonlinear optimization and machine learning.” This work was partially supported by INdAM-GNCS 2019 project “Tecniche innovative e parallele per sistemi lineari e nonlineari di grandi dimensioni, funzioni ed equazioni matriciali ed applicazioni.”

Author information



Corresponding author

Correspondence to M. Dessole.


Cite this article

Dessole, M., Marcuzzi, F. A massively parallel algorithm for Bordered Almost Block Diagonal Systems on GPUs. Numer Algor 86, 1243–1263 (2021). https://doi.org/10.1007/s11075-020-00931-8



  • GPU
  • Parallel algorithms
  • BABD system
  • Batched routines
  • Optimal control
  • GPGPU computing