Abstract
In this paper, we present PARASOF, an algorithm for solving linear systems with bordered almost block diagonal (BABD) matrices on massively parallel computing systems such as graphics processing units (GPUs). We compare it with state-of-the-art algorithms, in particular SOF, which inspired it and whose stability properties it inherits. We detail its design and implementation and report its theoretical and experimental performance.
Funding
The authors received a doctoral grant funded by BeanTech s.r.l. “GPU computing for modeling, nonlinear optimization and machine learning.” This work was partially supported by INdAM-GNCS 2019 project “Tecniche innovative e parallele per sistemi lineari e nonlineari di grandi dimensioni, funzioni ed equazioni matriciali ed applicazioni.”
Cite this article
Dessole, M., Marcuzzi, F. A massively parallel algorithm for Bordered Almost Block Diagonal Systems on GPUs. Numer Algor 86, 1243–1263 (2021). https://doi.org/10.1007/s11075-020-00931-8
Keywords
- GPU
- Parallel algorithms
- BABD system
- Batched routines
- Optimal control
- GPGPU computing