In this paper, we present PARASOF, an algorithm for solving linear systems with bordered almost block diagonal (BABD) matrices on massively parallel computing systems such as graphics processing units (GPUs). We compare it with state-of-the-art algorithms, in particular SOF, the algorithm that inspired it and whose stability properties it inherits. We detail its design and implementation and report the main figures of its theoretical and experimental performance.
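For readers unfamiliar with the term, the block sparsity pattern of a BABD matrix can be sketched as follows. This is only an illustration of the matrix structure, not of PARASOF or SOF. The function name `babd_block_pattern`, the block labels, and the choice to place the boundary blocks in the last block row are assumptions made for this sketch; other presentations put the boundary row first.

```python
def babd_block_pattern(n_blocks):
    """Return a grid of block labels for a BABD matrix (hypothetical helper).

    The grid has (n_blocks + 1) block rows and block columns.
    "S" and "T" mark the block-bidiagonal part, "Ba" and "Bb" mark the
    boundary (border) blocks, and "0" marks a zero block.
    Assumption: the boundary blocks sit in the last block row.
    """
    size = n_blocks + 1
    grid = [["0"] * size for _ in range(size)]
    for i in range(n_blocks):
        grid[i][i] = "S"        # diagonal block of block row i
        grid[i][i + 1] = "T"    # super-diagonal block of block row i
    grid[n_blocks][0] = "Ba"          # couples the first unknown block
    grid[n_blocks][n_blocks] = "Bb"   # couples the last unknown block
    return grid


if __name__ == "__main__":
    # Print the pattern for four bidiagonal block rows plus the border row.
    for row in babd_block_pattern(4):
        print(" ".join(f"{label:>2}" for label in row))
```

The single `Ba` block in the corner opposite the bidiagonal band is what distinguishes a bordered system from a plain almost block diagonal one.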
The authors received a doctoral grant funded by BeanTech s.r.l. “GPU computing for modeling, nonlinear optimization and machine learning.” This work was partially supported by INdAM-GNCS 2019 project “Tecniche innovative e parallele per sistemi lineari e nonlineari di grandi dimensioni, funzioni ed equazioni matriciali ed applicazioni.”
About this article
Cite this article
Dessole, M., Marcuzzi, F. A massively parallel algorithm for Bordered Almost Block Diagonal Systems on GPUs. Numer Algor 86, 1243–1263 (2021). https://doi.org/10.1007/s11075-020-00931-8
- Parallel algorithms
- BABD system
- Batched routines
- Optimal control
- GPGPU computing