Towards a Lightweight Method to Predict the Performance of Sparse Triangular Solvers on Heterogeneous Hardware Platforms

  • Raúl Marichal
  • Ernesto Dufrechou
  • Pablo Ezzatti
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1087)


The solution of sparse triangular linear systems (SpTrSV) is a fundamental building block for many numerical methods. Its prevalence across different fields and its considerable computational cost have motivated several efforts to accelerate it on different hardware platforms and, in particular, on those equipped with massively parallel processors. Until recently, the dominant approach to parallelizing this operation on such hardware was the level-set method, which relies on a costly preprocessing phase. For this reason, much of the research on the subject focuses on the case where several triangular linear systems have to be solved for the same matrix. However, the latest efforts have proposed efficient one-phase routines that can be advantageous even when only one SpTrSV needs to be applied for each matrix. In these cases, the decision of which solver to employ depends strongly on the degree of parallelism offered by the linear system. In this work we provide an inexpensive algorithm to estimate the degree of parallelism of a triangular matrix, and explore some heuristics to select between the SpTrSV routine provided by the Intel MKL library and our one-phase GPU solver. The experimental evaluation shows that our proposal achieves generally accurate predictions, with runtimes two orders of magnitude lower than those of the state-of-the-art method for computing the DAG levels.
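To make the notion of "degree of parallelism" concrete, the following minimal sketch computes the exact DAG levels of a lower-triangular matrix stored in CSR form and derives an average parallelism figure from them. This is the textbook level computation that the abstract uses as the costly baseline, not the inexpensive estimator proposed in the paper. The function names, the `"gpu"`/`"cpu"` labels, and the `threshold` value are hypothetical and chosen only for illustration.

```python
def dag_levels(row_ptr, col_idx):
    """Return the DAG level of each row of a lower-triangular CSR matrix.

    Row i depends on every row j < i for which entry (i, j) is stored.
    Rows without dependencies sit at level 0.
    """
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        deepest = -1  # level of the deepest dependency seen so far
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # skip the diagonal entry
                deepest = max(deepest, level[j])
        level[i] = deepest + 1
    return level


def average_parallelism(row_ptr, col_idx):
    """Rows per level: a simple proxy for the available parallelism."""
    level = dag_levels(row_ptr, col_idx)
    if not level:
        return 0.0
    return len(level) / (max(level) + 1)


def choose_solver(row_ptr, col_idx, threshold=32.0):
    """Hypothetical heuristic: prefer the GPU when parallelism is high.

    The default threshold is an arbitrary placeholder, not a tuned value.
    """
    if average_parallelism(row_ptr, col_idx) >= threshold:
        return "gpu"
    return "cpu"


# 4x4 lower-triangular pattern: row 1 depends on row 0, row 3 on row 2.
row_ptr = [0, 1, 3, 4, 6]
col_idx = [0, 0, 1, 2, 2, 3]
print(dag_levels(row_ptr, col_idx))           # [0, 1, 0, 1]
print(average_parallelism(row_ptr, col_idx))  # 2.0
```

In this example rows 0 and 2 can be solved concurrently, followed by rows 1 and 3, so the system has two levels of two rows each. A diagonal matrix would have a single level (maximum parallelism), while a bidiagonal matrix would have one row per level (none).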


Keywords: Multi-core, GPU, Sparse triangular linear systems, Parallelism estimation



The researchers from UdelaR were supported by Universidad de la República and the PEDECIBA.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Raúl Marichal (1)
  • Ernesto Dufrechou (1)
  • Pablo Ezzatti (1)

  1. Instituto de Computación, Universidad de la República, Montevideo, Uruguay
