The Journal of Supercomputing, Volume 74, Issue 6, pp 2823–2840

Improving performance of iterative solvers with the AXC format using the Intel Xeon Phi

  • Edoardo Coronado-Barrientos
  • Guillermo Indalecio
  • Antonio García-Loureiro


This work is focused on the application of the new AXC format in iterative algorithms on the Intel Xeon Phi coprocessor to solve linear systems by accelerating the sparse matrix-vector (SpMV) product. These algorithms are the Conjugate Gradient (CG) and the Biconjugate Gradient Stabilised (BiCGS) methods, used to solve symmetric and non-symmetric linear systems, respectively. Two highly efficient formats were selected for comparison with the AXC format: the Compressed Sparse Row (CSR) format and the Sliced ELLPACK-C-\(\alpha \) (SELL-C-\(\alpha \)) format. The evaluation process consists of two phases. The first phase is focused on the performance comparison of the SpMV kernels alone. The second phase compares the performance of the solvers with the different kernels integrated within them. As this work is oriented to explore the full capabilities of the Intel Xeon Phi architecture, Intel's intrinsic instruction set was used to code the kernels for the AXC and SELL-C-\(\alpha \) formats, and these were compared with Intel's optimal MKL implementation of the CSR format. Numerical results demonstrate that the AXC format achieves the highest average performance, 15.9 GFLOPS, against 5.5 GFLOPS for the SELL-C-\(\alpha \) format and 8.4 GFLOPS for the MKL CSR format, and it also has the lowest variability when performing the SpMV product alone. Results also show that the AXC format achieves up to 6.5\(\times \), 1.5\(\times \) and 1.9\(\times \) greater performance over the MKL CSR format, its closest competitor, for the SpMV product, the CG and the BiCGS algorithms, respectively.


Keywords: Sparse matrix storage · Sparse matrix-vector product · Conjugate Gradient · Biconjugate Gradient Stabilised · Intel Xeon Phi · MIC intrinsics



This research was supported in part by the Spanish Government under the Projects TIN2013-41129-P and TIN2016-76373-P, by Xunta de Galicia and FEDER Funds (GRC 2014/008), by financial support from the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016–2019, ED431G/08), and by the European Regional Development Fund (ERDF).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Centro Singular de Investigación en Tecnoloxías da Información (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
