Improving performance of iterative solvers with the AXC format using the Intel Xeon Phi
This work focuses on applying the new AXC format in iterative algorithms on the Intel Xeon Phi coprocessor, solving linear systems faster by accelerating the sparse matrix-vector (SpMV) product. The algorithms are the Conjugate Gradient (CG) method, used for symmetric linear systems, and the Biconjugate Gradient Stabilised (BiCGS) method, used for non-symmetric ones. Two highly efficient formats were selected for comparison with the AXC format: the Compressed Sparse Row (CSR) format and the Sliced ELLPACK-C-\(\alpha \) (SELL-C-\(\alpha \)) format. The evaluation consists of two phases. The first phase compares the performance of the SpMV kernels alone. The second phase compares the performance of the solvers with the different kernels integrated within them. As this work aims to explore the full capabilities of the Intel Xeon Phi architecture, Intel's intrinsic instruction set was used to code the kernels for the AXC and SELL-C-\(\alpha \) formats, which were compared with Intel's optimised MKL implementation of the CSR format. Numerical results demonstrate that, when performing the SpMV product alone, the AXC format achieves the highest average performance, 15.9 GFLOPS, against 5.5 GFLOPS for the SELL-C-\(\alpha \) format and 8.4 GFLOPS for the MKL CSR format, and that it also has the lowest variability. Results also show that the AXC format achieves up to 6.5\(\times \), 1.5\(\times \) and 1.9\(\times \) greater performance than the MKL CSR format, its closest competitor, for the SpMV product, the CG algorithm and the BiCGS algorithm, respectively.
Keywords: Sparse matrix storage; Sparse matrix vector product; Conjugate Gradient; Biconjugate Gradient Stabilised; Intel Xeon Phi; MIC intrinsics
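To illustrate why the SpMV product dominates the cost of the solvers named in the abstract, the following is a minimal sketch of a CSR SpMV kernel plugged into a textbook Conjugate Gradient loop. It is not the authors' AXC kernel nor the MKL implementation; the function names (`csr_spmv`, `conjugate_gradient`), the tolerance and the small test matrix are all assumptions chosen for illustration.

```python
import math

def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(spmv, b, tol=1e-10, max_iter=1000):
    """Textbook CG for a symmetric positive definite system.

    `spmv` is any callable returning A @ v, so a different storage
    format can be swapped in without touching the solver.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A @ x, with x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) <= tol:
            break
        ap = spmv(p)     # the single SpMV product per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Hypothetical 3x3 SPD test matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]

x = conjugate_gradient(lambda v: csr_spmv(row_ptr, col_idx, values, v), b)
```

Because the solver only sees the `spmv` callable, the same loop could in principle be driven by a kernel for another storage format, which mirrors the second evaluation phase described in the abstract.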
This research was supported in part by the Spanish Government under Projects TIN2013-41129-P and TIN2016-76373-P, by Xunta de Galicia and FEDER Funds (GRC 2014/008), by financial support from the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016–2019, ED431G/08), and by the European Regional Development Fund (ERDF).