Sparse CSB_Coo Matrix-Vector and Matrix-Matrix Performance on Intel Xeon Architectures

  • Brandon Cook
  • Charlene Yang
  • Thorsten Kurth
  • Jack Deslippe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11203)


The CSB_Coo sparse matrix format is especially useful in situations such as eigenvalue problems, where efficient SPMV and transposed SPMV_T operations are required. One strategy to increase the arithmetic intensity of large-scale parallel solvers is to use a blocked eigensolver such as LOBPCG and to operate on blocks of vectors to achieve greater performance. However, this approach is not always practical: MPI communication costs may be higher, leading to inefficiencies, or the increased memory usage of dense vectors may be prohibitive. Additionally, the Lanczos algorithm is well tested in production and may be preferred in some situations. On modern architectures, vectorization is key to obtaining good performance. In this paper we show the performance optimization and benefits of vectorization with AVX-512 Conflict Detection (CD) instructions for a standard SPMV operation on a single vector. We also present a modified version of the CSB_Coo format that allows more efficient vector operations. We compare and analyze performance on Haswell, Xeon Phi (KNL and KNM), and Intel Xeon Scalable (Skylake) processors.
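To make the idea of serving both SPMV and SPMV_T from one data structure concrete, here is a minimal Python sketch of a blocked coordinate layout. It is an illustration under stated assumptions, not the paper's implementation: the function names (`build_csb_coo`, `spmv`, `spmv_t`), the dictionary-of-blocks storage, and the block size parameter `beta` are all hypothetical, and the code is scalar rather than vectorized.

```python
from collections import defaultdict


def build_csb_coo(entries, beta):
    """Group (row, col, value) triples into beta x beta blocks.

    Hypothetical layout for illustration: a dict mapping
    (block_row, block_col) to a list of (local_row, local_col, value).
    """
    blocks = defaultdict(list)
    for i, j, v in entries:
        blocks[(i // beta, j // beta)].append((i % beta, j % beta, v))
    return dict(blocks)


def spmv(blocks, beta, x, n_rows):
    """Compute y = A x by scattering into y along local row indices."""
    y = [0.0] * n_rows
    for (bi, bj), triples in blocks.items():
        row0, col0 = bi * beta, bj * beta
        for r, c, v in triples:
            y[row0 + r] += v * x[col0 + c]
    return y


def spmv_t(blocks, beta, x, n_cols):
    """Compute y = A^T x from the same blocks by swapping index roles."""
    y = [0.0] * n_cols
    for (bi, bj), triples in blocks.items():
        row0, col0 = bi * beta, bj * beta
        for r, c, v in triples:
            y[col0 + c] += v * x[row0 + r]
    return y


# Small 4 x 4 example with 2 x 2 blocks (values chosen arbitrarily).
entries = [(0, 0, 2.0), (0, 3, 1.0), (1, 1, 3.0),
           (2, 0, 4.0), (3, 2, 5.0), (3, 3, 6.0)]
blocks = build_csb_coo(entries, beta=2)
x = [1.0, 2.0, 3.0, 4.0]
print(spmv(blocks, 2, x, n_rows=4))
print(spmv_t(blocks, 2, x, n_cols=4))
```

Both products traverse the same blocks and differ only in which index is read and which is written. The accumulation into `y` is a scatter: if several entries processed together share a destination index, a naive vectorized update would lose contributions. Detecting such duplicate indices within a vector register is the purpose of the AVX-512 CD instructions mentioned above.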


SPMV, SPMM, Performance, AVX-512, Vectorization



This work used resources provided by the Performance Research Laboratory at the University of Oregon. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Lawrence Berkeley National Laboratory, Berkeley, USA
