Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms

  • Conference paper
High Performance Computing (ISC High Performance 2015)

Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a key kernel in many High Performance Computing applications, such as algebraic multigrid solvers and graph analytics. Optimizing SpGEMM on modern processors is challenging because of random data accesses, poor data locality, and load imbalance during computation. In this work, we investigate different partitioning techniques, cache optimizations (using dense arrays instead of hash tables), and dynamic load balancing for SpGEMM, using a diverse set of real-world and synthetic datasets. We demonstrate that our implementation outperforms the state of the art on Intel® Xeon® processors. We are up to 3.8X faster than the Intel® Math Kernel Library (MKL) and up to 257X faster than CombBLAS. We also outperform the best published GPU implementation of SpGEMM on the NVIDIA GTX Titan and the AMD Radeon HD 7970 by up to 7.3X and 4.5X, respectively, on their published datasets. We demonstrate good multi-core scalability (geomean speedup of 18.2X using 28 threads), compared with MKL, which achieves 7.5X scaling on 28 threads.
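
To make the "dense arrays instead of hash tables" idea concrete, the following is a minimal sequential sketch of row-wise SpGEMM on CSR inputs with a dense accumulator. It is not the authors' implementation: the abstract does not give their data layout or code, so the function name `spgemm_csr`, the tuple-based CSR representation, and the `marker` array are assumptions made for illustration, and the partitioning and dynamic load balancing described above are omitted.

```python
from typing import List, Tuple

# Hypothetical CSR representation for this sketch: (row_ptr, col_idx, values).
CSR = Tuple[List[int], List[int], List[float]]


def spgemm_csr(a: CSR, b: CSR, n_cols_b: int) -> CSR:
    """Compute C = A * B row by row, accumulating each row of C in a dense array."""
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b

    acc = [0.0] * n_cols_b    # dense accumulator, one slot per column of B
    marker = [-1] * n_cols_b  # marker[j] == i means column j was already touched in row i

    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        touched = []  # columns that are nonzero in row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and add it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                if marker[j] != i:
                    # First touch in this row: reset the slot lazily.
                    marker[j] = i
                    acc[j] = 0.0
                    touched.append(j)
                acc[j] += a_ik * b_val[q]
        # Emit the row with sorted column indices.
        touched.sort()
        for j in touched:
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 1], [4, 0], [5, 6]] in CSR form.
A = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2, 4], [1, 0, 0, 1], [1.0, 4.0, 5.0, 6.0])
C = spgemm_csr(A, B, n_cols_b=2)
```

The accumulator is indexed directly by column, so each update is a single array access rather than a hash probe. The `marker` array avoids clearing the whole accumulator between rows, which keeps the per-row cost proportional to the work actually done.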

Notes

  1. Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.

  2. Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel micro-architecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804.

Author information

Corresponding author

Correspondence to Md. Mostofa Ali Patwary.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Patwary, M.M.A. et al. (2015). Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_4

  • DOI: https://doi.org/10.1007/978-3-319-20119-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20118-4

  • Online ISBN: 978-3-319-20119-1

  • eBook Packages: Computer Science (R0)
