Benchmarking SpMV Methods on Many-Core Platforms

  • Conference paper

Benchmarking, Measuring, and Optimizing (Bench 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11459)

Abstract

SpMV (sparse matrix-vector multiplication) is an essential kernel in many HPC and data-center applications. Meanwhile, emerging many-core hardware offers promising computational power and is widely used for acceleration. Many methods and storage formats have been proposed to improve SpMV performance on many-core platforms. However, there is still no comprehensive comparison of SpMV methods that shows how their performance differs across sparse matrices with various sparsity patterns, nor any systematic work that bridges the gap between SpMV performance and sparsity pattern.
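
To make the kernel concrete, below is a minimal sketch of SpMV over the CSR (Compressed Sparse Row) format, one of the widely used baseline formats; the function name and the OpenMP parallelization are illustrative assumptions for this page, not code from the paper.

    #include <stddef.h>

    /* y = A * x, where A is an n_rows-row sparse matrix in CSR format:
     *   row_ptr[i] .. row_ptr[i+1]-1 index the nonzeros of row i,
     *   col_idx[k] and val[k] give the column and value of nonzero k. */
    void spmv_csr(size_t n_rows,
                  const size_t *row_ptr, const size_t *col_idx,
                  const double *val, const double *x, double *y)
    {
        #pragma omp parallel for schedule(dynamic, 64)
        for (size_t i = 0; i < n_rows; i++) {
            double sum = 0.0;
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col_idx[k]];  /* indirect access to x */
            y[i] = sum;
        }
    }

The irregular inner loop (indirect access to x through col_idx) is what makes SpMV performance so sensitive to the sparsity pattern and to the choice of format.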

In this paper, we investigate the performance of 27 SpMV methods on more than 1500 sparse matrices across two many-core platforms: the Intel Xeon Phi (Knights Landing 7250) and an Nvidia GPGPU (Tesla M40). Our results show that no single SpMV method is optimal for all sparsity patterns, although some methods achieve close to the best performance on most sparse matrices. We further select 13 features that describe the sparsity pattern and analyze their correlations with the performance of each SpMV method. These observations should help researchers and practitioners better understand SpMV performance and guide the selection of a suitable SpMV method.
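
The abstract does not enumerate the 13 pattern features. As an illustration only, the sketch below computes a few features that commonly appear in such studies (total nonzeros and the mean, maximum, and variance of nonzeros per row); the struct and function names are hypothetical.

    #include <stddef.h>

    typedef struct {
        double nnz;          /* total number of nonzeros            */
        double avg_nnz_row;  /* mean nonzeros per row               */
        double max_nnz_row;  /* longest row                         */
        double var_nnz_row;  /* variance of row lengths (imbalance) */
    } pattern_features;

    /* Derive simple sparsity-pattern features from a CSR row pointer
     * (row_ptr has n_rows + 1 entries; row_ptr[n_rows] equals nnz). */
    pattern_features extract_features(size_t n_rows, const size_t *row_ptr)
    {
        pattern_features f = {0.0, 0.0, 0.0, 0.0};
        f.nnz = (double)row_ptr[n_rows];
        f.avg_nnz_row = f.nnz / (double)n_rows;
        for (size_t i = 0; i < n_rows; i++) {
            double len = (double)(row_ptr[i + 1] - row_ptr[i]);
            if (len > f.max_nnz_row) f.max_nnz_row = len;
            double d = len - f.avg_nnz_row;
            f.var_nnz_row += d * d;
        }
        f.var_nnz_row /= (double)n_rows;
        return f;
    }

Features like these capture row-length imbalance, which largely determines whether row-parallel methods (e.g., CSR with dynamic scheduling) or nonzero-balanced methods (e.g., merge-based SpMV) perform better.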


Acknowledgement

This work was supported in part by the National Key R&D Program of China (2016YFB1000201), the National Natural Science Foundation of China (Grant No. 61420106013), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2013073).

Author information

Corresponding author

Correspondence to Biwei Xie.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Xie, B., Jia, Z., Bao, Y. (2019). Benchmarking SpMV Methods on Many-Core Platforms. In: Zheng, C., Zhan, J. (eds) Benchmarking, Measuring, and Optimizing. Bench 2018. Lecture Notes in Computer Science, vol 11459. Springer, Cham. https://doi.org/10.1007/978-3-030-32813-9_19

  • DOI: https://doi.org/10.1007/978-3-030-32813-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32812-2

  • Online ISBN: 978-3-030-32813-9

  • eBook Packages: Computer Science (R0)
