Benchmarking SpMV Methods on Many-Core Platforms

  • Conference paper

Benchmarking, Measuring, and Optimizing (Bench 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11459)

Abstract

SpMV (sparse matrix-vector multiplication) is an essential kernel in many HPC and data-center applications. Meanwhile, emerging many-core hardware offers promising computational power and is widely used for acceleration. Many methods and storage formats have been proposed to improve SpMV performance on many-core platforms. However, there is still no comprehensive comparison of SpMV methods that shows how their performance differs across sparse matrices with various sparsity patterns, nor any systematic work that bridges the gap between SpMV performance and sparsity pattern.
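
To make the kernel concrete, below is a minimal sketch of SpMV over the CSR (Compressed Sparse Row) format, one of the widely used baseline formats; the function name and the OpenMP parallelization are illustrative assumptions for this page, not code from the paper.

    #include <stddef.h>

    /* y = A * x, where A is an n_rows-row sparse matrix in CSR format:
     *   row_ptr[i] .. row_ptr[i+1]-1 index the nonzeros of row i,
     *   col_idx[k] and val[k] give the column and value of nonzero k. */
    void spmv_csr(size_t n_rows,
                  const size_t *row_ptr, const size_t *col_idx,
                  const double *val, const double *x, double *y)
    {
        #pragma omp parallel for schedule(dynamic, 64)
        for (size_t i = 0; i < n_rows; i++) {
            double sum = 0.0;
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col_idx[k]];  /* indirect access to x */
            y[i] = sum;
        }
    }

The irregular inner loop (indirect access to x through col_idx) is what makes SpMV performance so sensitive to the sparsity pattern and to the choice of format.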

In this paper, we investigate the performance of 27 SpMV methods on more than 1500 sparse matrices across two many-core platforms: the Intel Xeon Phi (Knights Landing 7250) and an Nvidia GPGPU (Tesla M40). Our results show that no single SpMV method is optimal for all sparsity patterns, although some methods achieve close to the best performance on most sparse matrices. We further select 13 features that describe the sparsity pattern and analyze their correlations with the performance of each SpMV method. These observations should help researchers and practitioners better understand SpMV performance and guide the selection of a suitable SpMV method.
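
The abstract does not enumerate the 13 pattern features. As an illustration only, the sketch below computes a few features that commonly appear in such studies (total nonzeros and the mean, maximum, and variance of nonzeros per row); the struct and function names are hypothetical.

    #include <stddef.h>

    typedef struct {
        double nnz;          /* total number of nonzeros            */
        double avg_nnz_row;  /* mean nonzeros per row               */
        double max_nnz_row;  /* longest row                         */
        double var_nnz_row;  /* variance of row lengths (imbalance) */
    } pattern_features;

    /* Derive simple sparsity-pattern features from a CSR row pointer
     * (row_ptr has n_rows + 1 entries; row_ptr[n_rows] equals nnz). */
    pattern_features extract_features(size_t n_rows, const size_t *row_ptr)
    {
        pattern_features f = {0.0, 0.0, 0.0, 0.0};
        f.nnz = (double)row_ptr[n_rows];
        f.avg_nnz_row = f.nnz / (double)n_rows;
        for (size_t i = 0; i < n_rows; i++) {
            double len = (double)(row_ptr[i + 1] - row_ptr[i]);
            if (len > f.max_nnz_row) f.max_nnz_row = len;
            double d = len - f.avg_nnz_row;
            f.var_nnz_row += d * d;
        }
        f.var_nnz_row /= (double)n_rows;
        return f;
    }

Features like these capture row-length imbalance, which largely determines whether row-parallel methods (e.g., CSR with dynamic scheduling) or nonzero-balanced methods (e.g., merge-based SpMV) perform better.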


Acknowledgement

This work was supported in part by the National Key R&D Program of China (2016YFB1000201), the National Natural Science Foundation of China (Grant No. 61420106013), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2013073).

Author information

Corresponding author

Correspondence to Biwei Xie.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Xie, B., Jia, Z., Bao, Y. (2019). Benchmarking SpMV Methods on Many-Core Platforms. In: Zheng, C., Zhan, J. (eds) Benchmarking, Measuring, and Optimizing. Bench 2018. Lecture Notes in Computer Science, vol 11459. Springer, Cham. https://doi.org/10.1007/978-3-030-32813-9_19

  • DOI: https://doi.org/10.1007/978-3-030-32813-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32812-2

  • Online ISBN: 978-3-030-32813-9

  • eBook Packages: Computer Science (R0)
