Designing Parallel Sparse Matrix Transposition Algorithm Using ELLPACK-R for GPUs

  • Song Guo
  • Yong Dou
  • Yuanwu Lei
  • Qiang Wang
  • Fei Xia
  • Jianning Chen
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 592)

Abstract

In this paper, we propose a parallel algorithm for sparse matrix transposition in the ELLPACK-R format on graphics processing units. By exploiting the GPU's high memory bandwidth and its texture memory, the performance of the algorithm is substantially improved. Experimental results show that the proposed algorithm achieves a speedup of up to 8x on an Nvidia Tesla C2070 compared with an implementation on an Intel Xeon E5-2650 CPU. We also conclude that it is not worthwhile to accelerate the transposition of ELLPACK-R matrices whose rows differ severely in their number of nonzero elements.
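To make the abstract concrete, the following is a minimal sketch of the ELLPACK-R format (padded value and column-index arrays plus a per-row length vector, the "-R" part) together with a sequential reference transpose. This is an illustration under my own assumptions, not the authors' GPU kernel; all function names are mine.

```python
import numpy as np

def dense_to_ellpack_r(A):
    """Convert a dense matrix to ELLPACK-R: padded value/column arrays
    plus a per-row nonzero-count vector."""
    nrows, _ = A.shape
    rows = [np.nonzero(A[i])[0] for i in range(nrows)]
    rl = np.array([len(r) for r in rows])        # nonzeros per row
    width = max(int(rl.max()), 1)                # padded row width
    val = np.zeros((nrows, width))
    col = np.zeros((nrows, width), dtype=int)
    for i, cols in enumerate(rows):
        val[i, :len(cols)] = A[i, cols]
        col[i, :len(cols)] = cols
    return val, col, rl

def ellpack_r_transpose(val, col, rl, ncols):
    """Sequential reference transpose: scatter each stored element
    (i, col[i, k]) into row col[i, k] of the output. A GPU version
    would parallelize this scatter, e.g. one thread per input row,
    which is where divergence in row lengths hurts performance."""
    nrows, _ = val.shape
    out_rows = [[] for _ in range(ncols)]
    for i in range(nrows):
        for k in range(int(rl[i])):
            out_rows[col[i, k]].append((i, val[i, k]))
    t_rl = np.array([len(r) for r in out_rows])
    width = max(int(t_rl.max()), 1)
    t_val = np.zeros((ncols, width))
    t_col = np.zeros((ncols, width), dtype=int)
    for j, entries in enumerate(out_rows):
        for k, (i, v) in enumerate(entries):
            t_col[j, k] = i
            t_val[j, k] = v
    return t_val, t_col, t_rl
```

Because every row is padded to the width of the longest row, a few very long rows both waste storage and unbalance the per-row work, which is consistent with the paper's observation about matrices with severe row-length divergence.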

Keywords

Sparse matrix transposition · ELLPACK-R · Graphics processing units

Notes

Acknowledgments

This work was supported by the National Science Foundation of China under Grants 61402499 and 61202127, and the National High Technology Research and Development Program of China under Grants 2012AA012706.

References

  1. Vazquez, F., Fernandez, J.J., Garzon, E.M.: A new approach for sparse matrix vector product on NVIDIA GPUs. Concurrency Comput.: Pract. Exper. 23, 815–826 (2011)
  2. Krishnamoorthy, S., Baumgartner, G., Cociorva, D., Lam, C.C., Sadayappan, P.: Efficient parallel out-of-core matrix transposition. Int. J. High Perform. Comput. Netw. 2, 110–119 (2004)
  3. Mateescu, G., Bauer, G.H., Fiedler, R.A.: Optimizing matrix transposes using a POWER7 cache model and explicit prefetching. In: Proceedings of the Second International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, Seattle, 12-18, pp. 5–6 (2011)
  4. Gustavson, F.G.: Two fast algorithms for sparse matrices: multiplication and permuted transposition. ACM Trans. Math. Softw. 4(3), 250–269 (1978)
  5. Stathis, P., Cheresiz, D., Vassiliadis, S., Juurlink, B.: Sparse matrix transpose unit. In: Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004) (2004)
  6. Weng, T.H., Batjargal, D., Pham, H., Hsieh, M.Y., Li, K.C.: Parallel matrix transposition and vector multiplication using OpenMP. In: Juang, J., Huang, Y.C. (eds.) Intelligent Technologies and Engineering Systems. Lecture Notes in Electrical Engineering, vol. 234, pp. 243–249 (2013)
  7. Weng, T.H., Pham, H., Jiang, H., Li, K.C.: Designing parallel sparse matrix transposition algorithm using CSR for GPUs. In: Juang, J., Huang, Y.C. (eds.) Intelligent Technologies and Engineering Systems. Lecture Notes in Electrical Engineering, vol. 234, pp. 251–257 (2013)
  8. Davis, T.: The University of Florida Sparse Matrix Collection. Technical report, University of Florida (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Song Guo (1)
  • Yong Dou (1)
  • Yuanwu Lei (1)
  • Qiang Wang (1)
  • Fei Xia (2)
  • Jianning Chen (3)
  1. National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, People's Republic of China
  2. Electronic Engineering College, Naval University of Engineering, Wuhan, China
  3. Guangzhou Military Tactical Luzhai Base, Guangzhou, China