Designing Parallel Sparse Matrix Transposition Algorithm Using ELLPACK-R for GPUs

  • Song Guo
  • Yong Dou
  • Yuanwu Lei
  • Qiang Wang
  • Fei Xia
  • Jianning Chen
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 592)


In this paper, we propose a parallel algorithm for sparse matrix transposition using the ELLPACK-R format on graphics processing units (GPUs). The algorithm exploits the GPU's high memory bandwidth and its texture memory to improve performance. Experimental results show a speedup of up to 8x on an Nvidia Tesla C2070 compared with an implementation on an Intel Xeon E5-2650 CPU. We also conclude that GPU acceleration of transposition is not worthwhile for ELLPACK-R matrices whose number of nonzero elements varies widely among the rows.
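For readers unfamiliar with the format, the following is a minimal sequential sketch of ELLPACK-R storage and a reference transposition. It is not the parallel GPU algorithm of the paper, whose details are not given here. It assumes the usual ELLPACK-R layout: a padded value array, a padded column-index array, and a per-row length array. All function names are hypothetical, and plain row lists are used for readability, whereas GPU implementations typically store the padded arrays column-major.

```python
def dense_to_ellpack_r(dense):
    """Pack a dense matrix (list of rows) into ELLPACK-R arrays.

    Returns (values, cols, row_len); values and cols are padded with zeros
    up to the length of the longest row.
    """
    row_len = [sum(1 for x in row if x != 0) for row in dense]
    width = max(row_len, default=0)
    values = [[0] * width for _ in dense]
    cols = [[0] * width for _ in dense]
    for i, row in enumerate(dense):
        k = 0
        for j, x in enumerate(row):
            if x != 0:
                values[i][k] = x
                cols[i][k] = j
                k += 1
    return values, cols, row_len


def transpose_ellpack_r(values, cols, row_len, n_cols):
    """Sequential reference transpose of an ELLPACK-R matrix (illustrative only)."""
    # Pass 1: count the nonzeros in each column; these become the row
    # lengths of the transposed matrix.
    t_row_len = [0] * n_cols
    for i, n in enumerate(row_len):
        for k in range(n):
            t_row_len[cols[i][k]] += 1

    # The padded width of the result is set by its longest row.
    width = max(t_row_len, default=0)
    t_values = [[0] * width for _ in range(n_cols)]
    t_cols = [[0] * width for _ in range(n_cols)]

    # Pass 2: scatter each nonzero (i, j) to row j of the result.
    fill = [0] * n_cols  # next free slot in each output row
    for i, n in enumerate(row_len):
        for k in range(n):
            j = cols[i][k]
            t_values[j][fill[j]] = values[i][k]
            t_cols[j][fill[j]] = i
            fill[j] += 1
    return t_values, t_cols, t_row_len


def ellpack_r_to_dense(values, cols, row_len, n_cols):
    """Expand ELLPACK-R arrays back to a dense matrix (for checking results)."""
    dense = [[0] * n_cols for _ in row_len]
    for i, n in enumerate(row_len):
        for k in range(n):
            dense[i][cols[i][k]] = values[i][k]
    return dense
```

Because every row is padded to the length of the longest one, a matrix with a few very long rows carries a large amount of padding in this format. That may be one reason irregular row lengths are unfavorable here, though the sketch does not measure it.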


Sparse matrix transposition, ELLPACK-R, Graphics processing units



This work was supported by the National Science Foundation of China under Grants 61402499 and 61202127, and by the National High Technology Research and Development Program of China under Grant 2012AA012706.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Song Guo (1, corresponding author)
  • Yong Dou (1)
  • Yuanwu Lei (1)
  • Qiang Wang (1)
  • Fei Xia (2)
  • Jianning Chen (3)

  1. National Laboratory for Parallel and Distribution Processing, National University of Defense Technology, Changsha, People’s Republic of China
  2. Electronic Engineering College, Naval University of Engineering, Wuhan, China
  3. Guangzhou Military Tactical Luzhai Base, Guangzhou, China
