Abstract
In this chapter, we propose two parallel algorithms for sparse matrix transposition and vector multiplication using the compressed sparse row (CSR) format: one that explicitly transposes the matrix and one that does not. Both algorithms are parallelized using OpenMP. Experiments are run on a quad-core Intel Xeon64 CPU E5507. We measure the performance of our algorithms and compare it with that of the compressed sparse blocks (CSB) scheme. Our experimental results show that the algorithm with explicit matrix transposition is comparable to the CSB-based algorithm, whereas direct sparse matrix-transpose-vector multiplication using CSR significantly outperforms the CSB-based algorithm.
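The two approaches named in the abstract can be illustrated with a minimal C sketch. It is not the chapter's implementation, which the abstract does not give. The type `csr_t` and the functions `csr_spmv_t_direct`, `csr_transpose`, and `csr_spmv` are hypothetical names introduced here. The direct variant is assumed to resolve write conflicts with `#pragma omp atomic`, and the explicit transposition is assumed to be a serial counting pass followed by a prefix sum; the chapter may do either step differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CSR container: row_ptr has nrows+1 entries,
   col_idx and val have row_ptr[nrows] entries each. */
typedef struct {
    int nrows, ncols;
    int *row_ptr;
    int *col_idx;
    double *val;
} csr_t;

/* Direct variant: y = A^T x computed from the CSR form of A,
   without building A^T. Rows are split across threads; several
   rows can hit the same y[j], so each update is atomic (an
   assumption of this sketch, not the chapter's stated method). */
void csr_spmv_t_direct(const csr_t *a, const double *x, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (int j = 0; j < a->ncols; j++)
        y[j] = 0.0;

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < a->nrows; i++) {
        for (int k = a->row_ptr[i]; k < a->row_ptr[i + 1]; k++) {
            #pragma omp atomic
            y[a->col_idx[k]] += a->val[k] * x[i];
        }
    }
}

/* Explicit variant, step 1: build the CSR form of A^T.
   The caller provides t->row_ptr (ncols+1 entries) and
   t->col_idx, t->val (nnz entries). Serial in this sketch.
   Returns 0 on success, -1 if scratch memory is unavailable. */
int csr_transpose(const csr_t *a, csr_t *t)
{
    int nnz = a->row_ptr[a->nrows];
    t->nrows = a->ncols;
    t->ncols = a->nrows;

    /* Count the entries in each column of A. */
    for (int j = 0; j <= a->ncols; j++)
        t->row_ptr[j] = 0;
    for (int k = 0; k < nnz; k++)
        t->row_ptr[a->col_idx[k] + 1]++;

    /* Prefix sum turns the counts into row starts of A^T. */
    for (int j = 0; j < a->ncols; j++)
        t->row_ptr[j + 1] += t->row_ptr[j];

    /* next[j] is the next free slot in row j of A^T. */
    size_t bytes = (size_t)(a->ncols + 1) * sizeof(int);
    int *next = malloc(bytes);
    if (next == NULL)
        return -1;
    memcpy(next, t->row_ptr, bytes);

    for (int i = 0; i < a->nrows; i++) {
        for (int k = a->row_ptr[i]; k < a->row_ptr[i + 1]; k++) {
            int dst = next[a->col_idx[k]]++;
            t->col_idx[dst] = i;
            t->val[dst] = a->val[k];
        }
    }
    free(next);
    return 0;
}

/* Explicit variant, step 2: ordinary CSR product y = M x.
   Each thread owns whole rows of y, so no atomics are needed. */
void csr_spmv(const csr_t *m, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < m->nrows; i++) {
        double sum = 0.0;
        for (int k = m->row_ptr[i]; k < m->row_ptr[i + 1]; k++)
            sum += m->val[k] * x[m->col_idx[k]];
        y[i] = sum;
    }
}
```

Calling `csr_transpose` followed by `csr_spmv` on the result should give the same vector as `csr_spmv_t_direct` on the original matrix. Compiled without OpenMP support, the pragmas are ignored and the code runs serially.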
Acknowledgements
This chapter is based upon work supported in part by Taiwan National Science Council (NSC) grants no. NSC101-2221-E-126-002– and Delta Electronics. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSC or Delta Electronics.
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Weng, TH., Batjargal, D., Pham, H., Hsieh, MY., Li, KC. (2013). Parallel Matrix Transposition and Vector Multiplication Using OpenMP. In: Juang, J., Huang, YC. (eds) Intelligent Technologies and Engineering Systems. Lecture Notes in Electrical Engineering, vol 234. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6747-2_30
DOI: https://doi.org/10.1007/978-1-4614-6747-2_30
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6746-5
Online ISBN: 978-1-4614-6747-2
eBook Packages: Engineering, Engineering (R0)