Abstract
This study presents an efficient parallel method on the GPU for solving the three-dimensional partial differential equation $u_t = u_{xx} + u_{yy} + u_{zz}$, based on the modified incomplete Cholesky preconditioned conjugate gradient algorithm (MICPCGA). The MIC preconditioner is generally difficult to parallelize on the GPU because of its forward/backward substitutions; for this equation, the proposed method overcomes that drawback with an efficient parallel implementation of the substitutions. In addition, a vector kernel is adopted for the sparse matrix-vector multiplication, and vector operations are optimized by grouping several of them into a single kernel. Numerical results show that both the proposed forward/backward substitutions and MICPCGA achieve a significant speedup on the GPU, and that MICPCGA obtains a larger speedup than an approximate inverse SSOR preconditioned conjugate gradient algorithm (SSORPCGA) and outperforms it in solving the three-dimensional partial differential equation.
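To make the structure of the algorithm concrete, the following is a minimal serial sketch in Python, not the authors' GPU implementation. It rests on several assumptions that the abstract does not state: the equation is discretized with backward Euler and a 7-point stencil, giving the system matrix $I + r(-\Delta_h)$ with a hypothetical parameter `r` standing for $\Delta t / h^2$; the matrix is stored densely for readability; and the preconditioner is plain zero-fill incomplete Cholesky, IC(0), swapped in for the modified variant (MIC). All function names are illustrative.

```python
import math

def build_matrix(n, r):
    """Dense SPD matrix I + r*(-Laplacian) on an n*n*n grid (7-point stencil)."""
    N = n ** 3
    A = [[0.0] * N for _ in range(N)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                p = (i * n + j) * n + k
                A[p][p] = 1.0 + 6.0 * r
                # Couple each node to its +x, +y, +z neighbours (symmetric entries).
                for q, inside in ((p + n * n, i + 1 < n), (p + n, j + 1 < n), (p + 1, k + 1 < n)):
                    if inside:
                        A[p][q] = A[q][p] = -r
    return A

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def ic0(A):
    """Zero-fill incomplete Cholesky: L keeps only the nonzero pattern of tril(A).
    Assumption: plain IC(0), not the modified (MIC) variant named in the abstract."""
    N = len(A)
    L = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1):
            if A[i][j] == 0.0:
                continue  # drop fill-in outside the pattern of A
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_llt(L, r):
    """Apply the preconditioner: solve L y = r (forward), then L^T z = y (backward)."""
    N = len(L)
    y = [0.0] * N
    for i in range(N):
        y[i] = (r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * N
    for i in reversed(range(N)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, N))) / L[i][i]
    return z

def pcg(A, b, apply_prec, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = b[:]
    z = apply_prec(r)
    p = z[:]
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A call such as `pcg(A, b, lambda res: solve_llt(ic0_factor, res))`, with `ic0_factor = ic0(A)` computed once, runs the solver. The two loops in `solve_llt` are the sequential dependency chains that make the preconditioner hard to parallelize, while `matvec` and the vector updates inside `pcg` correspond to the sparse matrix-vector multiplication and the groupable vector operations mentioned in the abstract.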
The research has been supported by the Chinese Natural Science Foundation under grant number 61202049 and the Natural Science Foundation of Zhejiang Province, China under grant number LY12A01027.
© 2013 IFIP International Federation for Information Processing
Cite this paper
Gao, J., Li, B., He, G. (2013). Modified Incomplete Cholesky Preconditioned Conjugate Gradient Algorithm on GPU for the 3D Parabolic Equation. In: Hsu, CH., Li, X., Shi, X., Zheng, R. (eds) Network and Parallel Computing. NPC 2013. Lecture Notes in Computer Science, vol 8147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40820-5_25
DOI: https://doi.org/10.1007/978-3-642-40820-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40819-9
Online ISBN: 978-3-642-40820-5
eBook Packages: Computer Science (R0)