Parallel ILU preconditioners in GPU computation
Accelerating large-scale linear solvers is crucial for scientific research and industrial applications, and preconditioners play a key role in improving the performance of iterative linear solvers. This paper summarizes and reviews our work on the development of parallel ILU preconditioners on GPUs. The mechanisms of ILU(0), ILU(k), ILUT, enhanced ILUT, and block-wise ILU(k) are reviewed and analyzed, providing clear guidance for the development of iterative linear solvers. ILU(0) is the most commonly used preconditioner, and the nonzero pattern of its preconditioner matrix is exactly the same as that of the original matrix. ILU(k) uses a level k to control the nonzero pattern of its preconditioner matrix. ILUT selects the entries of its preconditioner matrix by thresholds, without regard to the pattern of the original matrix. In addition to point-wise ILU preconditioners, a block-wise ILU(k) preconditioner is designed specifically to support block-wise matrices. In the implementation, the RAS (Restricted Additive Schwarz) method is adopted to optimize the parallel structure of a preconditioner matrix. Combined with the configuration parameters of the ILU preconditioners, this makes the parallel solution process complex, so decoupled algorithms are adopted. These algorithms are implemented and tested on NVIDIA GPUs. The experimental results show that a single-GPU implementation can speed up an ILU preconditioner by a factor of 10 compared with a traditional CPU implementation. The results also show that ILU(0) achieves a higher speedup than ILU(k) but converges more slowly. The level k of ILU(k) and the thresholds (p, t) of ILUT are effective parameters for balancing acceleration against convergence in ILU(k) and ILUT, respectively. All of these ILU preconditioners are characterized and compared in this work, giving practitioners a clear picture of, and numerical insight into, the ILU family.
Keywords: ILU · Block-wise matrix · Parallel computing · GPU · Preconditioner
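To illustrate how the level k controls the nonzero pattern of ILU(k), the following minimal sketch implements the standard level-of-fill rule (a fill-in created through row k receives level lev_ik + lev_kj + 1, and entries with level above k are dropped). It is a hypothetical, sequential helper: the names `ilu_k`, `lu_product`, and `pattern` are introduced here, the matrix is stored densely for readability, no pivoting is performed, and it is not the GPU implementation described in this work. ILUT's (p, t) dropping, the block-wise variant, and RAS are not shown. With k = 0 every fill-in has level at least 1 and is dropped, so the factors keep exactly the nonzero pattern of the original matrix.

```python
import math


def ilu_k(a, max_level):
    """Level-of-fill ILU(k) (IKJ variant) on a small dense matrix.

    Hypothetical illustration only. Returns one n-by-n array whose strictly
    lower part holds L (unit diagonal implied) and whose upper part, including
    the diagonal, holds U. No pivoting: diagonal pivots must stay nonzero.
    """
    n = len(a)
    w = [[float(x) for x in row] for row in a]  # working copy, overwritten by L and U
    # Level 0 for original nonzeros and the diagonal; infinity marks structural zeros.
    lev = [[0 if (w[i][j] != 0 or i == j) else math.inf for j in range(n)]
           for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if lev[i][k] > max_level:
                continue  # entry outside the allowed pattern: no elimination with row k
            w[i][k] /= w[k][k]  # multiplier l_ik
            for j in range(k + 1, n):
                w[i][j] -= w[i][k] * w[k][j]
                # Fill-in created through row k gets level lev_ik + lev_kj + 1.
                lev[i][j] = min(lev[i][j], lev[i][k] + lev[k][j] + 1)
        # Drop every entry of row i whose level exceeds the allowed level.
        for j in range(n):
            if lev[i][j] > max_level:
                w[i][j] = 0.0
                lev[i][j] = math.inf
    return w


def lu_product(f):
    """Rebuild L*U from the combined factor returned by ilu_k."""
    n = len(f)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # L[i][k] is nonzero only for k <= i, U[k][j] only for k <= j.
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else f[i][k]
                out[i][j] += l_ik * f[k][j]
    return out


def pattern(m):
    """Set of (row, column) positions holding nonzero values."""
    return {(i, j) for i, row in enumerate(m) for j, x in enumerate(row) if x != 0}


if __name__ == "__main__":
    # Periodic 1D Laplacian-like matrix: eliminating row 0 creates
    # level-1 fill at positions (1, 3) and (3, 1).
    A = [[4, -1, 0, -1],
         [-1, 4, -1, 0],
         [0, -1, 4, -1],
         [-1, 0, -1, 4]]
    for level in (0, 1):
        f = ilu_k(A, level)
        r = lu_product(f)
        resid = max(abs(r[i][j] - A[i][j]) for i in range(4) for j in range(4))
        print(f"ILU({level}): nnz={len(pattern(f))}, max|LU - A|={resid:.3g}")
```

On this 4×4 matrix, ILU(0) keeps the 12 nonzeros of A and leaves a residual of 0.25 at the two dropped fill positions (1, 3) and (3, 1), whereas ILU(1) retains that fill and reproduces the exact LU factorization up to rounding. This is the trade-off described above in miniature: a larger k gives a more accurate preconditioner at the cost of more nonzeros and more work per application.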
The support of the Department of Chemical and Petroleum Engineering, University of Calgary, and the Reservoir Simulation Group is gratefully acknowledged. The research is partly supported by NSERC/AIEE/Foundation CMG and AITF Chairs.
Compliance with ethical standards
Conflict of interest
All authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.