A Distributed CPU-GPU Sparse Direct Solver

Sao, Piyush; Vuduc, Richard; Li, Xiaoye Sherry

doi:10.1007/978-3-319-09873-9_41

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8632)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

This paper presents the first hybrid MPI+OpenMP+CUDA implementation of a distributed-memory, right-looking, unsymmetric sparse direct solver (i.e., sparse LU factorization) that uses static pivoting. Although BLAS calls can account for more than 40% of the overall factorization time, small problem sizes dominate the workload, which makes efficient GPU utilization challenging. This fact motivates our approach: to aggregate collections of small BLAS operations into larger ones; to schedule operations so as to achieve load balance and hide long-latency operations, such as PCIe transfers; and to exploit all of a node's available CPU cores and GPUs simultaneously.
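
The aggregation idea in the abstract can be illustrated with a small CPU-only sketch. It is not the authors' implementation: the function names `matmul` and `aggregated_update` are hypothetical, and the naive `matmul` merely stands in for a single large GEMM call that, on a GPU, would replace many small ones. The sketch assumes that every left block has the same number of columns as every right block has rows, so that the blocks can be stacked, multiplied once, and the result sliced back into the individual products.

```python
def matmul(a, b):
    """Naive dense product of two matrices stored as lists of rows.

    Stand-in for one BLAS GEMM call (an assumption of this sketch).
    """
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def aggregated_update(l_blocks, u_blocks):
    """Compute every product L_i * U_j using one large multiply.

    l_blocks: list of (m_i x k) matrices.
    u_blocks: list of (k x n_j) matrices.
    Returns a dict mapping (i, j) to the (m_i x n_j) product.
    """
    # Stack the L blocks vertically into one tall matrix.
    big_l = [row for blk in l_blocks for row in blk]

    # Concatenate the U blocks horizontally into one wide matrix.
    k = len(u_blocks[0])
    big_u = [[x for blk in u_blocks for x in blk[r]] for r in range(k)]

    # One large multiply replaces len(l_blocks) * len(u_blocks) small ones.
    big_v = matmul(big_l, big_u)

    # Slice the large result back into the individual block products.
    result = {}
    row0 = 0
    for i, lb in enumerate(l_blocks):
        col0 = 0
        for j, ub in enumerate(u_blocks):
            n_j = len(ub[0])
            result[(i, j)] = [r[col0:col0 + n_j]
                              for r in big_v[row0:row0 + len(lb)]]
            col0 += n_j
        row0 += len(lb)
    return result
```

The trade-off in this sketch is extra copying (the stacking and slicing steps) in exchange for a single large operation, which is the kind of workload a GPU tends to handle more efficiently than many small ones.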

Author information

Authors and Affiliations

Georgia Institute of Technology, USA
Piyush Sao & Richard Vuduc
Lawrence Berkeley National Laboratory, USA
Xiaoye Sherry Li

Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, Universidade do Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Fernando Silva, Inês Dutra & Vítor Santos Costa

About this paper

Cite this paper

Sao, P., Vuduc, R., Li, X.S. (2014). A Distributed CPU-GPU Sparse Direct Solver. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_41

DOI: https://doi.org/10.1007/978-3-319-09873-9_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)
