Implementation of a Parallel Sparse Direct Solver on Vector Architecture

Suzuki, Atsushi; Roux, François-Xavier

doi:10.1007/978-3-319-46735-1_8

Conference paper
First Online: 02 December 2016

Abstract

Linear systems with large sparse matrices arise in finite element analysis of elasticity and fluid problems. Thanks to the development of graph partitioning software, it has become feasible to extract dense sub-matrices efficiently while minimizing fill-in during factorization. By analyzing the task dependencies of the block factorization of a dense matrix, the multiple cores of CPUs that share main memory can be used in parallel and asynchronously. The tasks on dense sub-matrices consist of BLAS level 3 kernels, which make efficient use of the arithmetic capabilities of both modern super-scalar CPUs with large cache memory and modern vector CPUs, without any directives for explicit vectorization in the code. Nevertheless, a sparse part remains in the factorization process. Although it is only a small fraction of the whole process and almost negligible on a super-scalar CPU, its optimization is important on vector architecture because its vector loops are short.
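To make the idea of block factorization as a set of dependent tasks concrete, the following is a minimal sketch, not the authors' solver. It assumes a symmetric positive definite matrix and uses a blocked Cholesky factorization as a simpler stand-in for the factorization discussed in the paper. The helper names (`potrf`, `trsm`, `gemm_update`, `blocked_cholesky`) are hypothetical, and the pure-Python loops only mimic what optimized dense kernels (LAPACK POTRF, BLAS level 3 TRSM and GEMM) would do on each block.

```python
import math

def potrf(a):
    """Unblocked Cholesky of one dense block: returns lower-triangular l with a = l l^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l

def trsm(b, l):
    """Solve x l^T = b for x, where l is lower triangular."""
    n = len(l)
    x = [[0.0] * n for _ in range(len(b))]
    for r in range(len(b)):
        for j in range(n):
            s = b[r][j] - sum(x[r][k] * l[j][k] for k in range(j))
            x[r][j] = s / l[j][j]
    return x

def gemm_update(c, a, b):
    """In-place update c -= a b^T (stand-in for a BLAS level 3 GEMM/SYRK call)."""
    for i in range(len(c)):
        for j in range(len(c[0])):
            c[i][j] -= sum(a[i][k] * b[j][k] for k in range(len(a[0])))

def blocked_cholesky(a, nb):
    """Blocked Cholesky with block size nb (an illustrative assumption)."""
    n = len(a)
    starts = list(range(0, n, nb))
    p = len(starts)
    # Copy the lower block triangle of a into a dictionary of dense blocks.
    blk = {
        (i, j): [row[starts[j]:starts[j] + nb] for row in a[starts[i]:starts[i] + nb]]
        for i in range(p) for j in range(i + 1)
    }
    for k in range(p):
        # Task A: factorize the diagonal block; everything in step k waits on it.
        blk[k, k] = potrf(blk[k, k])
        # Tasks B: panel solves; mutually independent, each needs only task A.
        for i in range(k + 1, p):
            blk[i, k] = trsm(blk[i, k], blk[k, k])
        # Tasks C: trailing updates; each needs two panel blocks and is
        # independent of the other updates, so they could run asynchronously.
        for i in range(k + 1, p):
            for j in range(k + 1, i + 1):
                gemm_update(blk[i, j], blk[i, k], blk[j, k])
    # Assemble the blocks into a full lower-triangular factor.
    l = [[0.0] * n for _ in range(n)]
    for (i, j), b in blk.items():
        for r, row in enumerate(b):
            for c, v in enumerate(row):
                l[starts[i] + r][starts[j] + c] = v
    return l

# Usage: factorize a small tridiagonal (2, -1) matrix with 2x2 blocks.
n = 5
a = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
l = blocked_cholesky(a, 2)
```

In this sketch the loops over tasks B and C run sequentially, but the comments mark which tasks have no mutual dependency. A task scheduler on a shared-memory machine could dispatch those to different cores, which is the kind of parallelism the abstract describes.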

Acknowledgements

This work is partially supported by the “Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures” in Japan. Computational time on the Cray XC30 at the Institute for Information Management and Communication, Kyoto University, and on the NEC SX-ACE at the Cybermedia Center, Osaka University, was provided by this grant.

Author information

Authors and Affiliations

Cybermedia Center, Osaka University, Machikaneyama, Toyonaka, Osaka, 560-0043, Japan
Atsushi Suzuki
LJLL, UPMC (Paris 6)/ONERA, 4 place Jussieu, 75005, Paris, France
François-Xavier Roux

Corresponding author

Correspondence to Atsushi Suzuki.

Editor information

Editors and Affiliations

Höchstleistungsrechenzentrum, Universität Stuttgart, Stuttgart, Baden-Württemberg, Germany
Michael M. Resch
Europe GmbH, NEC High Performance Computing Europe GmbH, Düsseldorf, Nordrhein-Westfalen, Germany
Wolfgang Bez
Europe GmbH, NEC High Performance Computing Europe GmbH, Stuttgart, Germany
Erich Focht
High Performance Computing, University of Stuttgart, Stuttgart, Germany
Nisarg Patel
Cyberscience Center, Tohoku University, Sendai, Japan
Hiroaki Kobayashi

About this paper

Cite this paper

Suzuki, A., Roux, FX. (2016). Implementation of a Parallel Sparse Direct Solver on Vector Architecture. In: Resch, M., Bez, W., Focht, E., Patel, N., Kobayashi, H. (eds) Sustained Simulation Performance 2016. Springer, Cham. https://doi.org/10.1007/978-3-319-46735-1_8

DOI: https://doi.org/10.1007/978-3-319-46735-1_8
Published: 02 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46734-4
Online ISBN: 978-3-319-46735-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)