Assembly Operations for Multicore Architectures Using Task-Based Runtime Systems

Genet, Damien; Guermouche, Abdou; Bosilca, George

doi:10.1007/978-3-319-14313-2_29


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8806)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

Traditionally, numerical simulations based on finite element methods view the algorithm as divided into three major steps: the generation of a set of blocks and vectors, the assembly of these blocks into a matrix and a large vector, and the inversion of the matrix (that is, the solution of the resulting linear system). In this paper we tackle the second step, block assembly, for which no parallel algorithm is widely available. Several strategies are proposed to decompose the assembly problem while relying on a scheduling middleware to maximize the overlap between stages, increasing parallelism and thus performance. These strategies are evaluated on examples covering two extremes in the field: a large number of small, non-overlapping blocks, as in CFD-like problems, and a smaller number of larger blocks with significant overlap, as encountered in sparse linear algebra solvers.
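To make the assembly step concrete, here is a minimal sketch (not the authors' algorithm) of block assembly. It assumes a hypothetical layout in which each element block carries a list of global indices and is scatter-added into a dense global matrix. Blocks that share an index write the same entries, which is the overlap referred to above. The helper `conflict_free_groups` greedily partitions blocks into sets with pairwise disjoint indices; blocks in one set could be assembled concurrently without locks. The function names, the data layout, and the greedy grouping are illustrative assumptions. The strategies in the paper rely on a scheduling middleware rather than on this grouping.

```python
"""Sketch: scatter-add assembly of element blocks, plus conflict-free grouping.

Hypothetical layout: each block is (indices, dense), where indices[i] is the
global row/column that local row/column i maps to.
"""


def assemble(n, blocks):
    """Accumulate every dense element block into an n x n global matrix."""
    A = [[0.0] * n for _ in range(n)]
    for idx, blk in blocks:
        for i, gi in enumerate(idx):
            for j, gj in enumerate(idx):
                # Overlapping blocks hit the same (gi, gj): contributions add up.
                A[gi][gj] += blk[i][j]
    return A


def conflict_free_groups(blocks):
    """Greedily split blocks into groups whose index sets are pairwise disjoint.

    Blocks in one group touch disjoint entries of A, so they could be
    assembled concurrently without locks; groups would run one after another.
    """
    groups = []  # each entry: (indices used by the group, list of block ids)
    for b, (idx, _) in enumerate(blocks):
        s = set(idx)
        for used, members in groups:
            if used.isdisjoint(s):
                used |= s  # in-place update of the group's index set
                members.append(b)
                break
        else:  # no compatible group found: open a new one
            groups.append((s, [b]))
    return [members for _, members in groups]


# Example: a 1D chain of four 2-node elements (5 global unknowns).
k = [[1.0, -1.0], [-1.0, 1.0]]
blocks = [([0, 1], k), ([1, 2], k), ([2, 3], k), ([3, 4], k)]

A = assemble(5, blocks)
for row in A:
    print(row)
print(conflict_free_groups(blocks))  # [[0, 2], [1, 3]]
```

When blocks never overlap, the greedy pass places them all in a single group. Heavy overlap yields many small groups and, with them, less available parallelism, which is why the two extremes named above behave so differently.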




Author information

Authors and Affiliations

INRIA, Bordeaux, France
Damien Genet
INRIA, LaBRI, Univ. Bordeaux, Bordeaux, France
Abdou Guermouche
University of Tennessee, Knoxville, USA
George Bosilca


Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, University of Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Luís Lopes
Vilnius University, 08663, Vilnius, Lithuania
Julius Žilinskas
Inria Rennes - Bretagne Atlantique, 35042, Rennes, France
Alexandru Costan
Inria, Campus Universitaire de Beaulieu, 35042, Rennes, France
Roberto G. Cascella
MTA SZTAKI, Budapest, Hungary
Gabor Kecskemeti
Inria, LaBRI, France
Emmanuel Jeannot
University Magna Graecia of Catanzaro, 88100, Catanzaro, Italy
Mario Cannataro
University of Pisa, Italy
Laura Ricci
Faculty of Computer Science, University of Vienna, Wien, Austria
Siegfried Benkner
Universitat Politècnica de València, Spain
Salvador Petit
ISISLab - Dipartimento di Informatica, Università di Salerno, Italy
Vittorio Scarano
High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, 70550, Stuttgart, Germany
José Gracia
Vienna University of Technology, 1040, Vienna, Austria
Sascha Hunold
Tennessee Tech University and Oak Ridge National Laboratory, 38505, Cookeville, TN, USA
Stephen L. Scott
RWTH Aachen University, Aachen, Germany
Stefan Lankes
Department of Informatics and Mathematics, University of Passau, Germany
Christian Lengauer
Universidad Carlos III de Madrid, 28911, Leganés, Spain
Jesús Carretero
TU München, 85747, Garching bei München, Germany
Jens Breitbart
TU Vienna, 1040, Vienna, Austria
Michael Alexander


About this paper

Cite this paper

Genet, D., Guermouche, A., Bosilca, G. (2014). Assembly Operations for Multicore Architectures Using Task-Based Runtime Systems. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8806. Springer, Cham. https://doi.org/10.1007/978-3-319-14313-2_29


DOI: https://doi.org/10.1007/978-3-319-14313-2_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14312-5
Online ISBN: 978-3-319-14313-2
eBook Packages: Computer Science, Computer Science (R0)
