Exploiting the Locality Properties of Peano Curves for Parallel Matrix Multiplication

Bader, Michael

doi:10.1007/978-3-540-85451-7_85

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5168)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

This work studies how to exploit the locality properties of an inherently cache-efficient matrix multiplication algorithm in a parallel implementation. The algorithm is based on a blockwise element layout and an execution order that are both derived from a Peano space-filling curve. The strong locality properties of the resulting algorithm motivate a parallel algorithm that replicates matrix blocks in local caches, which prefetch remote blocks before they are used. As a consequence, the block size for matrix multiplication and the cache sizes, and hence the granularity of communication, can be chosen independently. The influence of these parameters on parallel efficiency is studied on a compute cluster with 128 processors. Performance studies show that the size of the local caches has the largest influence on performance, which makes the algorithm an interesting option wherever memory is scarce or where existing cache hierarchies can be exploited (for example, in future manycore environments).
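To make the locality property concrete, the following minimal Python sketch enumerates the cells of a 3^k x 3^k grid in Peano-curve order, using the classical base-3 digit construction of the curve. It is an illustration only and is not taken from the chapter: the function names `peano_index_to_cell` and `peano_order` are hypothetical, and the sketch covers only the ordering of blocks, not the multiplication scheme or the parallel caching strategy.

```python
def peano_index_to_cell(t, k):
    """Map position t along a Peano curve to (row, col) on a 3**k x 3**k grid.

    Illustrative sketch (hypothetical helper, not the chapter's code).
    """
    # Base-3 digits of t, most significant first; 2*k digits in total.
    digits = [(t // 3 ** p) % 3 for p in range(2 * k - 1, -1, -1)]
    row = col = 0
    row_parity = col_parity = 0  # parity of the raw digits consumed so far
    for j in range(k):
        r, c = digits[2 * j], digits[2 * j + 1]
        # A digit is reflected (d -> 2 - d) whenever the raw digits already
        # consumed for the other coordinate have an odd sum.
        row_digit = 2 - r if col_parity else r
        row_parity ^= r & 1
        col_digit = 2 - c if row_parity else c
        col_parity ^= c & 1
        row = 3 * row + row_digit
        col = 3 * col + col_digit
    return row, col


def peano_order(k):
    """Return all cells of the 3**k x 3**k grid in Peano-curve order."""
    return [peano_index_to_cell(t, k) for t in range(9 ** k)]


if __name__ == "__main__":
    cells = peano_order(2)
    # Every step of the curve moves to a directly neighbouring cell.
    steps = {abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(cells, cells[1:])}
    print(len(cells), steps)
```

Because consecutive positions on the curve are always edge neighbours, storing matrix blocks in this order keeps blocks that are used one after another close together in memory, which is the kind of locality the abstract refers to.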


Author information

Authors and Affiliations

Institut für Informatik, Technische Universität München, Germany
Michael Bader


Editor information

Emilio Luque, Tomàs Margalef, Domingo Benítez


About this paper

Cite this paper

Bader, M. (2008). Exploiting the Locality Properties of Peano Curves for Parallel Matrix Multiplication. In: Luque, E., Margalef, T., Benítez, D. (eds) Euro-Par 2008 – Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85451-7_85


DOI: https://doi.org/10.1007/978-3-540-85451-7_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85450-0
Online ISBN: 978-3-540-85451-7
eBook Packages: Computer Science, Computer Science (R0)
