Fast Parallel Algorithms for Blocked Dense Matrix Multiplication on Shared Memory Architectures

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7439)

Abstract

The current trend toward multicore and Symmetric Multi-Processor (SMP) architectures underscores the need for parallelism in most scientific computations. Matrix-matrix multiplication is one of the fundamental computations in many algorithms for scientific and numerical analysis. Although a number of algorithms (such as Cannon, PUMMA, and SUMMA) have been proposed for matrix-matrix multiplication on distributed memory architectures, matrix-matrix multiplication algorithms for multicore and SMP architectures have not been studied as extensively. We present two algorithms, both based largely on blocked dense matrices, for parallel matrix-matrix multiplication on shared memory systems. The first algorithm operates directly on blocked matrices, while the second uses blocked matrices with the MapReduce framework in shared memory. Our experimental results show that our blocked dense matrix approach outperforms known existing implementations by up to 50%, while our MapReduce blocked matrix-matrix algorithm outperforms the existing matrix-matrix multiplication algorithm of the Phoenix shared memory MapReduce framework by about 40%.
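To make the blocked approach concrete, the following is a minimal sketch of a tiled matrix-matrix multiplication parallelized for shared memory with OpenMP. It illustrates the general technique only and is not the authors' implementation: the matrix order N, the tile size BS, the row-major layout, and the choice to parallelize over tiles of C are all assumptions made for this example.

```c
/*
 * Minimal sketch: blocked (tiled) dense matrix multiplication on a
 * shared-memory machine, parallelized with OpenMP.  Illustrative only;
 * NOT the paper's implementation.  N, BS, and the parallelization over
 * tiles of C are assumptions for the example.
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N  1024   /* matrix order (assumed square, divisible by BS) */
#define BS 64     /* block (tile) size - a tuning parameter          */

/* C = A * B, all N x N, row-major; C is assumed zero-initialized. */
static void blocked_matmul(const double *A, const double *B, double *C)
{
    /* Each (ib, jb) tile of C can be computed independently, so the two
       outer block loops are collapsed and shared among the threads. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ib = 0; ib < N; ib += BS)
        for (int jb = 0; jb < N; jb += BS)
            for (int kb = 0; kb < N; kb += BS)
                /* multiply one pair of BS x BS sub-blocks */
                for (int i = ib; i < ib + BS; i++)
                    for (int k = kb; k < kb + BS; k++) {
                        double a = A[i * N + k];
                        for (int j = jb; j < jb + BS; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

int main(void)
{
    double *A = malloc(sizeof(double) * N * N);
    double *B = malloc(sizeof(double) * N * N);
    double *C = calloc((size_t)N * N, sizeof(double));

    for (long i = 0; i < (long)N * N; i++) { A[i] = 1.0; B[i] = 2.0; }

    double t0 = omp_get_wtime();
    blocked_matmul(A, B, C);
    double t1 = omp_get_wtime();

    /* with these inputs every entry of C should equal 2.0 * N */
    printf("C[0][0] = %.1f (expected %.1f), time = %.3f s\n",
           C[0], 2.0 * N, t1 - t0);

    free(A); free(B); free(C);
    return 0;
}
```

A typical build would be `gcc -O3 -fopenmp blocked_mm.c -o blocked_mm`. The usual tuning choice for such kernels is to pick BS so that a few BS-by-BS tiles fit comfortably in cache; a MapReduce-style variant along the lines sketched in the abstract could treat the (ib, jb) tiles as map tasks and merge partial tiles in the reduce phase, though that reading goes beyond what the abstract states.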


References

  1. Cannon, L.E.: A cellular computer to implement the Kalman filter algorithm. PhD thesis, Montana State University (1969)

  2. Choi, J., Dongarra, J., Walker, D.: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers. Concurrency: Practice and Experience 6(7), 543–570 (1994)

  3. van de Geijn, R.A., Watts, J.: Scalable universal matrix multiplication algorithm. Concurrency: Practice and Experience 9(4), 255–274 (1997)

  4. Krishnan, M., Nieplocha, J.: SRUMMA: A matrix multiplication algorithm suitable for clusters and scalable shared memory systems. In: Proceedings of the Parallel and Distributed Processing Symposium (2004)

  5. Ranger, C., Raghuraman, R., Penmetsa, A., Bradski, G., Kozyrakis, C.: Evaluating MapReduce for multi-core and multiprocessor systems. In: Proceedings of the 13th International Symposium on High Performance Computer Architecture, pp. 13–24 (2007)

  6. Yoo, R.M., Romano, A., Kozyrakis, C.: Phoenix Rebirth: Scalable MapReduce on a large-scale shared-memory system. In: Proceedings of the 2009 IEEE International Symposium on Workload Characterization, pp. 198–207 (2009)

  7. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation (2004)

  8. Blackford, L., Choi, J., Cleary, A., D'Azevedo, E., Demmel, J., Dhillon, I., Dongarra, J., Hammarling, S., Henry, G., Petitet, A., Stanley, K., Walker, D., Whaley, R.: ScaLAPACK Users' Guide. SIAM, Philadelphia (1997)

  9. Anderson, E., Bai, Z., Bischof, C., Blackford, L., Demmel, J., Dongarra, J., Hammarling, S., Du Croz, J., Greenbaum, A., McKenney, A., Sorensen, D.: LAPACK Users' Guide. SIAM, Philadelphia (1992)

  10. Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.: Scheduling dense linear algebra operations on multicore processors. Concurrency and Computation: Practice and Experience 22(1) (2010)

  11. Bentz, J.L., Kendall, R.A.: Parallelization of General Matrix Multiply Routines Using OpenMP. In: Chapman, B.M. (ed.) WOMPAT 2004. LNCS, vol. 3349, pp. 1–11. Springer, Heidelberg (2005)

  12. Hackenberg, D., Schöne, R., Nagel, W.E., Pflüger, S.: Optimizing OpenMP Parallelized DGEMM Calls on SGI Altix 3700. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds.) Euro-Par 2006. LNCS, vol. 4128, pp. 145–154. Springer, Heidelberg (2006)

  13. Alpatov, P., Baker, G., Edwards, C., Gunnels, J., Morrow, G., Overfelt, J., van de Geijn, R., Wu, J.: PLAPACK: Parallel linear algebra package. In: Proceedings of the SIAM Parallel Processing Conference (1997)

  14. Strassen, V.: Gaussian elimination is not optimal. Numerische Mathematik 13, 354–356 (1969)

  15. Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation 9, 251–280 (1990)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nimako, G., Otoo, E.J., Ohene-Kwofie, D. (2012). Fast Parallel Algorithms for Blocked Dense Matrix Multiplication on Shared Memory Architectures. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33078-0_32

  • DOI: https://doi.org/10.1007/978-3-642-33078-0_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33077-3

  • Online ISBN: 978-3-642-33078-0

  • eBook Packages: Computer Science (R0)
