
Parallelization of sparse Cholesky factorization on an SMP cluster

  • Shigehisa Satoh
  • Kazuhiro Kusano
  • Yoshio Tanaka
  • Motohiko Matsuda
  • Mitsuhisa Sato
Track C2: Computational Science
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1593)

Abstract

In this paper, we present parallel implementations of the sparse Cholesky factorization kernel from the SPLASH-2 programs to evaluate the performance of a Pentium Pro based SMP cluster. Solaris threads and remote memory operations are used for intranode parallelism and internode communication, respectively. Sparse Cholesky factorization is a typical irregular application with a high communication-to-computation ratio and no global synchronization between steps. We parallelized it efficiently using asynchronous message handling instead of lock-based mutual exclusion between nodes, because synchronization between nodes reduces performance significantly. We also found that the mapping of processes to processors on an SMP cluster affects performance, especially when the communication latency cannot be hidden.
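
The abstract names Solaris threads for intranode parallelism and task-queue style work distribution. The following C fragment is a minimal sketch, not the authors' code: worker threads claim factorization tasks from a lock-protected shared queue through the Solaris threads API (thr_create, thr_join, mutex_lock); factor_task(), NTHREADS, and NTASKS are hypothetical stand-ins for the real numeric kernel and problem sizes.

/* Minimal sketch of intranode task-queue parallelism with Solaris
 * threads.  factor_task() is a hypothetical stand-in for the kernel
 * that factors one column block of the sparse matrix. */
#include <thread.h>   /* Solaris threads: thr_create, thr_join */
#include <synch.h>    /* mutex_t, mutex_init, mutex_lock, mutex_unlock */

#define NTHREADS 4    /* worker threads per SMP node (hypothetical) */
#define NTASKS   64   /* pending column-block tasks (hypothetical)  */

static mutex_t qlock;       /* protects the shared task counter */
static int next_task = 0;   /* next unclaimed task index        */

static void factor_task(int t) { (void)t; /* ... numeric kernel ... */ }

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        int t;
        mutex_lock(&qlock);                        /* intranode lock */
        t = (next_task < NTASKS) ? next_task++ : -1;
        mutex_unlock(&qlock);
        if (t < 0)
            break;                                 /* queue drained  */
        factor_task(t);
    }
    return NULL;
}

int main(void)
{
    thread_t tid[NTHREADS];
    int i;

    mutex_init(&qlock, USYNC_THREAD, NULL);
    for (i = 0; i < NTHREADS; i++)
        thr_create(NULL, 0, worker, NULL, 0, &tid[i]);
    for (i = 0; i < NTHREADS; i++)
        thr_join(tid[i], NULL, NULL);
    return 0;
}

Between nodes, the paper's point is precisely that this lock-based pattern is too expensive: internode work is instead driven by asynchronous message handlers over remote memory operations, which this sketch does not attempt to show.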

Keywords

Mutual Exclusion, Sharing Pattern, Task Queue, Synchronization Overhead, Asynchronous Message


References

  1. Y. Tanaka, et al., COMPaS: A Pentium Pro PC-based SMP Cluster and its Experience. In Proceedings of the IPPS/SPDP Workshop on Personal Computers Based Networks of Workstations, pages 486–497, 1998.
  2. Y. Tanaka, et al., Performance Improvement by Overlapping Computation and Communication on SMP Clusters. In Proceedings of the 1998 International Conference on Parallel and Distributed Processing Techniques and Applications, Vol. 1, pages 275–282, July 1998.
  3. E. Rothberg and A. Gupta, An Efficient Block-Oriented Approach to Parallel Sparse Cholesky Factorization. In Proceedings of Supercomputing '93, pages 503–512, November 1993.
  4. A. Gupta, G. Karypis, and V. Kumar, Highly Scalable Parallel Algorithms for Sparse Matrix Factorization. IEEE Transactions on Parallel and Distributed Systems, Vol. 8, No. 5, pages 502–520, May 1997.
  5. S. C. Woo, et al., The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 24–36, June 1995.
  6. L. Iftode, J. P. Singh, and K. Li, Understanding Application Performance on Shared Virtual Memory Systems. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.
  7. C. Liao, et al., Monitoring Shared Virtual Memory Performance on a Myrinet-based PC Cluster. In Proceedings of the International Conference on Supercomputing, pages 251–258, July 1998.
  8. D. J. Scales, K. Gharachorloo, and C. A. Thekkath, Shasta: A Low Overhead, Software-Only Approach for Supporting Fine-Grain Shared Memory. In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 174–185, October 1996.
  9. I. S. Duff, R. G. Grimes, and J. G. Lewis, Sparse Matrix Test Problems. ACM Transactions on Mathematical Software, Vol. 15, No. 1, pages 1–14, March 1989.

Copyright information

© Springer-Verlag 1999

Authors and Affiliations

  • Shigehisa Satoh (1)
  • Kazuhiro Kusano (1)
  • Yoshio Tanaka (1)
  • Motohiko Matsuda (1)
  • Mitsuhisa Sato (1)

  1. Real World Computing Partnership, Tsukuba Research Center, Tsukuba, Ibaraki, Japan
