Parallelization of Sparse Cholesky Factorization on an SMP Cluster
In this paper, we present parallel implementations of the sparse Cholesky factorization kernel from the SPLASH-2 programs and use them to evaluate the performance of a Pentium Pro-based SMP cluster. Solaris threads provide intranode parallelism, and remote memory operations provide internode communication. Sparse Cholesky factorization is a typical irregular application with a high communication-to-computation ratio and no global synchronization between steps. We parallelized the kernel efficiently by using asynchronous message handling instead of lock-based mutual exclusion between nodes, because synchronization between nodes reduces performance significantly. We also found that the mapping of processes to processors on an SMP cluster affects performance, especially when the communication latency cannot be hidden.
Keywords: Mutual Exclusion, Sharing Pattern, Task Queue, Synchronization Overhead, Asynchronous Message
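
The contrast the abstract draws between asynchronous message handling and lock-based mutual exclusion can be illustrated with a minimal sketch. It is not the paper's implementation: Python threads and `queue.Queue` stand in for cluster nodes and remote memory operations, and the names `Node`, `serve`, `run`, and the block-ownership layout are all hypothetical. Each simulated node owns some blocks; a sender never locks a remote block, it only posts an update message to the owner, and the owner alone applies it.

```python
import queue
import threading


class Node:
    """A simulated cluster node that owns a fixed set of blocks (hypothetical model)."""

    def __init__(self, node_id, owned_blocks):
        self.node_id = node_id
        self.blocks = {b: 0.0 for b in owned_blocks}  # block id -> accumulated value
        self.inbox = queue.Queue()                    # asynchronous update messages

    def serve(self):
        """Apply incoming updates. Only this thread writes self.blocks,
        so no lock on the block data is required."""
        while True:
            msg = self.inbox.get()
            if msg is None:  # sentinel: no further updates will arrive
                break
            block_id, delta = msg
            self.blocks[block_id] += delta


def run(num_nodes=4, blocks_per_node=3, updates_per_sender=100):
    """Send updates to block owners as messages and return the final block values."""
    nodes, owner = [], {}
    for n in range(num_nodes):
        owned = [n * blocks_per_node + i for i in range(blocks_per_node)]
        nodes.append(Node(n, owned))
        for b in owned:
            owner[b] = n

    total_blocks = num_nodes * blocks_per_node

    def produce(sender_id):
        # Post each update to the owning node instead of locking the remote block.
        for k in range(updates_per_sender):
            block_id = (sender_id + k) % total_blocks
            nodes[owner[block_id]].inbox.put((block_id, 1.0))

    servers = [threading.Thread(target=node.serve) for node in nodes]
    senders = [threading.Thread(target=produce, args=(n,)) for n in range(num_nodes)]
    for t in servers + senders:
        t.start()
    for t in senders:
        t.join()
    for node in nodes:
        node.inbox.put(None)  # all senders are done; tell each owner to stop
    for t in servers:
        t.join()

    result = {}
    for node in nodes:
        result.update(node.blocks)
    return result


if __name__ == "__main__":
    final = run()
    print(sum(final.values()))
```

In this sketch, senders never wait for access to a remote block. They only enqueue messages, which is the property the abstract attributes to the asynchronous approach. The queue itself still synchronizes internally, so the sketch models the ownership pattern only and says nothing about real communication costs.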