Case Study 1: Parallel Fast Givens QR Factorization
Although parallel QR factorization has been the topic of much research, available parallel algorithms scale poorly on matrices with dimensions less than 3000. As a consequence, there is little flexibility to meet stringent latency constraints by manipulating the number of processors. This is particularly true of parallel algorithms based on block cyclic distribution schemes, such as ScaLAPACK’s PDGEQRF (Choi et al., 1995; Blackford et al., 1997). Compounding the scalability problem, block cyclic distribution schemes are often incompatible with the data movement patterns of many applications. Some very recent work on efficient real-time redistribution techniques, however, promises to make these algorithms more attractive for high-performance signal processing applications (Park et al., 1999; Petit and Dongarra, 1999).
Keywords: Execution Time, Shared Memory, Message Passing, Hybrid Version, Minimum Execution Time