Fast block QR update in digital signal processing
The processing of digital sound signals often requires the computation of the QR factorization of a rectangular system matrix. However, sometimes, only a given (and probably small) part of the system matrix varies from the current sample to the next one. We exploit this fact to reuse some computations carried out to process the former sample in order to save execution time in the processing of the current sample. These savings can be critical for real-time applications running on low power consumption devices with high mobility. In addition, we propose a simple out-of-order task-parallel algorithm for the QR factorization using OpenMP that exploits the multicore capability of modern processors. Furthermore, in the presence of a Graphics Processing Unit (GPU) in the system, our algorithm is able to off-load some tasks to the GPU to accelerate the computation on these hardware devices.
KeywordsQR factorization QR update jagged Matrix Real time Block QR
This work was supported by the Spanish Ministry of Economy and Competitiveness under MINECO and FEDER projects TEC2015-67387-C4-1-R and TIN2014-53495-R; and the Generalitat Valenciana PROMETEOII/2014/003.
- 1.Augonnet C, Thibault S, Namyst R (2010) StarPU: a runtime system for scheduling tasks over accelerator-based multicore machines. Research Report RR-7240, INRIAGoogle Scholar
- 4.Chan E, Quintana-Ortí ES, Quintana-Ortí G, van de Geijn R (2007) Supermatrix out-of-order scheduling of matrix operations for smp and multi-core architectures. In: Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA ’07. ACM, New York, pp 116–125Google Scholar
- 5.Chan E, Van Zee FG, Quintana-Ortí ES, Quintana-Ortí G, De Van Geijn R (2007) Satisfying your dependencies with supermatrix. In: Proceedings—2007 IEEE International Conference on Cluster Computing, CLUSTER 2007. pp 91–99Google Scholar
- 6.Chan E, Van Zee FG, Bientinesi P, Quintana-Ortí ES, Quintana-Ortí G, van de Geijn RA (2008) Supermatrix: a multithreaded runtime scheduling system for algorithms-by-blocks. In: Chatterjee S, Scott ML (eds) PPOPP. ACM, New york, pp 123–132Google Scholar
- 7.Golub GH, Van Loan CF (2013) Matrix computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, BaltimoreGoogle Scholar
- 9.Joffrain T, Quintana-Ortí ES, van de Geijn RA (2004) Rapid development of high-performance out-of-core solvers. In: Applied Parallel Computing, State of the Art in Scientific Computing, 7th International Workshop, PARA 2004, Lyngby, Denmark, June 20–23, 2004, revised selected papers. pp 413–422Google Scholar
- 10.NVIDIA. The cuBLAS library. http://docs.nvidia.com/cuda/cublas. Accessed May 2017
- 11.Openblas. http://www.openblas.net. Accessed May 2017
- 13.The OmpSs Programming Model. https://pm.bsc.es/ompss. Accessed May 2017
- 14.Wende F, Steinke T, Cordes F (2014) Multi-threaded kernel offloading to gpgpu using hyper-q on kepler architecture. Technical Report 14-19, ZIB, Takustr.7, 14195 BerlinGoogle Scholar