Fast block QR update in digital signal processing

Alventosa, Fran J.; Alonso, Pedro; Vidal, Antonio M.; Piñero, Gema; Quintana-Ortí, Enrique S.

doi:10.1007/s11227-018-2298-5

Published: 13 March 2018

Volume 75, pages 1051–1064, (2019)

The Journal of Supercomputing

Fran J. Alventosa¹, Pedro Alonso¹ (ORCID: orcid.org/0000-0002-6882-6592), Antonio M. Vidal¹, Gema Piñero² & Enrique S. Quintana-Ortí³

Abstract

The processing of digital sound signals often requires the computation of the QR factorization of a rectangular system matrix. However, sometimes, only a given (and probably small) part of the system matrix varies from the current sample to the next one. We exploit this fact to reuse some computations carried out to process the former sample in order to save execution time in the processing of the current sample. These savings can be critical for real-time applications running on low power consumption devices with high mobility. In addition, we propose a simple out-of-order task-parallel algorithm for the QR factorization using OpenMP that exploits the multicore capability of modern processors. Furthermore, in the presence of a Graphics Processing Unit (GPU) in the system, our algorithm is able to off-load some tasks to the GPU to accelerate the computation on these hardware devices.
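The reuse idea can be illustrated with a minimal NumPy sketch. It is not the block algorithm of the article: it assumes the simplest case, in which new rows are appended to the system matrix and only the triangular factor R is needed. The function name `qr_append_rows` and the matrix sizes are hypothetical and chosen only for illustration. The sketch relies on the standard fact that, if A = QR, the R factor of the stacked matrix [R; B] is also an R factor of [A; B], so the old rows of A never have to be factorized again.

```python
import numpy as np


def qr_append_rows(R, B):
    """Return an R factor of [A; B] given only R from A = Q R.

    Hypothetical helper for illustration: it factorizes the small
    stacked matrix [R; B] instead of the full matrix [A; B].
    """
    stacked = np.vstack([R, B])
    return np.linalg.qr(stacked, mode="r")


# Assumed sizes: a tall 200 x 8 system matrix and 4 new rows per sample.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 8))
B = rng.standard_normal((4, 8))

# Factorization carried out for the former sample.
R = np.linalg.qr(A, mode="r")

# Update for the current sample: works on a 12 x 8 matrix, not 204 x 8.
R_new = qr_append_rows(R, B)

# Check: R_new^T R_new must equal the Gram matrix of the full new matrix.
A_new = np.vstack([A, B])
gram_error = np.linalg.norm(R_new.T @ R_new - A_new.T @ A_new)
print("Gram matrix mismatch:", gram_error)
```

The update costs a factorization of a matrix with n + k rows (here 12) rather than m + k rows (here 204), which is where the savings come from when only a small part of the matrix changes between samples.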

Notes

For the sake of simplicity we omit here the exact procedure to build the new rows of the system matrix from the sample.
In order to save space in the document we have annotated the algorithms with OpenMP tags that will be explained later. The sequential version arises from simply deleting these OpenMP directives.

Acknowledgements

This work was supported by the Spanish Ministry of Economy and Competitiveness under MINECO and FEDER projects TEC2015-67387-C4-1-R and TIN2014-53495-R; and the Generalitat Valenciana PROMETEOII/2014/003.

Author information

Authors and Affiliations

Depto. de Sistemas Informáticos y Computación, Universitat Politècnica de València, Valencia, Spain
Fran J. Alventosa, Pedro Alonso & Antonio M. Vidal
Instituto de Telecomunicaciones y Aplicaciones Multimedia (iTEAM), Universitat Politècnica de València, Valencia, Spain
Gema Piñero
Dept. Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain
Enrique S. Quintana-Ortí

Corresponding author

Correspondence to Pedro Alonso.

About this article

Cite this article

Alventosa, F.J., Alonso, P., Vidal, A.M. et al. Fast block QR update in digital signal processing. J Supercomput 75, 1051–1064 (2019). https://doi.org/10.1007/s11227-018-2298-5

Published: 13 March 2018
Issue Date: 01 March 2019
DOI: https://doi.org/10.1007/s11227-018-2298-5
