Abstract
The Gauss-Huard algorithm (the GHA) is a specialized version of Gauss-Jordan elimination for the solution of linear systems that, enhanced with column pivoting, exhibits numerical stability and computational cost close to those of the conventional solver based on the LU factorization with row pivoting. Furthermore, the GHA can be formulated as a procedure rich in matrix multiplications, so that high performance can be expected on current architectures with multi-layered memories. Unfortunately, in principle the GHA does not admit the introduction of look-ahead, a technique that has been demonstrated to be rather useful to improve the performance of the LU factorization on multi-threaded platforms with high levels of hardware concurrency. In this paper we analyze the effect of this drawback on the implementation of the GHA on systems accelerated with graphics processing units (GPUs), exposing the roles of the CPU-to-GPU and single precision-to-double precision performance ratios, as well as the contribution from the operations in the algorithm’s critical path.
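To make the structure of the method concrete, the following is a minimal unblocked sketch of Gauss-Huard elimination with column pivoting, written with NumPy. It is an illustration under stated assumptions, not the blocked CPU-GPU implementation analyzed in the paper: the function name `gauss_huard_solve` is hypothetical, the right-hand side is carried as an extra column of the working matrix, and no attempt is made at performance.

```python
import numpy as np

def gauss_huard_solve(A, b):
    """Solve A x = b with an unblocked Gauss-Huard sweep (illustrative sketch)."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = A.shape[0]
    # Working matrix: A augmented with the right-hand side as column n.
    M = np.hstack([A, b.reshape(n, 1)])
    # perm[j] records which original unknown currently sits in column j.
    perm = np.arange(n)
    for k in range(n):
        # Row elimination: remove the leading k entries of row k using the
        # already reduced rows 0..k-1 (lazy update of the current row only).
        M[k, k:] -= M[k, :k] @ M[:k, k:]
        # Column pivoting: pick the largest entry of row k among columns k..n-1.
        p = k + int(np.argmax(np.abs(M[k, k:n])))
        if M[k, p] == 0.0:
            raise np.linalg.LinAlgError("matrix is singular")
        if p != k:
            M[:, [k, p]] = M[:, [p, k]]
            perm[[k, p]] = perm[[p, k]]
        # Scaling: normalize row k so that the pivot becomes (implicitly) 1.
        M[k, k + 1:] /= M[k, k]
        # Column elimination: annihilate column k in the rows above the pivot.
        M[:k, k + 1:] -= np.outer(M[:k, k], M[k, k + 1:])
    # The last column holds the solution in pivoted order; undo the permutation.
    x = np.empty(n)
    x[perm] = M[:, n]
    return x
```

Unlike LU with row pivoting, each step touches only the current row and the block above it, so the rows below the pivot are left untouched until their turn; a blocked variant would group several such steps so that the updates become matrix multiplications.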
Acknowledgements
All researchers acknowledge the support from the EHFARS project funded by the German Ministry of Education and Research BMBF.
E.S. Quintana-Ortí was supported by the CICYT project TIN2014-53495-R of the Ministerio de Economía y Competitividad and FEDER.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Catalán, S., Ezzatti, P., Quintana-Ortí, E.S., Remón, A. (2016). The Impact of Panel Factorization on the Gauss-Huard Algorithm for the Solution of Linear Systems on Modern Architectures. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_30
DOI: https://doi.org/10.1007/978-3-319-49583-5_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49582-8
Online ISBN: 978-3-319-49583-5
eBook Packages: Computer Science, Computer Science (R0)