GPU Acceleration of Dense Matrix and Block Operations for Lanczos Method for Systems Over GF(2)
Abstract
Algebraic operations on dense matrices and blocks limit the scalability of the block Lanczos–Montgomery method, which is used for the linear algebra part of the RSA factorization problem. This paper explores the possibility of implementing the following algebraic operations over the field \(\mathbb{F}_2\) on GPU: (1) multiplication of two 64k × 64k matrices; (2) multiplication of two N × 64k blocks. For matrix multiplication, we consider two algorithms: (a) the “naive” algorithm; (b) the “fast” Method of Four Russians. For block multiplication, we consider only the “naive” algorithm. To our knowledge, this is so far the only work in which BLAS operations over \(\mathbb{F}_2\) have been accelerated on GPU with relative success.
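For illustration only, the two matrix multiplication algorithms named above can be sketched on the CPU in Python. This is not the authors' GPU implementation: the function names `mul_naive` and `mul_four_russians`, the row-as-integer bit packing, and the table width `k=8` are all assumptions made for this sketch. Over \(\mathbb{F}_2\), addition is XOR, so a row of the product is the XOR of those rows of the second matrix selected by the set bits of the corresponding row of the first.

```python
def mul_naive(A, B):
    """Return C = A * B over GF(2).

    A and B are lists of Python ints; bit j of A[i] is the entry A[i][j].
    Row i of C is the XOR of the rows B[j] for which bit j of A[i] is set.
    """
    C = []
    for row in A:
        acc = 0
        j = 0
        while row:
            if row & 1:
                acc ^= B[j]
            row >>= 1
            j += 1
        C.append(acc)
    return C


def mul_four_russians(A, B, k=8):
    """Return C = A * B over GF(2) using the Method of Four Russians.

    The rows of B are processed in groups of at most k. For each group,
    all 2^k XOR combinations of its rows are tabulated once; every row
    of A then needs a single table lookup per group.
    (k=8 is an illustrative choice, not a value taken from the paper.)
    """
    C = [0] * len(A)
    for start in range(0, len(B), k):
        rows = B[start:start + k]
        table = [0] * (1 << len(rows))
        for t in range(1, len(table)):
            low = t & -t  # lowest set bit of t
            # Reuse the entry for t without its lowest bit, add one row.
            table[t] = table[t ^ low] ^ rows[low.bit_length() - 1]
        index_mask = len(table) - 1
        for i, row in enumerate(A):
            C[i] ^= table[(row >> start) & index_mask]
    return C
```

Both functions return the same product. The second trades extra memory for the lookup tables against fewer XOR operations per row, which is the trade-off that distinguishes the “fast” algorithm from the “naive” one.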
Keywords and phrases
GPGPU, GF(2), “four Russians” method
Notes
Funding
This article contains the results of the project performed in the framework of the implementation of the programs of the Central Competences of the National Technological Database “Center for Big Data Storage and Analysis” (project “Tensor methods for processing and analysis of Big Data”) of MSU with the Project Support Funding of the National Technological Reporting dated December 11, 2018, no. 13/1251/2018.