Abstract
In the context of cryptanalysis, computing discrete logarithms in large cyclic groups using index-calculus-based methods, such as the number field sieve or the function field sieve, requires solving large sparse systems of linear equations modulo the group order. Most of the fast algorithms used to solve such systems — e.g., the conjugate gradient or the Lanczos and Wiedemann algorithms — iterate a product of the corresponding sparse matrix with a vector (SpMV). This central operation can be accelerated on GPUs using specific computing models and addressing patterns, which increase the arithmetic intensity while reducing irregular memory accesses. In this work, we investigate the implementation of SpMV kernels on NVIDIA GPUs, for several representations of the sparse matrix in memory. We explore the use of Residue Number System (RNS) arithmetic to accelerate modular operations. We target linear systems arising when attacking the discrete logarithm problem on groups of size 100 to 1000 bits, which includes the relevant range for current cryptanalytic computations. The proposed SpMV implementation contributed to solving the discrete logarithm problem in GF(\(2^{619}\)) and GF(\(2^{809}\)) using the FFS algorithm.
References
Adleman, L.: A subexponential algorithm for the discrete logarithm problem with applications to cryptography. In: Proceedings of the 20th Annual Symposium on Foundations of Computer Science, Washington, DC, USA, pp. 55–60 (1979)
Bai, S., Bouvier, C., Filbois, A., Gaudry, P., Imbert, L., Kruppa, A., Morain, F., Thomé, E., Zimmermann, P.: CADO-NFS: Crible algébrique: distribution, optimisation - number field sieve. http://cado-nfs.gforge.inria.fr/
Barbulescu, R., Bouvier, C., Detrey, J., Gaudry, P., Jeljeli, H., Thomé, E., Videau, M., Zimmermann, P.: Discrete logarithm in GF\((2^{809})\) with FFS. In: Krawczyk, H. (ed.) PKC 2014. LNCS, vol. 8383, pp. 221–238. Springer, Heidelberg (2014)
Bell, N., Garland, M.: Efficient sparse matrix-vector multiplication on CUDA. Technical report NVR-2008-004, NVIDIA Corporation, December 2008
Bell, N., Garland, M.: Cusp: Generic parallel algorithms for sparse matrix and graph computations (2012). http://code.google.com/p/cusp-library/
Bernstein, D.J.: Multidigit modular multiplication with the explicit Chinese remainder theorem. Technical report (1995). http://cr.yp.to/papers/mmecrt.pdf
Blelloch, G.E., Heroux, M.A., Zagha, M.: Segmented operations for sparse matrix computation on vector multiprocessors. Technical report CMU-CS-93-173, School of Computer Science, Carnegie Mellon University, August 1993
Boyer, B., Dumas, J.G., Giorgi, P.: Exact sparse matrix-vector multiplication on GPU’s and multicore architectures. CoRR abs/1004.3719 (2010)
Hayashi, T., Shimoyama, T., Shinohara, N., Takagi, T.: Breaking pairing-based cryptosystems using \(\eta _t\) pairing over GF\((3^{97})\). Cryptology ePrint Archive, Report 2012/345 (2012)
Jeljeli, H.: Resolution of linear algebra for the discrete logarithm problem using GPU and multi-core architectures. In: Silva, F., Dutra, I., Santos Costa, V. (eds.) Euro-Par 2014. LNCS, vol. 8632, pp. 764–775. Springer, Heidelberg (2014)
Kaltofen, E.: Analysis of Coppersmith's block Wiedemann algorithm for the parallel solution of sparse linear systems. Math. Comput. 64(210), 777–806 (1995)
LaMacchia, B.A., Odlyzko, A.M.: Solving large sparse linear systems over finite fields. In: Menezes, A., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537, pp. 109–133. Springer, Heidelberg (1991)
Lanczos, C.: Solution of systems of linear equations by minimized iterations. J. Res. Natl. Bur. Stand 49, 33–53 (1952)
NVIDIA Corporation: CUDA Programming Guide Version 4.2 (2012). http://developer.nvidia.com/cuda-downloads
NVIDIA Corporation: PTX: Parallel Thread Execution ISA Version 3.0 (2012). http://developer.nvidia.com/cuda-downloads
Odlyzko, A.M.: Discrete logarithms in finite fields and their cryptographic significance. In: Beth, T., Cot, N., Ingemarsson, I. (eds.) EUROCRYPT 1984. LNCS, vol. 209, pp. 224–314. Springer, Heidelberg (1985)
Pollard, J.M.: A Monte Carlo method for factorization. BIT Numer. Math. 15, 331–334 (1975)
Pomerance, C., Smith, J.W.: Reduction of huge, sparse matrices over finite fields via created catastrophes. Exp. Math. 1, 89–94 (1992)
Schmidt, B., Aribowo, H., Dang, H.-V.: Iterative sparse matrix-vector multiplication for integer factorization on GPUs. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011, Part II. LNCS, vol. 6853, pp. 413–424. Springer, Heidelberg (2011)
Sengupta, S., Harris, M., Zhang, Y., Owens, J.D.: Scan primitives for GPU computing. In: Graphics Hardware 2007, pp. 97–106, August 2007
Shanks, D.: Class number, a theory of factorization, and genera. In: 1969 Number Theory Institute (Proc. Sympos. Pure Math., Vol. XX, State Univ. New York, Stony Brook, N.Y., 1969), pp. 415–440. Providence, R.I. (1971)
Stach, P.: Optimizations to NFS linear algebra. In: CADO Workshop on Integer Factorization. http://cado.gforge.inria.fr/workshop/abstracts.html
Szabo, N.S., Tanaka, R.I.: Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill Book Company, New York (1967)
Taylor, F.J.: Residue arithmetic: a tutorial with examples. Computer 17, 50–62 (1984)
Thomé, E.: Subquadratic computation of vector generating polynomials and improvement of the block Wiedemann algorithm. J. Symbolic Comput. 33(5), 757–775 (2002)
Vázquez, F., Garzón, E.M., Martinez, J.A., Fernández, J.J.: The sparse matrix vector product on GPUs. Technical report, University of Almeria, June 2009
Wiedemann, D.H.: Solving sparse linear equations over finite fields. IEEE Trans. Inf. Theor. 32(1), 54–62 (1986)
Appendices
A Formats and GPU Kernels of SpMV
B Resolution of Linear Algebra of the Function Field Sieve
The linear algebra step consists of solving the system \(Aw=0\), where \(A\) is the matrix produced by the filtering step of the FFS algorithm. \(A\) is singular and square. Finding a vector of the kernel of the matrix is generally sufficient for the FFS algorithm.
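The central operation iterated by the solvers discussed here is a sparse matrix-vector product modulo the group order \(\ell\). As a minimal CPU-side illustration (not the paper's GPU code; the function name and storage layout are ours), an SpMV over \(\mathbb{Z}/\ell\mathbb{Z}\) with the matrix stored in CSR format can be sketched as:

```python
# Minimal reference SpMV y = A*x mod ell, with A in CSR format.
# Illustrative sketch only; the paper's GPU kernels use other layouts
# and batch the modular reductions differently.

def spmv_mod(row_ptr, col_idx, values, x, ell):
    """y[i] = sum_j A[i][j] * x[j] mod ell, A stored in CSR."""
    n = len(row_ptr) - 1
    y = [0] * n
    for i in range(n):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc % ell
    return y

# Tiny example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]], ell = 7
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values  = [1, 2, 3, 4, 5]
print(spmv_mod(row_ptr, col_idx, values, [1, 1, 1], 7))  # [3, 3, 2]
```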
The simple Wiedemann algorithm [27], which solves such a system, consists of three steps:
- Scalar products: This step computes a sequence of scalars \(a_i = \,^txA^i y\), for \(0 \le i \le 2N\), where \(x\) and \(y\) are random vectors in \((\mathbb {Z}/\ell \mathbb {Z})^N\). We take \(x\) in the canonical basis, so that instead of performing a full dot product between \(^tx\) and \(A^i y\), we simply store the coordinate of \(A^i y\) that corresponds to the non-zero coordinate of \(x\).
- Linear generator: Using the Berlekamp-Massey algorithm, this step computes a linear generator of the \(a_i\)'s. The output \(F\) is a polynomial whose coefficients lie in \(\mathbb {Z}/\ell \mathbb {Z}\), and whose degree is very close to \(N\).
- Evaluation: The last step computes \(\sum _{i=0}^{\deg (F)}{F_i A^i y}\), where \(F_i\) is the \(i\)-th coefficient of \(F\). With high probability, the result is a non-zero vector of the kernel of \(A\).
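The scalar-products step can be sketched as follows, using a small dense matrix for readability (the function and its parameters are illustrative, not from the paper). With \(x = e_k\) in the canonical basis, each \(a_i\) is simply coordinate \(k\) of \(A^i y\):

```python
# Sketch of the scalar-products step: a_i = x^T A^i y for 0 <= i <= 2N,
# with x = e_k a canonical basis vector, so a_i = (A^i y)[k].
# Illustrative only; A is a dense list of rows here for brevity,
# whereas the real computation iterates a sparse SpMV on the GPU.

def sequence(A, y, k, ell, N):
    seq = []
    v = y[:]
    for _ in range(2 * N + 1):
        seq.append(v[k])  # the only coordinate x "sees"
        v = [sum(a * b for a, b in zip(row, v)) % ell for row in A]  # v <- A v
    return seq

A = [[1, 2], [3, 4]]
y = [1, 1]
print(sequence(A, y, 0, 7, 2))  # [1, 3, 3, 0, 6]
```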
The Block Wiedemann algorithm [11] uses \(m\) random vectors for \(x\) and \(n\) random vectors for \(y\). The sequence of scalars is thus replaced by a sequence of \(m \times n\) matrices, and the numbers of iterations of the first and third steps become \((N/n + N/m)\) and \(N/n\), respectively. The \(n\) subsequences can be computed independently and in parallel, so the block Wiedemann method makes it possible to distribute the computation without additional overhead [25].
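As a quick sanity check of these iteration counts, using the parameters of the GF(\(2^{809}\)) computation from Appendix B.2:

```python
# Block Wiedemann iteration counts: N/n + N/m for the sequence step,
# N/n for the evaluation step. Parameters from the GF(2^809) computation
# (N = 3.6M rows, blocking parameters m = 8, n = 4).
N, m, n = 3_600_000, 8, 4
seq_iters = N // n + N // m   # 900000 + 450000 = 1350000
eval_iters = N // n           # 900000
print(seq_iters, eval_iters)  # 1350000 900000
```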
B.1 Linear Algebra of FFS for GF(\(2^{619}\))
The matrix has 650 k rows and columns. The prime \(\ell \) is 217 bits long. The computation was completed using the simple Wiedemann algorithm on a single NVIDIA GeForce GTX 580, and needed 16 GPU hours and 1 CPU hour overall.
B.2 Linear Algebra of FFS for GF(\(2^{809}\))
The matrix has 3.6M rows and columns. The prime \(\ell \) is 202 bits long. We ran the Block Wiedemann algorithm on a cluster of 8 GPUs: 4 distinct nodes, each equipped with two NVIDIA Tesla M2050 graphics processors, with blocking parameters \(m=8\) and \(n=4\). The overall computation required 4.4 days in parallel on the 4 nodes.
These two computations were part of record-sized discrete logarithm computations in a prime-degree extension field [3, 10].
© 2015 Springer International Publishing Switzerland
Cite this paper
Jeljeli, H. (2015). Accelerating Iterative SpMV for the Discrete Logarithm Problem Using GPUs. In: Koç, Ç., Mesnager, S., Savaş, E. (eds) Arithmetic of Finite Fields. WAIFI 2014. Lecture Notes in Computer Science(), vol 9061. Springer, Cham. https://doi.org/10.1007/978-3-319-16277-5_2
Print ISBN: 978-3-319-16276-8
Online ISBN: 978-3-319-16277-5