Abstract
In the context of cryptanalysis, computing discrete logarithms in large cyclic groups using index-calculus-based methods, such as the number field sieve or the function field sieve, requires solving large sparse systems of linear equations modulo the group order. Most of the fast algorithms used to solve such systems — e.g., the conjugate gradient or the Lanczos and Wiedemann algorithms — iterate a product of the corresponding sparse matrix with a vector (SpMV). This central operation can be accelerated on GPUs using specific computing models and addressing patterns, which increase the arithmetic intensity while reducing irregular memory accesses. In this work, we investigate the implementation of SpMV kernels on NVIDIA GPUs, for several representations of the sparse matrix in memory. We explore the use of Residue Number System (RNS) arithmetic to accelerate modular operations. We target linear systems arising when attacking the discrete logarithm problem on groups of size 100 to 1000 bits, which includes the relevant range for current cryptanalytic computations. The proposed SpMV implementation contributed to solving the discrete logarithm problem in GF(\(2^{619}\)) and GF(\(2^{809}\)) using the FFS algorithm.
References
Adleman, L.: A subexponential algorithm for the discrete logarithm problem with applications to cryptography. In: Proceedings of the 20th Annual Symposium on Foundations of Computer Science, Washington, DC, USA, pp. 55–60 (1979)
Bai, S., Bouvier, C., Filbois, A., Gaudry, P., Imbert, L., Kruppa, A., Morain, F., Thomé, E., Zimmermann, P.: CADO-NFS: Crible algébrique: distribution, optimisation - number field sieve. http://cado-nfs.gforge.inria.fr/
Barbulescu, R., Bouvier, C., Detrey, J., Gaudry, P., Jeljeli, H., Thomé, E., Videau, M., Zimmermann, P.: Discrete logarithm in GF\((2^{809})\) with FFS. In: Krawczyk, H. (ed.) PKC 2014. LNCS, vol. 8383, pp. 221–238. Springer, Heidelberg (2014)
Bell, N., Garland, M.: Efficient sparse matrix-vector multiplication on CUDA. Technical report NVR-2008-004, NVIDIA Corporation, December 2008
Bell, N., Garland, M.: Cusp: Generic parallel algorithms for sparse matrix and graph computations (2012). http://code.google.com/p/cusp-library/
Bernstein, D.J.: Multidigit modular multiplication with the explicit Chinese remainder theorem. Technical report (1995). http://cr.yp.to/papers/mmecrt.pdf
Blelloch, G.E., Heroux, M.A., Zagha, M.: Segmented operations for sparse matrix computation on vector multiprocessors. Technical report CMU-CS-93-173, School of Computer Science, Carnegie Mellon University, August 1993
Boyer, B., Dumas, J.G., Giorgi, P.: Exact sparse matrix-vector multiplication on GPU’s and multicore architectures. CoRR abs/1004.3719 (2010)
Hayashi, T., Shimoyama, T., Shinohara, N., Takagi, T.: Breaking pairing-based cryptosystems using \(\eta _t\) pairing over GF\((3^{97})\). Cryptology ePrint Archive, Report 2012/345 (2012)
Jeljeli, H.: Resolution of linear algebra for the discrete logarithm problem using GPU and multi-core architectures. In: Silva, F., Dutra, I., Santos Costa, V. (eds.) Euro-Par 2014. LNCS, vol. 8632, pp. 764–775. Springer, Heidelberg (2014)
Kaltofen, E.: Analysis of Coppersmith's block Wiedemann algorithm for the parallel solution of sparse linear systems. Math. Comput. 64(210), 777–806 (1995)
LaMacchia, B.A., Odlyzko, A.M.: Solving large sparse linear systems over finite fields. In: Menezes, A., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537, pp. 109–133. Springer, Heidelberg (1991)
Lanczos, C.: Solution of systems of linear equations by minimized iterations. J. Res. Natl. Bur. Stand 49, 33–53 (1952)
NVIDIA Corporation: CUDA Programming Guide Version 4.2 (2012). http://developer.nvidia.com/cuda-downloads
NVIDIA Corporation: PTX: Parallel Thread Execution ISA Version 3.0 (2012). http://developer.nvidia.com/cuda-downloads
Odlyzko, A.M.: Discrete logarithms in finite fields and their cryptographic significance. In: Beth, T., Cot, N., Ingemarsson, I. (eds.) EUROCRYPT 1984. LNCS, vol. 209, pp. 224–314. Springer, Heidelberg (1985)
Pollard, J.M.: A Monte Carlo method for factorization. BIT Numer. Math. 15, 331–334 (1975)
Pomerance, C., Smith, J.W.: Reduction of huge, sparse matrices over finite fields via created catastrophes. Exp. Math. 1, 89–94 (1992)
Schmidt, B., Aribowo, H., Dang, H.-V.: Iterative sparse matrix-vector multiplication for integer factorization on GPUs. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011, Part II. LNCS, vol. 6853, pp. 413–424. Springer, Heidelberg (2011)
Sengupta, S., Harris, M., Zhang, Y., Owens, J.D.: Scan primitives for GPU computing. In: Graphics Hardware 2007, pp. 97–106, August 2007
Shanks, D.: Class number, a theory of factorization, and genera. In: 1969 Number Theory Institute (Proc. Sympos. Pure Math., Vol. XX, State Univ. New York, Stony Brook, N.Y., 1969), pp. 415–440. Providence, R.I. (1971)
Stach, P.: Optimizations to NFS linear algebra. In: CADO Workshop on Integer Factorization. http://cado.gforge.inria.fr/workshop/abstracts.html
Szabo, N.S., Tanaka, R.I.: Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill Book Company, New York (1967)
Taylor, F.J.: Residue arithmetic: a tutorial with examples. Computer 17, 50–62 (1984)
Thomé, E.: Subquadratic computation of vector generating polynomials and improvement of the block Wiedemann algorithm. J. Symbolic Comput. 33(5), 757–775 (2002)
Vázquez, F., Garzón, E.M., Martinez, J.A., Fernández, J.J.: The sparse matrix vector product on GPUs. Technical report, University of Almeria, June 2009
Wiedemann, D.H.: Solving sparse linear equations over finite fields. IEEE Trans. Inf. Theor. 32(1), 54–62 (1986)
Appendices
A Formats and GPU Kernels of SpMV
B Resolution of Linear Algebra of the Function Field Sieve
The linear algebra step consists of solving the system \(Aw=0\), where \(A\) is the matrix produced by the filtering step of the FFS algorithm. \(A\) is singular and square. Finding a vector of the kernel of the matrix is generally sufficient for the FFS algorithm.
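The central operation iterated by the solvers discussed here is a sparse matrix-vector product modulo the group order \(\ell\). As a minimal CPU-side illustration (not the paper's GPU code; the function name and storage layout are ours), an SpMV over \(\mathbb{Z}/\ell\mathbb{Z}\) with the matrix stored in CSR format can be sketched as:

```python
# Minimal reference SpMV y = A*x mod ell, with A in CSR format.
# Illustrative sketch only; the paper's GPU kernels use other layouts
# and batch the modular reductions differently.

def spmv_mod(row_ptr, col_idx, values, x, ell):
    """y[i] = sum_j A[i][j] * x[j] mod ell, A stored in CSR."""
    n = len(row_ptr) - 1
    y = [0] * n
    for i in range(n):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc % ell
    return y

# Tiny example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]], ell = 7
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values  = [1, 2, 3, 4, 5]
print(spmv_mod(row_ptr, col_idx, values, [1, 1, 1], 7))  # [3, 3, 2]
```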
The simple Wiedemann algorithm [27], which solves such a system, consists of three steps:
- Scalar products: This step computes a sequence of scalars \(a_i = \,^txA^i y\), for \(0 \le i \le 2N\), where \(x\) and \(y\) are random vectors in \((\mathbb {Z}/\ell \mathbb {Z})^N\). We take \(x\) in the canonical basis, so that instead of performing a full dot product between \(^tx\) and \(A^i y\), we simply store the coordinate of \(A^i y\) that corresponds to the non-zero coordinate of \(x\).
- Linear generator: Using the Berlekamp-Massey algorithm, this step computes a linear generator of the \(a_i\)'s. The output \(F\) is a polynomial whose coefficients lie in \(\mathbb {Z}/\ell \mathbb {Z}\), and whose degree is very close to \(N\).
- Evaluation: The last step computes \(\sum _{i=0}^{\deg (F)}{F_i A^i y}\), where \(F_i\) is the \(i\)-th coefficient of \(F\). With high probability, the result is a non-zero vector of the kernel of \(A\).
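The scalar-products step can be sketched as follows, using a small dense matrix for readability (the function and its parameters are illustrative, not from the paper). With \(x = e_k\) in the canonical basis, each \(a_i\) is simply coordinate \(k\) of \(A^i y\):

```python
# Sketch of the scalar-products step: a_i = x^T A^i y for 0 <= i <= 2N,
# with x = e_k a canonical basis vector, so a_i = (A^i y)[k].
# Illustrative only; A is a dense list of rows here for brevity,
# whereas the real computation iterates a sparse SpMV on the GPU.

def sequence(A, y, k, ell, N):
    seq = []
    v = y[:]
    for _ in range(2 * N + 1):
        seq.append(v[k])  # the only coordinate x "sees"
        v = [sum(a * b for a, b in zip(row, v)) % ell for row in A]  # v <- A v
    return seq

A = [[1, 2], [3, 4]]
y = [1, 1]
print(sequence(A, y, 0, 7, 2))  # [1, 3, 3, 0, 6]
```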
The Block Wiedemann algorithm [11] uses \(m\) random vectors for \(x\) and \(n\) random vectors for \(y\). The sequence of scalars is thus replaced by a sequence of \(m \times n\) matrices, and the numbers of iterations of the first and third steps become \((N/n + N/m)\) and \(N/n\), respectively. The \(n\) subsequences can be computed independently and in parallel, so the block Wiedemann method makes it possible to distribute the computation without additional overhead [25].
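As a quick sanity check of these iteration counts, using the parameters of the GF(\(2^{809}\)) computation from Appendix B.2:

```python
# Block Wiedemann iteration counts: N/n + N/m for the sequence step,
# N/n for the evaluation step. Parameters from the GF(2^809) computation
# (N = 3.6M rows, blocking parameters m = 8, n = 4).
N, m, n = 3_600_000, 8, 4
seq_iters = N // n + N // m   # 900000 + 450000 = 1350000
eval_iters = N // n           # 900000
print(seq_iters, eval_iters)  # 1350000 900000
```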
B.1 Linear Algebra of FFS for GF(\(2^{619}\))
The matrix has 650 k rows and columns. The prime \(\ell \) is 217 bits long. The computation was completed using the simple Wiedemann algorithm on a single NVIDIA GeForce GTX 580, and needed 16 GPU hours and 1 CPU hour overall.
B.2 Linear Algebra of FFS for GF(\(2^{809}\))
The matrix has 3.6M rows and columns. The prime \(\ell \) is 202 bits long. We ran the Block Wiedemann algorithm on a cluster of 8 GPUs: 4 distinct nodes, each equipped with two NVIDIA Tesla M2050 graphics processors, with blocking parameters \(m=8\) and \(n=4\). The overall computation required 4.4 days in parallel on the 4 nodes.
These two computations were part of record-sized discrete logarithm computations in a prime-degree extension field [3, 10].
© 2015 Springer International Publishing Switzerland
Cite this paper
Jeljeli, H. (2015). Accelerating Iterative SpMV for the Discrete Logarithm Problem Using GPUs. In: Koç, Ç., Mesnager, S., Savaş, E. (eds) Arithmetic of Finite Fields. WAIFI 2014. Lecture Notes in Computer Science(), vol 9061. Springer, Cham. https://doi.org/10.1007/978-3-319-16277-5_2
Print ISBN: 978-3-319-16276-8
Online ISBN: 978-3-319-16277-5