
Accelerating Iterative SpMV for the Discrete Logarithm Problem Using GPUs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9061)

Abstract

In the context of cryptanalysis, computing discrete logarithms in large cyclic groups using index-calculus-based methods, such as the number field sieve or the function field sieve, requires solving large sparse systems of linear equations modulo the group order. Most of the fast algorithms used to solve such systems — e.g., the conjugate gradient or the Lanczos and Wiedemann algorithms — iterate a product of the corresponding sparse matrix with a vector (SpMV). This central operation can be accelerated on GPUs using specific computing models and addressing patterns, which increase the arithmetic intensity while reducing irregular memory accesses. In this work, we investigate the implementation of SpMV kernels on NVIDIA GPUs, for several representations of the sparse matrix in memory. We explore the use of Residue Number System (RNS) arithmetic to accelerate modular operations. We target linear systems arising when attacking the discrete logarithm problem on groups of size 100 to 1000 bits, which includes the relevant range for current cryptanalytic computations. The proposed SpMV implementation contributed to solving the discrete logarithm problem in GF(\(2^{619}\)) and GF(\(2^{809}\)) using the FFS algorithm.


References

  1. Adleman, L.: A subexponential algorithm for the discrete logarithm problem with applications to cryptography. In: Proceedings of the 20th Annual Symposium on Foundations of Computer Science, Washington, DC, USA, pp. 55–60 (1979)

  2. Bai, S., Bouvier, C., Filbois, A., Gaudry, P., Imbert, L., Kruppa, A., Morain, F., Thomé, E., Zimmermann, P.: CADO-NFS: Crible Algébrique: Distribution, Optimisation - Number Field Sieve. http://cado-nfs.gforge.inria.fr/

  3. Barbulescu, R., Bouvier, C., Detrey, J., Gaudry, P., Jeljeli, H., Thomé, E., Videau, M., Zimmermann, P.: Discrete logarithm in GF\((2^{809})\) with FFS. In: Krawczyk, H. (ed.) PKC 2014. LNCS, vol. 8383, pp. 221–238. Springer, Heidelberg (2014)

  4. Bell, N., Garland, M.: Efficient sparse matrix-vector multiplication on CUDA. Technical report NVR-2008-004, NVIDIA Corporation, December 2008

  5. Bell, N., Garland, M.: Cusp: Generic parallel algorithms for sparse matrix and graph computations (2012). http://code.google.com/p/cusp-library/

  6. Bernstein, D.J.: Multidigit modular multiplication with the explicit Chinese remainder theorem. Technical report (1995). http://cr.yp.to/papers/mmecrt.pdf

  7. Blelloch, G.E., Heroux, M.A., Zagha, M.: Segmented operations for sparse matrix computation on vector multiprocessors. Technical report CMU-CS-93-173, School of Computer Science, Carnegie Mellon University, August 1993

  8. Boyer, B., Dumas, J.G., Giorgi, P.: Exact sparse matrix-vector multiplication on GPUs and multicore architectures. CoRR abs/1004.3719 (2010)

  9. Hayashi, T., Shimoyama, T., Shinohara, N., Takagi, T.: Breaking pairing-based cryptosystems using \(\eta _t\) pairing over GF\((3^{97})\). Cryptology ePrint Archive, Report 2012/345 (2012)

  10. Jeljeli, H.: Resolution of linear algebra for the discrete logarithm problem using GPU and multi-core architectures. In: Silva, F., Dutra, I., Santos Costa, V. (eds.) Euro-Par 2014. LNCS, vol. 8632, pp. 764–775. Springer, Heidelberg (2014)

  11. Kaltofen, E.: Analysis of Coppersmith's block Wiedemann algorithm for the parallel solution of sparse linear systems. Math. Comput. 64(210), 777–806 (1995)

  12. LaMacchia, B.A., Odlyzko, A.M.: Solving large sparse linear systems over finite fields. In: Menezes, A., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537, pp. 109–133. Springer, Heidelberg (1991)

  13. Lanczos, C.: Solution of systems of linear equations by minimized iterations. J. Res. Natl. Bur. Stand 49, 33–53 (1952)

  14. NVIDIA Corporation: CUDA Programming Guide Version 4.2 (2012). http://developer.nvidia.com/cuda-downloads

  15. NVIDIA Corporation: PTX: Parallel Thread Execution ISA Version 3.0 (2012). http://developer.nvidia.com/cuda-downloads

  16. Odlyzko, A.M.: Discrete logarithms in finite fields and their cryptographic significance. In: Beth, T., Cot, N., Ingemarsson, I. (eds.) EUROCRYPT 1984. LNCS, vol. 209, pp. 224–314. Springer, Heidelberg (1985)

  17. Pollard, J.M.: A Monte Carlo method for factorization. BIT Numer. Math. 15, 331–334 (1975)

  18. Pomerance, C., Smith, J.W.: Reduction of huge, sparse matrices over finite fields via created catastrophes. Exp. Math. 1, 89–94 (1992)

  19. Schmidt, B., Aribowo, H., Dang, H.-V.: Iterative sparse matrix-vector multiplication for integer factorization on GPUs. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011, Part II. LNCS, vol. 6853, pp. 413–424. Springer, Heidelberg (2011)

  20. Sengupta, S., Harris, M., Zhang, Y., Owens, J.D.: Scan primitives for GPU computing. In: Proceedings of Graphics Hardware 2007, pp. 97–106, August 2007

  21. Shanks, D.: Class number, a theory of factorization, and genera. In: 1969 Number Theory Institute (Proc. Sympos. Pure Math., Vol. XX, State Univ. New York, Stony Brook, N.Y., 1969), pp. 415–440. Providence, R.I. (1971)

  22. Stach, P.: Optimizations to NFS linear algebra. In: CADO Workshop on Integer Factorization. http://cado.gforge.inria.fr/workshop/abstracts.html

  23. Szabo, N.S., Tanaka, R.I.: Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill Book Company, New York (1967)

  24. Taylor, F.J.: Residue arithmetic: a tutorial with examples. Computer 17, 50–62 (1984)

  25. Thomé, E.: Subquadratic computation of vector generating polynomials and improvement of the block Wiedemann algorithm. J. Symbolic Comput. 33(5), 757–775 (2002)

  26. Vázquez, F., Garzón, E.M., Martinez, J.A., Fernández, J.J.: The sparse matrix vector product on GPUs. Technical report, University of Almeria, June 2009

  27. Wiedemann, D.H.: Solving sparse linear equations over finite fields. IEEE Trans. Inf. Theor. 32(1), 54–62 (1986)

Author information

Correspondence to Hamza Jeljeli.


Appendices

A Formats and GPU Kernels of SpMV

(Figures g, h, and i: listings of the GPU SpMV kernels for the different matrix formats; not reproduced in this extract.)
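As a format-level illustration of what these kernels compute, here is a minimal CPU sketch in Python of a modular SpMV with the matrix in CSR form. The array names and the per-row reduction are illustrative only; the paper's kernels use GPU-specific layouts and RNS arithmetic.

```python
# Sketch of a sparse matrix-vector product over Z/ellZ with the matrix
# stored in CSR form (row_ptr, col_idx, values). Illustrative only: the
# paper's GPU kernels use ELL-like and sliced-COO layouts with RNS
# arithmetic for the modular reduction.

def spmv_csr_mod(row_ptr, col_idx, values, x, ell):
    """Return y = A*x mod ell for A given in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0] * n_rows
    for i in range(n_rows):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc % ell   # one reduction per row keeps the inner loop cheap
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]  in CSR form, x = (1, 1, 1), ell = 5
row_ptr, col_idx, values = [0, 2, 3], [0, 2, 1], [1, 2, 3]
print(spmv_csr_mod(row_ptr, col_idx, values, [1, 1, 1], 5))  # [3, 3]
```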

B Resolution of Linear Algebra of the Function Field Sieve

The linear algebra step consists of solving the system \(Aw=0\), where \(A\) is the square, singular matrix produced by the filtering step of the FFS algorithm. Finding a vector of the kernel of the matrix is generally sufficient for the FFS algorithm.

The simple Wiedemann algorithm [27], which solves such a system, consists of three steps:

  • Scalar products: This step computes a sequence of scalars \(a_i = \,^tx A^i y\), where \(0 \le i \le 2N\), and \(x\) and \(y\) are random vectors in \((\mathbb {Z}/\ell \mathbb {Z})^N\). We take \(x\) in the canonical basis, so that instead of performing a full dot product between \(^tx\) and \(A^i y\), we simply store the coordinate of \(A^i y\) that corresponds to the non-zero coordinate of \(x\).

  • Linear generator: Using the Berlekamp-Massey algorithm, this step computes a linear generator of the \(a_i\)’s. The output \(F\) is a polynomial whose coefficients lie in \(\mathbb {Z}/\ell \mathbb {Z}\), and whose degree is very close to \(N\).

  • Evaluation: The last step computes \(\sum _{i=0}^{\deg (F)}{F_i A^i y}\), where \(F_i\) is the \(i^{th}\) coefficient of \(F\). The result is, with high probability, a non-zero vector of the kernel of \(A\).
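The three steps above can be sketched end to end in Python on a toy singular matrix. This is an illustrative sketch, not the paper's code: the dense `matvec_mod` stands in for the GPU SpMV, and the Berlekamp-Massey routine plays the role of the linear-generator step.

```python
# End-to-end sketch of the simple Wiedemann algorithm over Z/ellZ.
# The dense matvec_mod stands in for the paper's GPU SpMV kernels.

def matvec_mod(A, v, ell):
    return [sum(a * x for a, x in zip(row, v)) % ell for row in A]

def berlekamp_massey(seq, ell):
    """Minimal recurrence C(x) = 1 + c_1 x + ... + c_L x^L of seq mod prime ell."""
    C, B = [1], [1]
    L, m, b = 0, 1, 1
    for n, s_n in enumerate(seq):
        d = (s_n + sum(C[i] * seq[n - i] for i in range(1, L + 1))) % ell
        if d == 0:
            m += 1
            continue
        coef = d * pow(b, ell - 2, ell) % ell   # d / b mod ell
        T = C[:]
        C = C + [0] * (len(B) + m - len(C))
        for i, b_i in enumerate(B):
            C[i + m] = (C[i + m] - coef * b_i) % ell
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C, L

def wiedemann_kernel_vector(A, y, ell):
    N = len(A)
    # Step 1: scalar products a_i = tx A^i y with x = e_0, for 0 <= i <= 2N.
    seq, v = [], y[:]
    for _ in range(2 * N + 1):
        seq.append(v[0])
        v = matvec_mod(A, v, ell)
    # Step 2: linear generator of the a_i's.
    C, L = berlekamp_massey(seq, ell)
    # Reverse C into F(t) = t^L C(1/t), then strip the power of t dividing F,
    # keeping the part g with g(0) != 0 (A is singular, so t divides F w.h.p.).
    F = (C + [0] * (L + 1 - len(C)))[L::-1]
    k = 0
    while F[k] == 0:
        k += 1
    g = F[k:]
    # Step 3: evaluation w = sum_i g_i A^i y, then push w into the kernel.
    w, v = [0] * N, y[:]
    for g_i in g:
        w = [(w_j + g_i * v_j) % ell for w_j, v_j in zip(w, v)]
        v = matvec_mod(A, v, ell)
    while any(matvec_mod(A, w, ell)):
        w = matvec_mod(A, w, ell)
    return w

A, ell = [[1, 1], [1, 1]], 7              # A is singular mod 7
w = wiedemann_kernel_vector(A, [1, 0], ell)
print(w, matvec_mod(A, w, ell))           # w = [6, 1], A*w = [0, 0]
```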

The block Wiedemann algorithm [11] uses \(m\) random vectors for \(x\) and \(n\) random vectors for \(y\). The sequence of scalars is thus replaced by a sequence of \(m \times n\) matrices, and the numbers of iterations of the first and third steps become \((N/n + N/m)\) and \(N/n\), respectively. The \(n\) subsequences can be computed independently and in parallel, so the block Wiedemann method distributes the computation without additional overhead [25].
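The blocked first step can be sketched as follows. The function name and the dense matvec are illustrative; the point is that each of the \(n\) column sequences \(A^i y_j\) depends only on its own starting vector \(y_j\), which is what makes the distribution across nodes possible.

```python
# Illustrative sketch of the blocked scalar-product step of block
# Wiedemann: the scalars a_i become m x n matrices a_i = tX A^i Y.

def matvec_mod(A, v, ell):
    return [sum(a * x for a, x in zip(row, v)) % ell for row in A]

def block_sequence(A, X, Y, num_terms, ell):
    """Return a_i = tX (A^i Y) mod ell as m x n matrices, i = 0..num_terms-1.
    X is a list of m row vectors, Y a list of n column vectors."""
    cols = [y[:] for y in Y]
    out = []
    for _ in range(num_terms):
        out.append([[sum(x_j * c_j for x_j, c_j in zip(x, c)) % ell
                     for c in cols] for x in X])
        # Each column sequence advances independently: this loop over the
        # n columns is what gets split across GPU nodes.
        cols = [matvec_mod(A, c, ell) for c in cols]
    return out

# m = 1, n = 2 on a toy 2x2 matrix mod 7
seq = block_sequence([[1, 1], [1, 1]], [[1, 0]], [[1, 0], [0, 1]], 2, 7)
print(seq)  # [[[1, 0]], [[1, 1]]]
```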

B.1 Linear Algebra of FFS for GF(\(2^{619}\))

The matrix has 650k rows and columns. The prime \(\ell \) is 217 bits long. The computation was completed using the simple Wiedemann algorithm on a single NVIDIA GeForce GTX 580. The overall computation needed 16 GPU hours and 1 CPU hour.

B.2 Linear Algebra of FFS for GF(\(2^{809}\))

The matrix has 3.6M rows and columns. The prime \(\ell \) is 202 bits long. We ran the block Wiedemann algorithm on a cluster of 8 GPUs: 4 distinct nodes, each equipped with two NVIDIA Tesla M2050 graphics processors, with blocking parameters \(m=8\) and \(n=4\). The overall computation required 4.4 days in parallel on the 4 nodes.

These two computations were part of record-sized discrete logarithm computations in a prime-degree extension field [3, 10].

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jeljeli, H. (2015). Accelerating Iterative SpMV for the Discrete Logarithm Problem Using GPUs. In: Koç, Ç., Mesnager, S., Savaş, E. (eds) Arithmetic of Finite Fields. WAIFI 2014. Lecture Notes in Computer Science, vol 9061. Springer, Cham. https://doi.org/10.1007/978-3-319-16277-5_2

  • DOI: https://doi.org/10.1007/978-3-319-16277-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16276-8

  • Online ISBN: 978-3-319-16277-5
