Graphics Processing Units

  • Peter Schwabe


This chapter introduces graphics processing units (GPUs) for general-purpose computations. It describes the highly parallel architecture of modern GPUs, software-development toolchains to program them, and typical pitfalls and performance bottlenecks. Then it considers several applications of GPUs in information security, in particular in cryptography and cryptanalysis.


Keywords: Device Memory, Advanced Encryption Standard, Graphic Card, Work Item, Compute Unified Device Architecture



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Digital Security Group, Radboud University Nijmegen, Nijmegen, The Netherlands
