High-Performance Symmetric Cryptography Server with GPU Acceleration

  • Wangzhao Cheng
  • Fangyu Zheng
  • Wuqiong Pan
  • Jingqiang Lin
  • Huorong Li
  • Bingyu Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10631)

Abstract

With more and more sensitive and private data transferred over the Internet, various security protocols have been developed to secure end-to-end communication. In practice, however, applying these protocols degrades the overall performance of the system, and the frequently used symmetric cryptographic operations on the server side are the bottleneck. In this contribution, we present a high-performance symmetric cryptography server. First, the symmetric algorithm SM4 is carefully scheduled on GPUs, including an instruction-level implementation and improved variable locations. Second, optimization methods are provided to speed up the otherwise inefficient data transfer between CPU and GPU. Finally, the overall server architecture is tailored to mass data encryption and delivers 15.96 Gbps of data encryption over the network, 1.23 times that of the fastest existing symmetric cryptography server. Furthermore, the server's throughput can be boosted by a factor of 2.02 with a high-speed pre-calculation technique for long-term-key applications such as IPSec VPN gateways.
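The abstract names SM4 and a pre-calculation technique for long-term keys but gives no detail at this level. As a point of reference only, the following is a minimal CPU sketch in C of SM4 as specified in GB/T 32907-2016, written under the assumption that a long-term key's 32 round keys are expanded once and then reused for every block. It is not the authors' GPU implementation, and the paper's pre-calculation technique may well differ from simple key-schedule caching. The function names (`sm4_expand_key`, `sm4_encrypt_block`, `sm4_encrypt_blocks`, `sm4_selftest`) are hypothetical.

```c
/* Hypothetical CPU reference sketch of SM4; not the paper's GPU implementation. */
#include <stdint.h>
#include <string.h>

/* SM4 S-box, as specified in GB/T 32907-2016. */
static const uint8_t SBOX[256] = {
    0xd6,0x90,0xe9,0xfe,0xcc,0xe1,0x3d,0xb7,0x16,0xb6,0x14,0xc2,0x28,0xfb,0x2c,0x05,
    0x2b,0x67,0x9a,0x76,0x2a,0xbe,0x04,0xc3,0xaa,0x44,0x13,0x26,0x49,0x86,0x06,0x99,
    0x9c,0x42,0x50,0xf4,0x91,0xef,0x98,0x7a,0x33,0x54,0x0b,0x43,0xed,0xcf,0xac,0x62,
    0xe4,0xb3,0x1c,0xa9,0xc9,0x08,0xe8,0x95,0x80,0xdf,0x94,0xfa,0x75,0x8f,0x3f,0xa6,
    0x47,0x07,0xa7,0xfc,0xf3,0x73,0x17,0xba,0x83,0x59,0x3c,0x19,0xe6,0x85,0x4f,0xa8,
    0x68,0x6b,0x81,0xb2,0x71,0x64,0xda,0x8b,0xf8,0xeb,0x0f,0x4b,0x70,0x56,0x9d,0x35,
    0x1e,0x24,0x0e,0x5e,0x63,0x58,0xd1,0xa2,0x25,0x22,0x7c,0x3b,0x01,0x21,0x78,0x87,
    0xd4,0x00,0x46,0x57,0x9f,0xd3,0x27,0x52,0x4c,0x36,0x02,0xe7,0xa0,0xc4,0xc8,0x9e,
    0xea,0xbf,0x8a,0xd2,0x40,0xc7,0x38,0xb5,0xa3,0xf7,0xf2,0xce,0xf9,0x61,0x15,0xa1,
    0xe0,0xae,0x5d,0xa4,0x9b,0x34,0x1a,0x55,0xad,0x93,0x32,0x30,0xf5,0x8c,0xb1,0xe3,
    0x1d,0xf6,0xe2,0x2e,0x82,0x66,0xca,0x60,0xc0,0x29,0x23,0xab,0x0d,0x53,0x4e,0x6f,
    0xd5,0xdb,0x37,0x45,0xde,0xfd,0x8e,0x2f,0x03,0xff,0x6a,0x72,0x6d,0x6c,0x5b,0x51,
    0x8d,0x1b,0xaf,0x92,0xbb,0xdd,0xbc,0x7f,0x11,0xd9,0x5c,0x41,0x1f,0x10,0x5a,0xd8,
    0x0a,0xc1,0x31,0x88,0xa5,0xcd,0x7b,0xbd,0x2d,0x74,0xd0,0x12,0xb8,0xe5,0xb4,0xb0,
    0x89,0x69,0x97,0x4a,0x0c,0x96,0x77,0x7e,0x65,0xb9,0xf1,0x09,0xc5,0x6e,0xc6,0x84,
    0x18,0xf0,0x7d,0xec,0x3a,0xdc,0x4d,0x20,0x79,0xee,0x5f,0x3e,0xd7,0xcb,0x39,0x48
};

/* System parameter FK, XORed into the user key before expansion. */
static const uint32_t FK[4] = {0xa3b1bac6, 0x56aa3350, 0x677d9197, 0xb27022dc};

static uint32_t rotl(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Nonlinear transform tau: apply the S-box to each byte of a 32-bit word. */
static uint32_t tau(uint32_t a) {
    return ((uint32_t)SBOX[(a >> 24) & 0xff] << 24) |
           ((uint32_t)SBOX[(a >> 16) & 0xff] << 16) |
           ((uint32_t)SBOX[(a >> 8) & 0xff] << 8) |
           (uint32_t)SBOX[a & 0xff];
}

static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}

static void store_be32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* Key expansion: run once per long-term key; the 32 round keys are then reused. */
void sm4_expand_key(const uint8_t key[16], uint32_t rk[32]) {
    uint32_t k[4];
    for (int i = 0; i < 4; i++) k[i] = load_be32(key + 4 * i) ^ FK[i];
    for (int i = 0; i < 32; i++) {
        uint32_t ck = 0; /* CK_i: byte j is (4i + j) * 7 mod 256, most significant first */
        for (int j = 0; j < 4; j++) ck = (ck << 8) | (uint8_t)((4 * i + j) * 7);
        uint32_t b = tau(k[1] ^ k[2] ^ k[3] ^ ck);
        uint32_t next = k[0] ^ b ^ rotl(b, 13) ^ rotl(b, 23); /* linear transform L' */
        rk[i] = next;
        k[0] = k[1]; k[1] = k[2]; k[2] = k[3]; k[3] = next;
    }
}

/* Encrypt one 16-byte block with precomputed round keys (in and out may alias). */
void sm4_encrypt_block(const uint32_t rk[32], const uint8_t in[16], uint8_t out[16]) {
    uint32_t x[4];
    for (int i = 0; i < 4; i++) x[i] = load_be32(in + 4 * i);
    for (int i = 0; i < 32; i++) {
        uint32_t b = tau(x[1] ^ x[2] ^ x[3] ^ rk[i]);
        uint32_t next = x[0] ^ b ^ rotl(b, 2) ^ rotl(b, 10) ^ rotl(b, 18) ^ rotl(b, 24);
        x[0] = x[1]; x[1] = x[2]; x[2] = x[3]; x[3] = next;
    }
    /* Reverse transform R: output (X35, X34, X33, X32). */
    for (int i = 0; i < 4; i++) store_be32(out + 4 * i, x[3 - i]);
}

/* Encrypt n independent 16-byte blocks with one shared key schedule. */
void sm4_encrypt_blocks(const uint32_t rk[32], const uint8_t *in, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) sm4_encrypt_block(rk, in + 16 * i, out + 16 * i);
}

/* Known-answer test from the standard; returns 1 if the output matches. */
int sm4_selftest(void) {
    static const uint8_t key[16] = {0x01,0x23,0x45,0x67,0x89,0xab,0xcd,0xef,
                                    0xfe,0xdc,0xba,0x98,0x76,0x54,0x32,0x10};
    static const uint8_t expect[16] = {0x68,0x1e,0xdf,0x34,0xd2,0x06,0x96,0x5e,
                                       0x86,0xb3,0xe9,0x4f,0x53,0x6e,0x42,0x46};
    uint32_t rk[32];
    uint8_t out[16];
    sm4_expand_key(key, rk);
    sm4_encrypt_block(rk, key, out); /* plaintext equals the key in this vector */
    return memcmp(out, expect, 16) == 0;
}
```

In this sketch, a gateway holding a long-term key would call `sm4_expand_key` once when the key is installed and pass the same `rk` array to `sm4_encrypt_blocks` for every batch of data. Because the blocks are independent, that loop is the part one would expect to map onto one GPU thread per block. The independent-block (ECB-style) layout is only for illustration; a real server would wrap the block function in a proper mode of operation such as CTR or GCM.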

Keywords

Symmetric Cryptographic Algorithm, Graphics Processing Unit (GPU), CUDA, SM4 implementation, Symmetric cryptography server, Performance

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Wangzhao Cheng (1, 2, 3)
  • Fangyu Zheng (1, 2)
  • Wuqiong Pan (1, 2)
  • Jingqiang Lin (1, 2)
  • Huorong Li (1, 2, 3)
  • Bingyu Li (1, 2, 3)

  1. Data Assurance and Communication Security Research Center, Beijing, China
  2. State Key Laboratory of Information Security, Institute of Information Engineering, CAS, Beijing, China
  3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
