Abstract
LEA is a new lightweight, low-power encryption algorithm. It has several features that make it especially suitable for parallel hardware and software implementations: simple ARX (addition, rotation, XOR) operations, an architecture without S-boxes, and a 32-bit word size. In this paper we evaluate the performance of LEA on ARM NEON and on GPUs, taking advantage of both these features and the parallel computing platforms and programming models provided by NEON and CUDA. Specifically, we propose novel parallel LEA implementations on representative SIMT and SIMD architectures, namely CUDA and NEON. For CUDA, we first designed a thread-based computation model that exploits functional parallelism by computing several encryptions in one thread. To reduce memory transfer delay, we allocate memory so that accesses are coalesced. Second, we wrote the block cipher in assembly language, which provides an efficient and flexible programming environment. With these optimization techniques, we achieved a throughput of 17.352 GBps without memory transfer and 2.5 GBps with memory transfer (GBps: gigabytes per second). For NEON, we adopted pipelined instructions and SIMD-based execution models, which improved encryption performance by 49.85 % compared to previous ARM implementations.
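To make the ARX structure concrete, the following is a minimal Python sketch of a single LEA-style round on 32-bit words, together with its inverse. It is not the paper's CUDA or NEON code. The rotation amounts (9, 5, 3) and the word order follow the published LEA specification as recalled here and should be checked against it. The names `arx_round`, `arx_round_inv`, and `arx_round_lanes` are hypothetical, and the round keys in the usage lines are arbitrary values, not test vectors. The `arx_round_lanes` helper assumes a word-sliced layout (all first words together, all second words together, and so on), which is one way to picture both SIMD lanes and coalesced memory access.

```python
MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rol(x, r):
    """Rotate a 32-bit word left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK


def ror(x, r):
    """Rotate a 32-bit word right by r bits (0 < r < 32)."""
    return ((x >> r) | (x << (32 - r))) & MASK


def arx_round(x, rk):
    """One ARX round: 4-word state x, 6 round-key words rk.

    Rotation amounts are an assumption taken from the LEA specification.
    """
    x0, x1, x2, x3 = x
    return (
        rol(((x0 ^ rk[0]) + (x1 ^ rk[1])) & MASK, 9),
        ror(((x1 ^ rk[2]) + (x2 ^ rk[3])) & MASK, 5),
        ror(((x2 ^ rk[4]) + (x3 ^ rk[5])) & MASK, 3),
        x0,
    )


def arx_round_inv(y, rk):
    """Undo arx_round: reverse each rotation, subtract, then XOR the key."""
    y0, y1, y2, y3 = y
    x0 = y3
    x1 = ((ror(y0, 9) - (x0 ^ rk[0])) & MASK) ^ rk[1]
    x2 = ((rol(y1, 5) - (x1 ^ rk[2])) & MASK) ^ rk[3]
    x3 = ((rol(y2, 3) - (x2 ^ rk[4])) & MASK) ^ rk[5]
    return (x0, x1, x2, x3)


def arx_round_lanes(lanes, rk):
    """Apply one round to many blocks stored word-sliced.

    lanes[w][b] is word w of block b, so each word position is stored
    contiguously across blocks (hypothetical layout for illustration).
    """
    n = len(lanes[0])
    out = [arx_round(tuple(lanes[w][b] for w in range(4)), rk) for b in range(n)]
    return [[o[w] for o in out] for w in range(4)]


if __name__ == "__main__":
    rk = (0x01234567, 0x89ABCDEF, 0x0F1E2D3C, 0x4B5A6978, 0xDEADBEEF, 0x00C0FFEE)
    block = (0x13121110, 0x17161514, 0x1B1A1918, 0x1F1E1D1C)
    # A round followed by its inverse returns the original block.
    assert arx_round_inv(arx_round(block, rk), rk) == block
```

Because the round uses only additions, rotations, and XORs on 32-bit words, with no table lookups, every block in a batch executes the same instruction sequence. That is the property the SIMD and SIMT implementations rely on.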
This work was supported by the Industrial Strategic Technology Development Program (No. 10043907, Development of high performance IoT device and Open Platform with Intelligent Software) funded by the Ministry of Science, ICT & Future Planning (MSIF, Korea).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Seo, H. et al. (2014). Parallel Implementations of LEA. In: Lee, HS., Han, DG. (eds) Information Security and Cryptology -- ICISC 2013. ICISC 2013. Lecture Notes in Computer Science, vol 8565. Springer, Cham. https://doi.org/10.1007/978-3-319-12160-4_16
DOI: https://doi.org/10.1007/978-3-319-12160-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12159-8
Online ISBN: 978-3-319-12160-4
eBook Packages: Computer Science, Computer Science (R0)