Parallel Implementations of CHAM
In this paper, we presented novel parallel implementations of CHAM-64/128 block cipher on modern ARM-NEON processors. In order to accelerate the performance of the implementation of CHAM-64/128 block cipher, the full specifications of ARM-NEON processors are utilized in terms of instruction set and multiple cores. First, the SIMD feature of ARM processor is fully utilized. The modern ARM processor provides \(2\times 16\)-bit vectorized instruction. By using the instruction sets and full register files, total 4 CHAM-64/128 encryptions are performed at once in data parallel way. Second, the dedicated SIMD instruction sets, namely NEON engine, is fully exploited. The NEON engine supports \(8\times 16\)-bit vectorized instruction over 128-bit Q registers. The 24 CHAM-64/128 encryptions are performed at once in data parallel way. Third, both ARM and NEON instruction sets are well re-ordered in interleaved way. This mixed approach hides the pipeline stalls between each instruction set. Fourth, the multiple cores are exploited to maximize the performance in thread level. Finally, we achieved the 4.2 cycles/byte for implementation of CHAM-64/128 on ARM-NEON processors. This result is competitive to the parallel implementation of LEA-128/128 and HIGHT-64/128 on same processor.
KeywordsCHAM ARM-NEON processor Parallel computation Software implementation
This work was partly supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2017R1C1B5075742) and the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (2014-1-00743) supervised by the IITP (Institute for Information & communications Technology Promotion). Zhi Hu was partially supported by the Natural Science Foundation of China (Grant No. 61602526).
- 2.Beaulieu, R., Treatman-Clark, S., Shors, D., Weeks, B., Smith, J., Wingers, L.: The SIMON and SPECK lightweight block ciphers. In: 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–6. IEEE (2015)Google Scholar
- 6.Faz-Hernández, A., Longa, P., Sánchez, A.H.: Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV-GLS curves. In: Benaloh, J. (ed.) CT-RSA 2014. LNCS, vol. 8366, pp. 1–27. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04852-9_1CrossRefGoogle Scholar
- 8.Hong, D., Lee, J.-K., Kim, D.-C., Kwon, D., Ryu, K.H., Lee, D.-G.: LEA: a 128-bit block cipher for fast encryption on common processors. In: Kim, Y., Lee, H., Perrig, A. (eds.) WISA 2013. LNCS, vol. 8267, pp. 3–27. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05149-9_1CrossRefGoogle Scholar
- 10.Koo, B., Roh, D., Kim, H., Jung, Y., Lee, D.-G., Kwon, D.: CHAM: a family of lightweight block ciphers for resource-constrained devices. In: International Conference on Information Security and Cryptology, ICISC 2017 (2017)Google Scholar
- 11.Liu, Z., Azarderakhsh, R., Kim, H., Seo, H.: Efficient software implementation of ring-LWE encryption on IoT processors. IEEE Trans. Comput. (2017)Google Scholar
- 14.Park, T., Seo, H., Kim, H.: Parallel implementations of SIMON and SPECK. In: 2016 International Conference on Platform Technology and Service (PlatCon), pp. 1–6. IEEE (2016)Google Scholar
- 15.Seo, H., Jeong, I., Lee, J., Kim, W.-H.: Compact implementations of ARX-based block ciphers on IoT processors. ACM Trans. Embed. Comput. Syst. (TECS) 17(3), 60 (2018)Google Scholar