Abstract
Getting a lot of CPU power used to be an expensive undertaking. Servers with many cores cost a lot of money and consume large amounts of energy. Developments in hardware for mobile devices have resulted in a surge of relatively cheap, powerful, and low-energy CPUs. In this paper we show how to build a low-energy, eighty-core cluster around twenty ODROID-C2 development boards for under 1500 USD. The ODROID-C2 is a 46 USD microcomputer that provides a 1.536 GHz quad-core Cortex-A53-based CPU and 2 GB of RAM. We investigate the cluster’s application to cryptanalysis by implementing Pollard’s Rho method to tackle the Certicom ECC2K-130 elliptic curve challenge. We optimise software from the Breaking ECC2K-130 technical report for the Cortex-A53. To do so, we show how to use microbenchmarking to derive the needed instruction characteristics, which ARM neglected to document for the public. The implementation of the ECC2K-130 attack finally allows us to compare the proposed platform to various other platforms, including “classical” desktop CPUs, GPUs and FPGAs. Although it may still be slower than, for example, FPGAs, our cluster provides a lot of value for money.
Notes
- 1. Including the Motorola Moto G5 Plus, Moto X Play, Samsung Galaxy A3 and A5, and HTC Desire 826: https://goo.gl/aZMky5.
- 2. We should note that it is important to remove the J2 jumper from the ODROID-C2 board when not powering it through USB: this saves a significant amount of energy.
References
Ansible. https://docs.ansible.com/ansible/. Accessed 22 June 2017
ARM Cortex-A Series Programmer’s Guide for ARMv8-A. Version 1.0. https://developer.arm.com/products/processors/cortex-a/cortex-a53/docs/den0024/latest/1-introduction. Accessed 22 June 2017
BCM2837 - Raspberry Pi documentation. https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2837/README.md. Accessed 08 May 2017
ODROID-C2. http://www.hardkernel.com/main/products/prdt_info.php?g_code=G145457216438. Accessed 03 Apr 2017
Bailey, D.V., Batina, L., Bernstein, D.J., Birkner, P., Bos, J.W., Chen, H.-C., Cheng, C.-M., Damme, G.V., de Meulenaer, G., Perez, L.J.D., Fan, J., Güneysu, T., Gürkaynak, F., Kleinjung, T., Lange, T., Mentens, N., Niederhagen, R., Paar, C., Regazzoni, F., Schwabe, P., Uhsadel, L., Herrewege, A.V., Yang, B.-Y.: Breaking ECC2K-130. Cryptology ePrint Archive, Report 2009/541 (2009). https://eprint.iacr.org/2009/541/
Bernstein, D.J.: Minimum number of bit operations for multiplication. https://binary.cr.yp.to/m.html. Accessed 05 Apr 2017
Bernstein, D.J.: Batch binary Edwards. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 317–336. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_19
Bernstein, D.J., Chen, H.-C., Cheng, C.-M., Lange, T., Niederhagen, R., Schwabe, P., Yang, B.-Y.: ECC2K-130 on NVIDIA GPUs. In: Gong, G., Gupta, K.C. (eds.) INDOCRYPT 2010. LNCS, vol. 6498, pp. 328–346. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17401-8_23
Bernstein, D.J., Engels, S., Lange, T., Niederhagen, R., Paar, C., Schwabe, P., Zimmermann, R.: Faster discrete logarithms on FPGAs (2016). http://cryptojedi.org/papers/#sect113r2
Bos, J.W., Kleinjung, T., Niederhagen, R., Schwabe, P.: ECC2K-130 on Cell CPUs. In: Bernstein, D.J., Lange, T. (eds.) AFRICACRYPT 2010. LNCS, vol. 6055, pp. 225–242. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12678-9_14
Certicom Corp: The Certicom ECC Challenge. https://www.certicom.com/content/certicom/en/the-certicom-ecc-challenge.html. Accessed 03 Apr 2017
Certicom Research. Certicom ECC Challenge. https://www.certicom.com/content/dam/certicom/images/pdfs/challenge-2009.pdf. Accessed 10 Nov 2009
Cox, S.J., Cox, J.T., Boardman, R.P., Johnston, S.J., Scott, M., O’Brien, N.S.: Iridis-pi: a low-cost, compact demonstration cluster. Cluster Comput. 17(2), 349–358 (2014). https://doi.org/10.1007/s10586-013-0282-7
Fan, J., Bailey, D.V., Batina, L., Güneysu, T., Paar, C., Verbauwhede, I.: Breaking elliptic curve cryptosystems using reconfigurable hardware. In: 2010 International Conference on Field Programmable Logic and Applications, pp. 133–138 (2010). https://doi.org/10.1109/FPL.2010.34
Hutter, M., Schwabe, P.: Multiprecision multiplication on AVR revisited. J. Cryptogr. Eng. 5(3), 201–214 (2015). http://cryptojedi.org/papers/#avrmul
Karatsuba, A., Ofman, Y.: Multiplication of multidigit numbers on automata. In: Soviet Physics Doklady, vol. 7, p. 595 (1963)
Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48(177), 243–264 (1987)
Patel, N.: Sony says the 40GB PS3 is still using 90nm chips. https://www.engadget.com/2007/11/03/sony-says-the-40gb-ps3-is-still-using-90nm-chips/. Accessed 24 Aug 2017
Pollard, J.M.: Monte Carlo methods for index computation \((\operatorname{mod} p)\). Math. Comput. 32(143), 918–924 (1978)
TechInsights. Nintendo Switch teardown. http://techinsights.com/about-techinsights/overview/blog/nintendo-switch-teardown/. Accessed 08 May 2017
van Oorschot, P.C., Wiener, M.J.: Parallel collision search with cryptanalytic applications. J. Cryptol. 12(1), 1–28 (1999). https://doi.org/10.1007/PL00003816
Appendices
A Cortex-A53 Benchmarking Results
This section gives an overview of our microbenchmarking results. As described in Sect. 3.1, we measured the execution times of various instructions. We also looked at various combinations of instructions to learn about pipelining behaviour and execution units.
A.1 Operations on “Normal” Registers
Our findings for AArch64 instructions are shown in Table 6. When measuring two arithmetic instructions, we noticed that they take the same time as a single instruction. This suggests that the Cortex-A53 has two ALUs and can thus execute two arithmetic instructions at the same time. This does not hold for multiplication or for the memory operations, so we suspect that the architecture has only one multiplier and a single processing unit for memory access.
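The inference above can be sketched as a small helper that compares the measured time of one instruction with the measured time of two independent instructions of the same kind. This is only an illustration of the reasoning: the function name, the tolerance, and the cycle counts in the usage lines are hypothetical placeholders and are not values taken from Table 6.

```python
def pairs_in_parallel(single_cycles, pair_cycles, tolerance=0.1):
    """Return True if two independent instructions cost about as much as one.

    single_cycles: measured cycles for one instruction (hypothetical input)
    pair_cycles:   measured cycles for two independent instructions
    tolerance:     relative measurement noise we are willing to accept
    """
    return pair_cycles <= single_cycles * (1.0 + tolerance)


# Placeholder measurements, NOT taken from the paper's tables.
measurements = {
    "arithmetic": (1.0, 1.0),  # pair costs the same as one instruction
    "multiply": (1.0, 2.0),    # pair costs twice as much as one instruction
}

for name, (single, pair) in measurements.items():
    verdict = "two units likely" if pairs_in_parallel(single, pair) else "one unit likely"
    print(f"{name}: {verdict}")
```

With real measurements, a pair that costs as much as a single instruction points to two execution units for that instruction class, while a pair that costs twice as much points to a single shared unit.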
A.2 Operations on NEON Vector Registers
The NEON vector registers are available in different sizes: they can be accessed as 64-bit vectors or as 128-bit vectors. Tables 7 and 8 give an overview of our results. For the 64-bit vectors we again notice that two arithmetic instructions run in the same time as a single instruction. This, however, is not the case with the 128-bit vectors. This suggests that there are two 64-bit execution units that are combined for the 128-bit values.
Load and store operations again do not execute in parallel, and we suspect there is only one load-store unit. It is possible to pair up an arithmetic operation with a load or store.
B The ECC2K-130 Challenge Parameters
The Certicom ECC2K-130 challenge is defined in [11, 12]. The challenge is to find an integer k such that \(Q=[k]P\) on the Koblitz curve \(y^2 + xy = x^3 + 1\) defined over \(\mathbb {F}_{2^{131}}\). The group order is \(|E(\mathbb {F}_{2^{131}})| = 4l\), where l is the 129-bit prime number
The coordinates of P and Q are given in a polynomial-basis representation of \(\mathbb {F}_2[z]/(F)\) where \(F(z) = z^{131} + z^{13} + z^{2} + z + 1\). They are represented below as hexadecimal bit strings with respect to this basis.
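To make the representation concrete, the following is a minimal sketch of arithmetic in \(\mathbb {F}_2[z]/(F)\) and of the curve equation check, assuming field elements are stored as Python integers whose bit i holds the coefficient of \(z^i\). The function names are illustrative and not taken from the paper's software, and this straightforward code is unrelated to the optimised Cortex-A53 implementation. Since the challenge coordinates are not reproduced here, the usage lines only check small points that can be verified by hand.

```python
M = 131
# F(z) = z^131 + z^13 + z^2 + z + 1; bit i is the coefficient of z^i.
F = (1 << 131) | (1 << 13) | (1 << 2) | (1 << 1) | 1


def reduce_mod_f(a):
    """Reduce a binary polynomial modulo F(z)."""
    while a.bit_length() > M:
        # Cancel the leading term by adding a shifted copy of F.
        a ^= F << (a.bit_length() - 1 - M)
    return a


def field_mul(a, b):
    """Multiply two field elements: carry-less product, then reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return reduce_mod_f(r)


def on_curve(x, y):
    """Check y^2 + xy = x^3 + 1 (addition in characteristic 2 is XOR)."""
    lhs = field_mul(y, y) ^ field_mul(x, y)
    rhs = field_mul(field_mul(x, x), x) ^ 1
    return lhs == rhs


def frobenius(x, y):
    """Frobenius map (x, y) -> (x^2, y^2), which preserves a Koblitz curve."""
    return field_mul(x, x), field_mul(y, y)


# Small points that satisfy the curve equation by inspection.
print(on_curve(0, 1), on_curve(1, 0), on_curve(1, 1))
```

A hexadecimal coordinate string from the challenge can be loaded with `int(hex_string, 16)` under the same bit ordering assumption, after which `on_curve` can be applied to P and Q.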
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wiggers, T. (2019). Energy-Efficient ARM64 Cluster with Cryptanalytic Applications. In: Lange, T., Dunkelman, O. (eds) Progress in Cryptology – LATINCRYPT 2017. LATINCRYPT 2017. Lecture Notes in Computer Science(), vol 11368. Springer, Cham. https://doi.org/10.1007/978-3-030-25283-0_10
DOI: https://doi.org/10.1007/978-3-030-25283-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25282-3
Online ISBN: 978-3-030-25283-0
eBook Packages: Computer Science, Computer Science (R0)