Energy-Efficient ARM64 Cluster with Cryptanalytic Applications

Wiggers, Thom

doi:10.1007/978-3-030-25283-0_10


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11368)

Included in the following conference series:

International Conference on Cryptology and Information Security in Latin America


Abstract

Getting a lot of CPU power used to be an expensive undertaking: servers with many cores cost a lot of money and consume large amounts of energy. Developments in hardware for mobile devices have resulted in a surge of relatively cheap, powerful, and low-energy CPUs. In this paper we show how to build a low-energy, eighty-core cluster around twenty ODROID-C2 development boards for under 1500 USD. The ODROID-C2 is a 46 USD microcomputer that provides a 1.536 GHz quad-core Cortex-A53-based CPU and 2 GB of RAM. We investigate the cluster’s application to cryptanalysis by implementing Pollard’s rho method to tackle the Certicom ECC2K-130 elliptic-curve challenge. We optimise software from the Breaking ECC2K-130 technical report for the Cortex-A53. To do so, we show how to use microbenchmarking to derive the instruction characteristics we need, which ARM has not publicly documented. Finally, the implementation of the ECC2K-130 attack allows us to compare the proposed platform to various other platforms, including “classical” desktop CPUs, GPUs and FPGAs. Although the cluster may still be slower than, for example, FPGAs, it provides a lot of value for money.
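Pollard’s rho method finds a discrete logarithm by taking a pseudo-random walk through the group until two points of the walk collide. The sketch below illustrates the idea in a deliberately tiny setting: a prime-order subgroup of the integers modulo p, Pollard’s classic three-way partition of the group, and Floyd cycle detection. It is not the paper’s ECC2K-130 implementation, which works on an elliptic-curve group; the function names and the toy parameters p = 1019, q = 509, g = 4 are assumptions chosen for illustration.

```python
import random


def step(x, a, b, g, h, p, q):
    """One step of Pollard's three-way partitioned walk.

    Invariant: x == g**a * h**b (mod p), with exponents kept mod q.
    """
    s = x % 3
    if s == 0:
        return (x * h) % p, a, (b + 1) % q
    if s == 1:
        return (x * x) % p, (2 * a) % q, (2 * b) % q
    return (x * g) % p, (a + 1) % q, b


def pollard_rho_dlog(g, h, p, q, seed=1):
    """Return k in [0, q) with g**k == h (mod p), where g has prime order q."""
    rng = random.Random(seed)
    while True:
        # Random starting point g^a * h^b avoids fixed points such as x = 1.
        a, b = rng.randrange(q), rng.randrange(q)
        x = (pow(g, a, p) * pow(h, b, p)) % p
        X, A, B = x, a, b
        while True:
            x, a, b = step(x, a, b, g, h, p, q)  # tortoise: one step
            X, A, B = step(X, A, B, g, h, p, q)
            X, A, B = step(X, A, B, g, h, p, q)  # hare: two steps
            if x == X:
                break
        # g^a h^b == g^A h^B  =>  k * (B - b) == a - A  (mod q)
        r = (B - b) % q
        if r != 0:
            return (a - A) * pow(r, -1, q) % q
        # Degenerate collision (b == B): retry from a fresh random start.


p, q, g = 1019, 509, 4  # toy parameters: p = 2q + 1, and g = 2^2 has order q
h = pow(g, 123, p)
print(pollard_rho_dlog(g, h, p, q))  # 123
```

Here 4 generates the subgroup of order 509 because it is a non-trivial square modulo the safe prime 1019. The expected running time grows with the square root of the group order, which is why a real attack on a group of size about 2^131 needs a large parallel effort.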


Notes

1.
Including the Motorola Moto G5 Plus, Moto X Play, Samsung Galaxy A3 and A5, and HTC Desire 826: https://goo.gl/aZMky5.
2.
Note that it is important to remove the J2 jumper from the ODROID-C2 board when it is not powered through USB, as this saves a significant amount of energy.

References

[1] Ansible. https://docs.ansible.com/ansible/. Accessed 22 June 2017
[2] ARM Cortex-A Series Programmer’s Guide for ARMv8-A. Version 1.0. https://developer.arm.com/products/processors/cortex-a/cortex-a53/docs/den0024/latest/1-introduction. Accessed 22 June 2017
[3] BCM2837 - Raspberry Pi documentation. https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2837/README.md. Accessed 08 May 2017
[4] ODROID-C2. http://www.hardkernel.com/main/products/prdt_info.php?g_code=G145457216438. Accessed 03 Apr 2017
[5] Bailey, D.V., Batina, L., Bernstein, D.J., Birkner, P., Bos, J.W., Chen, H.-C., Cheng, C.-M., Damme, G.V., de Meulenaer, G., Perez, L.J.D., Fan, J., Güneysu, T., Gürkaynak, F., Kleinjung, T., Lange, T., Mentens, N., Niederhagen, R., Paar, C., Regazzoni, F., Schwabe, P., Uhsadel, L., Herrewege, A.V., Yang, B.-Y.: Breaking ECC2K-130. Cryptology ePrint Archive, Report 2009/541 (2009). https://eprint.iacr.org/2009/541/
[6] Bernstein, D.J.: Minimum number of bit operations for multiplication. https://binary.cr.yp.to/m.html. Accessed 05 Apr 2017
[7] Bernstein, D.J.: Batch binary Edwards. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 317–336. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_19
[8] Bernstein, D.J., Chen, H.-C., Cheng, C.-M., Lange, T., Niederhagen, R., Schwabe, P., Yang, B.-Y.: ECC2K-130 on NVIDIA GPUs. In: Gong, G., Gupta, K.C. (eds.) INDOCRYPT 2010. LNCS, vol. 6498, pp. 328–346. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17401-8_23
[9] Bernstein, D.J., Engels, S., Lange, T., Niederhagen, R., Paar, C., Schwabe, P., Zimmermann, R.: Faster discrete logarithms on FPGAs (2016). http://cryptojedi.org/papers/#sect113r2
[10] Bos, J.W., Kleinjung, T., Niederhagen, R., Schwabe, P.: ECC2K-130 on Cell CPUs. In: Bernstein, D.J., Lange, T. (eds.) AFRICACRYPT 2010. LNCS, vol. 6055, pp. 225–242. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12678-9_14
[11] Certicom Corp.: The Certicom ECC Challenge. https://www.certicom.com/content/certicom/en/the-certicom-ecc-challenge.html. Accessed 03 Apr 2017
[12] Certicom Research: Certicom ECC Challenge. https://www.certicom.com/content/dam/certicom/images/pdfs/challenge-2009.pdf. Accessed 10 Nov 2009
[13] Cox, S.J., Cox, J.T., Boardman, R.P., Johnston, S.J., Scott, M., O’Brien, N.S.: Iridis-pi: a low-cost, compact demonstration cluster. Cluster Comput. 17(2), 349–358 (2014). https://doi.org/10.1007/s10586-013-0282-7
[14] Fan, J., Bailey, D.V., Batina, L., Guneysu, T., Paar, C., Verbauwhede, I.: Breaking elliptic curve cryptosystems using reconfigurable hardware. In: 2010 International Conference on Field Programmable Logic and Applications, pp. 133–138 (August 2010). https://doi.org/10.1109/FPL.2010.34
[15] Hutter, M., Schwabe, P.: Multiprecision multiplication on AVR revisited. J. Cryptogr. Eng. 5(3), 201–214 (2015). http://cryptojedi.org/papers/#avrmul
[16] Karatsuba, A., Ofman, Y.: Multiplication of multidigit numbers on automata. In: Soviet Physics Doklady, vol. 7, p. 595 (1963)
[17] Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48(177), 243–264 (1987)
[18] Patel, N.: Sony says the 40GB PS3 is still using 90nm chips. https://www.engadget.com/2007/11/03/sony-says-the-40gb-ps3-is-still-using-90nm-chips/. Accessed 24 Aug 2017
[19] Pollard, J.M.: Monte Carlo methods for index computation (mod p). Math. Comput. 32(143), 918–924 (1978)
[20] TechInsights: Nintendo Switch teardown. http://techinsights.com/about-techinsights/overview/blog/nintendo-switch-teardown/. Accessed 08 May 2017
[21] van Oorschot, P.C., Wiener, M.J.: Parallel collision search with cryptanalytic applications. J. Cryptol. 12(1), 1–28 (1999). https://doi.org/10.1007/PL00003816


Author information

Authors and Affiliations

Institute of Computing and Information Science, Radboud University, Nijmegen, The Netherlands
Thom Wiggers


Corresponding author

Correspondence to Thom Wiggers.

Editor information

Editors and Affiliations

Technische Universiteit Eindhoven, Eindhoven, The Netherlands
Tanja Lange
University of Haifa, Haifa, Israel
Orr Dunkelman

Appendices

A Cortex-A53 Benchmarking Results

In this appendix we give an overview of our microbenchmarking results. As described in Sect. 3.1, we measured the execution times of various instructions. We also timed various combinations of instructions to learn about pipelining behaviour and execution units.

A.1 Operations On “Normal” Registers

Our findings for AArch64 instructions are shown in Table 6. When measuring two arithmetic instructions, we noticed that they take the same time as a single instruction. This suggests that the Cortex-A53 has two ALUs and can thus execute two arithmetic instructions at the same time. This does not hold for multiplications or memory operations, so we suspect that the core has only one multiplier and a single unit for memory access.
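The step from raw timings to statements about latency and the number of execution units can be written down as a small helper. This is a sketch under stated assumptions: it presumes the cycle counts were already collected natively on the Cortex-A53 (for example with an unrolled assembly loop), and the helper names and sample numbers are hypothetical, not the measurements behind Table 6. A fully dependent chain exposes latency, while independent instructions expose how many can issue per cycle; the median of repeated runs damps outliers such as interrupts.

```python
from statistics import median


def latency_cycles(chain_samples, chain_length):
    """Estimate latency from timings of a fully dependent instruction chain.

    Each instruction consumes the previous result, so nothing overlaps and
    the chain takes roughly chain_length * latency cycles.
    """
    return median(chain_samples) / chain_length


def issue_rate(indep_samples, n_independent):
    """Estimate instructions issued per cycle from independent instructions.

    Without dependencies, throughput is bounded by how many units accept
    the instruction each cycle: about n_independent / rate cycles in total.
    """
    return n_independent / median(indep_samples)


# Synthetic cycle counts (NOT data from the paper): five runs each of a
# loop body with 100 dependent adds and with 100 independent adds.
dependent_runs = [100, 100, 101, 100, 140]
independent_runs = [50, 51, 50, 50, 75]

print(latency_cycles(dependent_runs, 100))  # 1.0
print(issue_rate(independent_runs, 100))    # 2.0
```

An issue rate close to 2 for additions but close to 1 for multiplications or loads is the pattern that points to two ALUs but a single multiplier and a single memory unit.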

Table 6. Hypothesised instruction characteristics for instructions operating on registers. Latencies include the issue cycles. Many instructions can be dual-issued.


A.2 Operations On NEON Vector Registers

The NEON vector registers can be accessed either as 64-bit or as 128-bit vectors. Tables 7 and 8 give an overview of our results. For the 64-bit vectors we again notice that two arithmetic instructions run in the same time as a single instruction. This is not the case for 128-bit vectors, which suggests that there are two 64-bit execution units that are combined to process 128-bit values.

Load and store operations again do not execute in parallel, and we suspect that there is only one load-store unit. It is, however, possible to pair an arithmetic operation with a load or store.
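The hypothesised pairing rules can be turned into a rough cost model for straight-line NEON code. The sketch assumes an in-order core that issues at most two instructions per cycle, two 64-bit arithmetic units that a 128-bit operation occupies together, and a single load-store unit. It also assumes that the arithmetic-with-load/store pairing applies to 128-bit operations, which the text above states only in general terms. The class labels and function name are hypothetical, and data dependencies and latencies are ignored, so the result is only an issue-bound estimate.

```python
# Resources each instruction class needs under the hypotheses above:
# (64-bit arithmetic units, load-store units). Labels are hypothetical.
NEEDS = {
    "alu64": (1, 0),   # 64-bit vector arithmetic: one of two 64-bit units
    "alu128": (2, 0),  # 128-bit vector arithmetic: both 64-bit units
    "mem": (0, 1),     # vector load or store: the single load-store unit
}


def issue_cycles(ops, width=2):
    """Count cycles to issue ops in program order, at most `width` per cycle.

    Dependencies and result latencies are ignored, so this estimates only
    the issue-bound cost of a straight-line instruction sequence.
    """
    cycles = 0
    i = 0
    while i < len(ops):
        units64, lsus, issued = 2, 1, 0
        while i < len(ops) and issued < width:
            need64, need_lsu = NEEDS[ops[i]]
            if need64 > units64 or need_lsu > lsus:
                break  # in order: a blocked op also holds back later ops
            units64 -= need64
            lsus -= need_lsu
            issued += 1
            i += 1
        cycles += 1
    return cycles


print(issue_cycles(["alu64"] * 4))          # 2: two pairs
print(issue_cycles(["alu128"] * 4))         # 4: no pairing
print(issue_cycles(["alu128", "mem"] * 2))  # 2: arithmetic paired with memory
```

Under this model, interleaving loads and stores with arithmetic hides their issue cost, while replacing two 64-bit operations with one 128-bit operation gives no issue-rate gain.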

B The ECC2K-130 Challenge Parameters

The Certicom ECC2K-130 challenge is defined in [11, 12]. The challenge is to find an integer k such that $Q=[k]P$ on the Koblitz curve $y^2 + xy = x^3 + 1$ defined over $\mathbb {F}_{2^{131}}$. The group order is $|E(\mathbb {F}_{2^{131}})| = 4l$, where l is the 130-bit prime number

$$ l = 680564733841876926932320129493409985129. $$
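The stated group order can be cross-checked with the standard point-counting formula for Koblitz curves: if t is the trace of Frobenius over $\mathbb{F}_2$, then $|E(\mathbb{F}_{2^m})| = 2^m + 1 - V_m$ with $V_0 = 2$, $V_1 = t$ and $V_k = t V_{k-1} - 2 V_{k-2}$. For this curve (a = 0) the curve has 4 points over $\mathbb{F}_2$, so t = -1. The sketch below is our own consistency check rather than part of the paper; the function names are hypothetical, and the Miller-Rabin test only gives probabilistic evidence that l is prime.

```python
l = 680564733841876926932320129493409985129


def koblitz_order(m, a=0):
    """Number of points on y^2 + xy = x^3 + a*x^2 + 1 over F_{2^m} (m >= 1).

    Computes V_m from V_0 = 2, V_1 = t, V_k = t*V_{k-1} - 2*V_{k-2},
    where t is the Frobenius trace over F_2, and returns 2^m + 1 - V_m.
    """
    t = 1 if a == 1 else -1  # a = 0: #E(F_2) = 3 - t = 4
    v_prev, v = 2, t
    for _ in range(m - 1):
        v_prev, v = v, t * v - 2 * v_prev
    return 2**m + 1 - v


def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin test: primes always pass, composites almost never do."""
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


print(koblitz_order(131) == 4 * l, is_probable_prime(l))
```

The recurrence needs only 130 iterations on Python’s arbitrary-precision integers, so the check is instantaneous.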

The coordinates of P and Q are given in a polynomial-basis representation of $\mathbb {F}_2[z]/(F)$ where $F(z) = z^{131} + z^{13} + z^{2} + z + 1$. They are represented below as hexadecimal bit strings with respect to this basis.

$$\begin{aligned} P_x&= \mathtt {05 1C99BFA6 F18DE467 C80C23B9 8C7994AA} \\ P_y&= \mathtt {04 2EA2D112 ECEC71FC F7E000D7 EFC978BD} \\ Q_x&= \mathtt {06 C997F3E7 F2C66A4A 5D2FDA13 756A37B1} \\ Q_y&= \mathtt {04 A38D1182 9D32D347 BD0C0F58 4D546E9A} \\ \end{aligned}$$
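To make the polynomial-basis representation concrete, the sketch below treats each hexadecimal string as an integer whose bit i is the coefficient of $z^i$ (an assumption about the bit order: the leftmost hex digit holds the highest-degree coefficients). It multiplies field elements by shift-and-XOR with reduction modulo F and evaluates the curve equation. This is a slow reference illustration, not the optimised field arithmetic of the attack software, and all function names are hypothetical.

```python
M = 131
F = (1 << 131) | (1 << 13) | (1 << 2) | (1 << 1) | 1  # z^131 + z^13 + z^2 + z + 1


def gf_mul(a, b):
    """Multiply reduced elements a, b of F_2[z]/(F); bit i = coeff. of z^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the current multiple z^j * a
        b >>= 1
        a <<= 1  # multiply a by z
        if (a >> M) & 1:
            a ^= F  # reduce using z^131 = z^13 + z^2 + z + 1
    return result


def on_curve(x, y):
    """Check y^2 + xy == x^3 + 1, the ECC2K-130 curve equation."""
    lhs = gf_mul(y, y) ^ gf_mul(x, y)
    rhs = gf_mul(gf_mul(x, x), x) ^ 1
    return lhs == rhs


def from_hex(s):
    """Parse a spaced hexadecimal bit string such as '05 1C99BFA6 ...'."""
    return int(s.replace(" ", ""), 16)


P = (from_hex("05 1C99BFA6 F18DE467 C80C23B9 8C7994AA"),
     from_hex("04 2EA2D112 ECEC71FC F7E000D7 EFC978BD"))
Q = (from_hex("06 C997F3E7 F2C66A4A 5D2FDA13 756A37B1"),
     from_hex("04 A38D1182 9D32D347 BD0C0F58 4D546E9A"))

print(on_curve(*P), on_curve(*Q))
```

Evaluating on_curve on P and Q gives a quick consistency check of the transcribed constants; the point (0, 1) satisfies the equation trivially.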

Table 7. Hypothesised instruction characteristics for instructions operating on 64-bit vectors. Latencies include the issue cycles. Arithmetic operations can be issued together with other arithmetic instructions or with a load or store operation.


Table 8. Hypothesised instruction characteristics for instructions operating on 128-bit vectors. Latencies include the issue cycles.



About this paper

Cite this paper

Wiggers, T. (2019). Energy-Efficient ARM64 Cluster with Cryptanalytic Applications. In: Lange, T., Dunkelman, O. (eds) Progress in Cryptology – LATINCRYPT 2017. LATINCRYPT 2017. Lecture Notes in Computer Science(), vol 11368. Springer, Cham. https://doi.org/10.1007/978-3-030-25283-0_10


DOI: https://doi.org/10.1007/978-3-030-25283-0_10
Published: 20 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25282-3
Online ISBN: 978-3-030-25283-0