Energy-Efficient ARM64 Cluster with Cryptanalytic Applications

80 Cores That Do Not Cost You an ARM and a Leg

  • Conference paper
Progress in Cryptology – LATINCRYPT 2017 (LATINCRYPT 2017)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11368)

Abstract

Getting a lot of CPU power used to be an expensive undertaking. Servers with many cores cost a lot of money and consume large amounts of energy. Developments in hardware for mobile devices have resulted in a surge of relatively cheap, powerful, and low-energy CPUs. In this paper we show how to build a low-energy, eighty-core cluster from twenty ODROID-C2 development boards for under 1500 USD. The ODROID-C2 is a 46 USD microcomputer that provides a 1.536 GHz quad-core Cortex-A53-based CPU and 2 GB of RAM. We investigate the cluster’s application to cryptanalysis by implementing Pollard’s Rho method to tackle the Certicom ECC2K-130 elliptic curve challenge. We optimise software from the Breaking ECC2K-130 technical report for the Cortex-A53. To do so, we show how to use microbenchmarking to derive the needed instruction characteristics, which ARM neglected to document for the public. Finally, the implementation of the ECC2K-130 attack allows us to compare the proposed platform to various other platforms, including “classical” desktop CPUs, GPUs and FPGAs. Although it may be slower than, for example, FPGAs, our cluster still provides a lot of value for money.

Notes

  1. Including the Motorola Moto G5 Plus, Moto X Play, Samsung Galaxy A3 and A5, and HTC Desire 826: https://goo.gl/aZMky5.

  2. We should note that it is important to remove the J2 jumper from the ODROID-C2 board when not powering it through USB: this saves a significant amount of energy.

References

  1. Ansible. https://docs.ansible.com/ansible/. Accessed 22 June 2017

  2. ARM Cortex-A Series Programmer’s Guide for ARMv8-A. Version 1.0. https://developer.arm.com/products/processors/cortex-a/cortex-a53/docs/den0024/latest/1-introduction. Accessed 22 June 2017

  3. BCM2837 - Raspberry Pi documentation. https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2837/README.md. Accessed 08 May 2017

  4. ODROID-C2. http://www.hardkernel.com/main/products/prdt_info.php?g_code=G145457216438. Accessed 03 Apr 2017

  5. Bailey, D.V., Batina, L., Bernstein, D.J., Birkner, P., Bos, J.W., Chen, H.-C., Cheng, C.-M., Damme, G.V., de Meulenaer, G., Perez, L.J.D., Fan, J., Güneysu, T., Gürkaynak, F., Kleinjung, T., Lange, T., Mentens, N., Niederhagen, R., Paar, C., Regazzoni, F., Schwabe, P., Uhsadel, L., Herrewege, A.V., Yang, B.-Y.: Breaking ECC2K-130. Cryptology ePrint Archive, Report 2009/541 (2009). https://eprint.iacr.org/2009/541/

  6. Bernstein, D.J.: Minimum number of bit operations for multiplication. https://binary.cr.yp.to/m.html. Accessed 05 Apr 2017

  7. Bernstein, D.J.: Batch binary Edwards. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 317–336. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_19

  8. Bernstein, D.J., Chen, H.-C., Cheng, C.-M., Lange, T., Niederhagen, R., Schwabe, P., Yang, B.-Y.: ECC2K-130 on NVIDIA GPUs. In: Gong, G., Gupta, K.C. (eds.) INDOCRYPT 2010. LNCS, vol. 6498, pp. 328–346. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17401-8_23

  9. Bernstein, D.J., Engels, S., Lange, T., Niederhagen, R., Paar, C., Schwabe, P., Zimmermann, R.: Faster discrete logarithms on FPGAs (2016). http://cryptojedi.org/papers/#sect113r2

  10. Bos, J.W., Kleinjung, T., Niederhagen, R., Schwabe, P.: ECC2K-130 on Cell CPUs. In: Bernstein, D.J., Lange, T. (eds.) AFRICACRYPT 2010. LNCS, vol. 6055, pp. 225–242. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12678-9_14

  11. Certicom Corp: The Certicom ECC Challenge. https://www.certicom.com/content/certicom/en/the-certicom-ecc-challenge.html. Accessed 03 Apr 2017

  12. Certicom Research. Certicom ECC Challenge. https://www.certicom.com/content/dam/certicom/images/pdfs/challenge-2009.pdf. Accessed 10 Nov 2009

  13. Cox, S.J., Cox, J.T., Boardman, R.P., Johnston, S.J., Scott, M., O’Brien, N.S.: Iridis-pi: a low-cost, compact demonstration cluster. Cluster Comput. 17(2), 349–358 (2014). https://doi.org/10.1007/s10586-013-0282-7

  14. Fan, J., Bailey, D.V., Batina, L., Güneysu, T., Paar, C., Verbauwhede, I.: Breaking elliptic curve cryptosystems using reconfigurable hardware. In: 2010 International Conference on Field Programmable Logic and Applications, pp. 133–138 (2010). https://doi.org/10.1109/FPL.2010.34

  15. Hutter, M., Schwabe, P.: Multiprecision multiplication on AVR revisited. J. Cryptogr. Eng. 5(3), 201–214 (2015). http://cryptojedi.org/papers/#avrmul

  16. Karatsuba, A., Ofman, Y.: Multiplication of multidigit numbers on automata. In: Soviet Physics Doklady, vol. 7, p. 595 (1963)

  17. Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48(177), 243–264 (1987)

  18. Patel, N.: Sony says the 40GB PS3 is still using 90nm chips. https://www.engadget.com/2007/11/03/sony-says-the-40gb-ps3-is-still-using-90nm-chips/. Accessed 24 Aug 2017

  19. Pollard, J.M.: Monte Carlo methods for index computation \((\operatorname{mod} p)\). Math. Comput. 32(143), 918–924 (1978)

  20. TechInsights. Nintendo Switch teardown. http://techinsights.com/about-techinsights/overview/blog/nintendo-switch-teardown/. Accessed 08 May 2017

  21. van Oorschot, P.C., Wiener, M.J.: Parallel collision search with cryptanalytic applications. J. Cryptol. 12(1), 1–28 (1999). https://doi.org/10.1007/PL00003816

Author information

Corresponding author

Correspondence to Thom Wiggers.

Appendices

A Cortex-A53 Benchmarking Results

This section gives an overview of our microbenchmarking results. As described in Sect. 3.1, we measured the execution times of various instructions. We also looked at various combinations of instructions to learn about pipelining behaviour and execution units.

A.1 Operations on “Normal” Registers

Our findings for AArch64 instructions are shown in Table 6. When measuring two arithmetic instructions, we noticed that they take the same time as a single instruction. This suggests that the Cortex-A53 has two ALUs and can thus execute two arithmetic instructions at the same time. This does not hold for multiplication or memory operations, and we suspect that the architecture has only one multiplier and a single processing unit for memory access.

Table 6. Hypothesised instruction characteristics for instructions operating on registers. Latencies include the issue cycles. Many instructions can be dual-issued.

A.2 Operations on NEON Vector Registers

The NEON vector registers are available in different sizes: they can be accessed as 64-bit vectors or as 128-bit vectors. Tables 7 and 8 give an overview of our results. For the 64-bit vectors we again notice that two arithmetic instructions run in the same time as a single instruction. This is not the case, however, for the 128-bit vectors, which suggests that there are two 64-bit execution units that are combined for the 128-bit values.

Load and store operations again do not execute in parallel, and we suspect there is only one load/store unit. It is possible to pair an arithmetic operation with a load or store.
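
The pairing behaviour hypothesised above can be illustrated with a toy issue model. The following Python sketch is not the benchmarking code used for the measurements; the function name `issue_cycles` and the instruction kinds `arith64`, `arith128` and `ldst` are hypothetical labels chosen for illustration. It assumes in-order issue of independent instructions, at most two instructions per cycle, two 64-bit NEON units and one load/store unit, and it ignores latencies and data dependencies. Note that the model also lets a 128-bit arithmetic instruction pair with a load or store, which is a consequence of these assumptions and not something stated above.

```python
# Toy in-order dual-issue model of the hypothesised NEON behaviour.
# All names and unit counts here are illustrative assumptions.
NEON_UNITS_NEEDED = {
    "arith64": 1,   # one of the two 64-bit NEON units
    "arith128": 2,  # both 64-bit units combined
    "ldst": 0,      # no NEON unit, but the single load/store unit
}


def issue_cycles(instrs):
    """Count issue cycles for a list of independent instructions."""
    cycles = 0
    i = 0
    while i < len(instrs):
        cycles += 1
        neon_free, lsu_free, slots = 2, 1, 2  # resources reset every cycle
        while i < len(instrs) and slots > 0:
            kind = instrs[i]
            need_neon = NEON_UNITS_NEEDED[kind]
            need_lsu = 1 if kind == "ldst" else 0
            if need_neon > neon_free or need_lsu > lsu_free:
                break  # in-order issue: wait for the next cycle
            neon_free -= need_neon
            lsu_free -= need_lsu
            slots -= 1
            i += 1
    return cycles


print(issue_cycles(["arith64", "arith64"]))    # 1: two 64-bit operations pair
print(issue_cycles(["arith128", "arith128"]))  # 2: each one needs both units
print(issue_cycles(["ldst", "ldst"]))          # 2: only one load/store unit
print(issue_cycles(["arith64", "ldst"]))       # 1: arithmetic pairs with a load
```

Under these assumptions, interleaving loads and stores with 64-bit arithmetic keeps both issue slots busy, whereas a run of 128-bit arithmetic instructions issues one instruction per cycle.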

B The ECC2K-130 Challenge Parameters

The Certicom ECC2K-130 challenge is defined in [11, 12]. The challenge is to find the integer k such that \(Q=[k]P\) on the Koblitz curve \(y^2 + xy = x^3 + 1\) defined over \(\mathbb {F}_{2^{131}}\). The group order is \(|E(\mathbb {F}_{2^{131}})| = 4l\), where l is the 129-bit prime number

$$ l = 680564733841876926932320129493409985129. $$

The coordinates of P and Q are given in a polynomial-basis representation of \(\mathbb {F}_2[z]/(F)\) where \(F(z) = z^{131} + z^{13} + z^{2} + z + 1\). They are represented below as hexadecimal bit strings with respect to this basis.

$$\begin{aligned} P_x&= \mathtt {05 1C99BFA6 F18DE467 C80C23B9 8C7994AA} \\ P_y&= \mathtt {04 2EA2D112 ECEC71FC F7E000D7 EFC978BD} \\ Q_x&= \mathtt {06 C997F3E7 F2C66A4A 5D2FDA13 756A37B1} \\ Q_y&= \mathtt {04 A38D1182 9D32D347 BD0C0F58 4D546E9A} \\ \end{aligned}$$
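
As a consistency check on these parameters, the field arithmetic and the curve equation can be written down in a few lines of Python. This is a minimal sketch and not the optimised Cortex-A53 implementation; the helper names `gf_mul`, `on_curve` and `parse_hex` are hypothetical. It assumes the usual convention that bit i of each hexadecimal string is the coefficient of \(z^i\).

```python
M = 131
# F(z) = z^131 + z^13 + z^2 + z + 1, stored as a bit mask.
F_POLY = (1 << 131) | (1 << 13) | (1 << 2) | (1 << 1) | 1


def gf_mul(a, b):
    """Multiply two elements of F_2[z]/(F) given as integers below 2^131."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # addition in characteristic 2 is xor
        b >>= 1
        a <<= 1             # multiply a by z
        if a >> M:          # degree reached 131: reduce modulo F
            a ^= F_POLY
    return r


def on_curve(x, y):
    """Check y^2 + xy = x^3 + 1 over the binary field."""
    lhs = gf_mul(y, y) ^ gf_mul(x, y)
    rhs = gf_mul(gf_mul(x, x), x) ^ 1
    return lhs == rhs


def parse_hex(s):
    """Parse a space-separated hexadecimal string into an integer."""
    return int(s.replace(" ", ""), 16)


# Coordinates transcribed from the values given above.
PX = parse_hex("05 1C99BFA6 F18DE467 C80C23B9 8C7994AA")
PY = parse_hex("04 2EA2D112 ECEC71FC F7E000D7 EFC978BD")
QX = parse_hex("06 C997F3E7 F2C66A4A 5D2FDA13 756A37B1")
QY = parse_hex("04 A38D1182 9D32D347 BD0C0F58 4D546E9A")

print(on_curve(PX, PY), on_curve(QX, QY))
```

Evaluating `on_curve` on both points is a quick way to catch transcription errors in the coordinates or a mismatch in the assumed bit ordering.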

Table 7. Hypothesised instruction characteristics for instructions operating on 64-bit vectors. Latencies include the issue cycles. Arithmetic operations can be issued together with other arithmetic instructions or with a load or store operation.

Table 8. Hypothesised instruction characteristics for instructions operating on 128-bit vectors. Latencies include the issue cycles.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Wiggers, T. (2019). Energy-Efficient ARM64 Cluster with Cryptanalytic Applications. In: Lange, T., Dunkelman, O. (eds) Progress in Cryptology – LATINCRYPT 2017. LATINCRYPT 2017. Lecture Notes in Computer Science, vol 11368. Springer, Cham. https://doi.org/10.1007/978-3-030-25283-0_10

  • DOI: https://doi.org/10.1007/978-3-030-25283-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-25282-3

  • Online ISBN: 978-3-030-25283-0

  • eBook Packages: Computer Science (R0)
