How to Maximize Software Performance of Symmetric Primitives on Pentium III and 4 Processors

  • Mitsuru Matsui
  • Sayaka Fukuda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3557)


This paper discusses state-of-the-art software optimization methodology for symmetric cryptographic primitives on Pentium III and Pentium 4 processors. We aim to maximize speed by taking the internal pipeline architecture of these processors into account. This is the first paper to study the optimization of ciphers on Prescott, a new core of the Pentium 4. Our AES program with a 128-bit key achieves 251 cycles/block on Pentium 4, which is, to the best of our knowledge, the fastest implementation of AES on Pentium 4. We also optimize the SNOW2.0 keystream generator. Our SNOW2.0 program for Pentium III runs at a rate of 2.75 μops/cycle, which seems to be the most efficient code ever written for a real-world cipher primitive. For the FOX128 block cipher, we propose a speed-up technique that interleaves two independent blocks using register group separation. Finally, we consider fast implementations of SHA512 and Whirlpool, two hash functions with a genuine 64-bit architecture. We show that the new SIMD instruction sets introduced in Pentium 4 contribute substantially to fast hashing with SHA512.
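To put the reported AES figure in more familiar units, the short sketch below converts cycles/block into cycles/byte and into throughput. Only the 251 cycles/block figure and the 128-bit AES block size come from the text; the function names and the 3.0 GHz clock frequency are illustrative assumptions, not values taken from the paper.

```python
def cycles_per_byte(cycles_per_block: float, block_bits: int = 128) -> float:
    """Convert a cycles/block figure into cycles/byte for a given block size."""
    return cycles_per_block / (block_bits // 8)


def throughput_mbit_per_s(cycles_per_block: float, clock_hz: float,
                          block_bits: int = 128) -> float:
    """Estimate throughput in Mbit/s, assuming the core sustains the stated
    cycles/block rate at the given clock frequency."""
    return block_bits * clock_hz / cycles_per_block / 1e6


if __name__ == "__main__":
    aes_cycles_per_block = 251      # figure reported in the abstract
    assumed_clock_hz = 3.0e9        # assumption: hypothetical 3.0 GHz Pentium 4

    print(cycles_per_byte(aes_cycles_per_block))  # 15.6875
    print(throughput_mbit_per_s(aes_cycles_per_block, assumed_clock_hz))
```

At 251 cycles per 16-byte block the cost is about 15.7 cycles/byte; the throughput estimate scales linearly with whatever clock frequency is assumed.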


Keywords: Hash Function, Lookup Table, Block Cipher, Advanced Encryption Standard, Assembly Language



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Mitsuru Matsui (1)
  • Sayaka Fukuda (1)
  1. Information Technology R&D Center, Mitsubishi Electric Corporation, Kamakura, Japan
