Exploiting Symmetries of Small Prime-Sized DFTs

  • Doru Thom Popovici
  • Devangi N. Parikh
  • Daniele G. Spampinato
  • Tze Meng Low
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12043)


Small prime-sized discrete Fourier transforms (DFTs) appear in various applications, from quantum mechanics and material sciences to machine learning. For such problem sizes, the DFT is typically implemented as a cyclic convolution using algorithms such as Rader's or Bluestein's. However, these approaches incur extra computation and expensive data movement. In this work, we present an alternative method that casts the Fourier transform as a direct symmetric matrix-vector multiplication. By exploiting the symmetries of the Fourier matrix and drawing on knowledge from dense linear algebra, we present an implementation that reduces the amount of computation and requires less memory. We show that this approach achieves up to 2x performance gains on Intel and AMD architectures compared to the implementations offered by Intel MKL and FFTW, which use Rader and Bluestein.
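
The kind of Fourier-matrix symmetry the abstract refers to can be illustrated with a standard identity. This is a minimal sketch, not the authors' optimized implementation. It assumes the forward convention y_j = sum_k x_k exp(-2*pi*i*j*k/n) and an odd length n (every prime except 2). Pairing input k with n-k and output j with n-j leaves only two real m x m blocks, cos(2*pi*j*k/n) and sin(2*pi*j*k/n) for 1 <= j, k <= m = (n-1)/2, and both blocks are symmetric. The Python below applies these blocks to the vectors x_k + x_{n-k} and x_k - x_{n-k} and checks the result against a naive DFT. The function names dft_naive and dft_symmetric are hypothetical. Twiddle factors are recomputed on the fly, which favors clarity over performance.

```python
import cmath
import math


def dft_naive(x):
    """Reference O(n^2) DFT: y[j] = sum_k x[k] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]


def dft_symmetric(x):
    """Odd-length DFT using the cos/sin symmetry of the Fourier matrix.

    Only the m x m blocks C[j][k] = cos(2*pi*j*k/n) and
    S[j][k] = sin(2*pi*j*k/n), 1 <= j, k <= m = (n - 1) // 2, are used;
    both blocks are symmetric (C[j][k] == C[k][j], S[j][k] == S[k][j]).
    """
    n = len(x)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd length (e.g. an odd prime)")
    m = (n - 1) // 2
    # Even and odd combinations of mirrored inputs x[k] and x[n - k].
    a = [x[k] + x[n - k] for k in range(1, m + 1)]
    b = [x[k] - x[n - k] for k in range(1, m + 1)]
    y = [0j] * n
    y[0] = sum(x)  # row 0 of the Fourier matrix is all ones
    for j in range(1, m + 1):
        c_sum = 0j  # (C a)_j
        s_sum = 0j  # (S b)_j
        for k in range(1, m + 1):
            theta = 2 * math.pi * j * k / n
            c_sum += math.cos(theta) * a[k - 1]
            s_sum += math.sin(theta) * b[k - 1]
        # y_j and y_{n-j} share both sums; only the sign of the sine term differs.
        y[j] = x[0] + c_sum - 1j * s_sum
        y[n - j] = x[0] + c_sum + 1j * s_sum
    return y
```

For example, dft_symmetric([0, 1, 0]) returns 1, exp(-2*pi*i/3) and exp(-4*pi*i/3), matching dft_naive up to rounding. Each output pair (y_j, y_{n-j}) reuses the same two half-length sums and differs only in the sign of the sine term. This reuse is one source of savings over the full n x n complex product.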


Keywords: Prime-sized DFTs, Rader algorithm, Bluestein algorithm, Symmetric matrix-vector multiplication


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Lawrence Berkeley National Lab, Berkeley, USA
  2. University of Texas at Austin, Austin, USA
  3. Carnegie Mellon University, Pittsburgh, USA
