An OpenMP Implementation of Parallel FFT and Its Performance on IA-64 Processors

  • Daisuke Takahashi
  • Mitsuhisa Sato
  • Taisuke Boku
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2716)


In this paper, we propose an OpenMP implementation of a recursive algorithm for the parallel fast Fourier transform (FFT) on shared-memory parallel computers. The recursive three-step FFT algorithm improves performance by using the cache memory effectively. We report performance results for one-dimensional FFTs on the DELL PowerEdge 7150 and the hp workstation zx6000. For a 2^24-point FFT, we achieved about 757 MFLOPS on the DELL PowerEdge 7150 (Itanium 800 MHz, 4 CPUs) and about 871 MFLOPS on the hp workstation zx6000 (Itanium2 1 GHz, 2 CPUs).
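The abstract does not spell out the three steps, so the following is only a minimal sketch of one common way to organize a recursive three-step FFT, not the authors' implementation. It assumes the length n is factored as n = n1 * n2 and proceeds as: (1) n1 independent n2-point FFTs, (2) multiplication by twiddle factors, (3) n2 independent n1-point FFTs, with each sub-FFT handled recursively until it is small enough to stay in cache. The names `dft`, `split`, `fft3`, and the `base` threshold are hypothetical and chosen for illustration. The sketch is serial; in an OpenMP version, the independent FFTs of steps 1 and 3 would presumably be the loops that a parallel directive distributes across threads.

```python
import cmath
import math

def dft(x):
    """Direct O(n^2) DFT, used as the base case and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def split(n):
    """Return (n1, n2) with n1 * n2 == n and n1 the largest divisor <= sqrt(n)."""
    n1 = math.isqrt(n)
    while n % n1:
        n1 -= 1
    return n1, n // n1

def fft3(x, base=8):
    """Recursive three-step FFT sketch (assumed decomposition n = n1 * n2)."""
    n = len(x)
    if n <= base:
        return dft(x)          # small enough: compute directly
    n1, n2 = split(n)
    if n1 == 1:
        return dft(x)          # n is prime: no factorization available
    # Step 1: n1 independent n2-point FFTs over the stride-n1 subsequences.
    cols = [fft3(x[j1::n1], base) for j1 in range(n1)]
    # Step 2: multiply element (j1, k2) by the twiddle factor w_n^(j1 * k2).
    for j1 in range(n1):
        for k2 in range(n2):
            cols[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # Step 3: n2 independent n1-point FFTs; result k1 goes to y[k2 + k1 * n2].
    y = [0j] * n
    for k2 in range(n2):
        row = fft3([cols[j1][k2] for j1 in range(n1)], base)
        for k1 in range(n1):
            y[k2 + k1 * n2] = row[k1]
    return y
```

Calling `fft3(x)` on a list of complex samples returns the same values as `dft(x)` up to rounding error. Choosing n1 near sqrt(n) keeps both families of sub-FFTs at roughly equal size, which is the usual motivation for this kind of split when the goal is cache reuse.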


Fast Fourier Transform; Large Problem Size; Fast Fourier Transform Algorithm; Point Fast Fourier Transform; OpenMP Directive
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Daisuke Takahashi (1)
  • Mitsuhisa Sato (1)
  • Taisuke Boku (1)
  1. Institute of Information Sciences and Electronics, University of Tsukuba, Tsukuba, Ibaraki, Japan
