
An FFT Performance Model for Optimizing General-Purpose Processor Architecture

Journal of Computer Science and Technology

Abstract

The general-purpose processor (GPP) is an important platform for the fast Fourier transform (FFT) because of its flexibility, reliability, and practicality. Because FFT is a representative application that is intensive in both computation and memory access, optimizing the FFT performance of a GPP also benefits many other applications. To facilitate the analysis of FFT, this paper proposes a theoretical model of FFT processing. The model gives a tight lower bound on the runtime of FFT on a GPP and also guides architecture optimization for GPPs. Based on the model, two theorems on the optimization of architecture parameters are deduced, giving lower bounds on the number of registers and on memory bandwidth. Experimental results on different processor architectures (including Intel Core i7 and Godson-3B) validate the performance model.
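To make "intensive in both computation and memory access" concrete, the following is a minimal sketch that counts the work in a textbook radix-2 FFT. It is not the paper's performance model. It assumes, purely for illustration, that every butterfly loads two complex inputs and stores two complex outputs with no reuse in registers or cache, that twiddle-factor traffic is ignored, and that each butterfly costs 10 real floating-point operations (one complex multiply plus two complex additions). The function name `radix2_counts` and its parameters are hypothetical.

```python
def radix2_counts(n: int, bytes_per_complex: int = 8) -> dict:
    """Count operations of a textbook radix-2 FFT of size n.

    Assumptions (illustrative, not taken from the paper):
    - n is a power of two
    - 10 real flops per butterfly (6 for the complex multiply, 4 for two complex adds)
    - 2 complex loads + 2 complex stores per butterfly, no reuse, twiddles ignored
    - bytes_per_complex = 8 corresponds to single-precision complex values
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    stages = n.bit_length() - 1           # log2(n) for a power of two
    butterflies = (n // 2) * stages       # n/2 butterflies in each stage
    flops = 10 * butterflies              # equals 5 * n * log2(n)
    accesses = 4 * butterflies            # complex loads plus stores
    bytes_moved = accesses * bytes_per_complex
    return {
        "stages": stages,
        "butterflies": butterflies,
        "flops": flops,
        "complex_accesses": accesses,
        "bytes_moved": bytes_moved,
        "flops_per_byte": flops / bytes_moved,
    }


if __name__ == "__main__":
    for key, value in radix2_counts(1024).items():
        print(f"{key}: {value}")
```

Under these assumptions the ratio of arithmetic to data movement is well below one flop per byte, which is why register count and memory bandwidth matter alongside raw arithmetic throughput.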

These investigations were adopted in the development of Godson-3B, an industrial GPP. The optimization techniques deduced from our performance model improve FFT performance by about 40% while incurring only 0.8% additional area cost. Consequently, Godson-3B computes the 1024-point single-precision complex FFT in 0.368 μs with about 40 W of power consumption and, to our knowledge, has the highest performance per watt in complex FFT among processors. This work could benefit the optimization of other GPPs as well.
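The reported runtime and power can be turned into throughput figures with a short calculation. The sketch below assumes the common benchmarking convention of crediting a complex FFT of size N with 5·N·log2(N) floating-point operations. The abstract does not state which convention the authors use, so the resulting numbers are indicative only, and the helper name `fft_gflops` is hypothetical.

```python
import math


def fft_gflops(n: int, seconds: float) -> float:
    """Nominal GFLOPS of a size-n complex FFT, assuming 5*n*log2(n) flops."""
    nominal_flops = 5 * n * math.log2(n)
    return nominal_flops / seconds / 1e9


# Figures quoted in the abstract: 1024 points, 0.368 microseconds, about 40 W.
gflops = fft_gflops(1024, 0.368e-6)
gflops_per_watt = gflops / 40.0

print(f"nominal throughput: {gflops:.1f} GFLOPS")
print(f"nominal efficiency: {gflops_per_watt:.2f} GFLOPS/W")
```

With this convention the quoted figures correspond to roughly 139 GFLOPS and about 3.5 GFLOPS per watt.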




Correspondence to Ling Li, Yun-Ji Chen, Dao-Fu Liu, Cheng Qian or Wei-Wu Hu.

Additional information

This work is partially supported by the National Science and Technology Major Project under Grant Nos. 2009ZX01028-002-003, 2009ZX01029-001-003, 2010ZX01036-001-002, and the National Natural Science Foundation of China under Grant Nos. 61050002, 61003064, 60921002.


About this article

Cite this article

Li, L., Chen, YJ., Liu, DF. et al. An FFT Performance Model for Optimizing General-Purpose Processor Architecture. J. Comput. Sci. Technol. 26, 875–889 (2011). https://doi.org/10.1007/s11390-011-0186-z


  • DOI: https://doi.org/10.1007/s11390-011-0186-z
