A Parallel 1-D FFT Implementation Method for Multi-core Vector Processors

Liu, Zhong; Tian, Xi

doi:10.1007/978-981-13-5919-4_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 994)

Included in the following conference series:

CCF National Conference on Computer Engineering and Technology

Abstract

This paper presents an efficient parallel 1-D FFT implementation method based on the architectural features of multi-core vector processors. According to the number of data points M that can be accommodated in the global cache (GC), it divides the parallel computation of a large-point 1-D FFT into an (n-m)-level parallel FFT computation and an M-point parallel FFT computation. In the (n-m)-level FFT computation, the parallel FFT computation of each stage is performed using a shared-DDR data method. For the M-point parallel FFT computation, a parallel method based on the matrix Fourier algorithm is designed: it converts the original M-point 1-D FFT computation into a 2-D FFT computation and parallelizes it using a shared-GC data method, which avoids multiple data transfers between GC and AM and reduces data transmission overhead. The column FFT computation is merged in the AM with the multiplication of the column FFT results by the factor matrix, which further reduces the number of data transfers between AM and GC and can significantly improve the efficiency of the M-point FFT computation. Experimental results on Matrix show that, compared with the TMS320C6678 at the same frequency, the average speedup of the single-core single-precision 1-D FFT is 8.26 times and that of the dual-core single-precision 1-D FFT is 6.78 times.
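
The matrix Fourier algorithm named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' vectorized implementation: the function names `dft` and `matrix_fourier_fft`, the row-major matrix layout, and the naive O(n²) DFT standing in for an optimized kernel are all assumptions made for illustration, and nothing here models the GC, the AM, or multi-core data sharing. It only shows how a 1-D transform of length rows × cols can be computed as column transforms, a multiplication by a factor (twiddle) matrix, and row transforms.

```python
import cmath


def dft(seq):
    """Naive O(n^2) DFT; a hypothetical stand-in for an optimized FFT kernel."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def matrix_fourier_fft(x, rows, cols):
    """1-D DFT of x computed as a 2-D transform on a rows x cols matrix.

    Assumed layout (illustrative): x is viewed row-major, a[r][c] = x[r*cols + c].
    """
    n = rows * cols
    if len(x) != n:
        raise ValueError("len(x) must equal rows * cols")

    # Steps 1 and 2, fused per column: length-`rows` column transform,
    # then multiplication by the factor-matrix entries exp(-2*pi*i*c*k/n).
    scaled_cols = []
    for c in range(cols):
        column = [x[r * cols + c] for r in range(rows)]
        spectrum = dft(column)
        scaled_cols.append(
            [spectrum[k] * cmath.exp(-2j * cmath.pi * c * k / n) for k in range(rows)]
        )

    # Step 3: length-`cols` row transform for each k; results are written
    # back transposed, so output index = rows * q + k.
    out = [0j] * n
    for k in range(rows):
        row = [scaled_cols[c][k] for c in range(cols)]
        spectrum = dft(row)
        for q in range(cols):
            out[rows * q + k] = spectrum[q]
    return out
```

For example, `matrix_fourier_fft(x, 3, 4)` on a 12-point input should agree with `dft(x)` up to floating-point rounding. Applying the factor multiplication inside the column loop loosely mirrors the merging idea described in the abstract, where the column results are scaled before they leave fast memory.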

Supported by the National Natural Science Foundation of China under Grant No. 61572025.

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, 410073, China
Zhong Liu & Xi Tian

Corresponding author

Correspondence to Zhong Liu.

Editor information

Editors and Affiliations

National University of Defense Technology, Changsha, China
Weixia Xu
National University of Defense Technology, Changsha, China
Liquan Xiao
National University of Defense Technology, Changsha, China
Jinwen Li
National University of Defense Technology, Changsha, China
Zhenzhen Zhu

About this paper

Cite this paper

Liu, Z., Tian, X. (2019). A Parallel 1-D FFT Implementation Method for Multi-core Vector Processors. In: Xu, W., Xiao, L., Li, J., Zhu, Z. (eds) Computer Engineering and Technology. NCCET 2018. Communications in Computer and Information Science, vol 994. Springer, Singapore. https://doi.org/10.1007/978-981-13-5919-4_5

DOI: https://doi.org/10.1007/978-981-13-5919-4_5
Published: 06 January 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5918-7
Online ISBN: 978-981-13-5919-4
eBook Packages: Computer Science, Computer Science (R0)