Abstract
This paper presents an efficient parallel 1-D FFT implementation method based on the architectural features of multi-core vector processors. According to the number of data points M that can be accommodated in the global cache (GC), it divides the parallel computation of a large-point 1-D FFT into an (n-m)-level parallel FFT computation and an M-point parallel FFT computation. In the (n-m)-level FFT computation, the parallel FFT computation of each stage is performed using a shared-DDR data method. For the M-point parallel FFT computation, a parallel method based on the matrix Fourier algorithm is designed: it converts the original M-point 1-D FFT into a 2-D FFT and parallelizes it using a shared-GC data method, which avoids multiple data transfers between GC and AM and reduces data transmission overhead. The column FFT computation is merged in the AM with the multiplication of the factor matrix by the column FFT results, which further reduces the number of data transfers between AM and GC and can significantly improve the efficiency of the M-point FFT computation. Experimental results on Matrix show that, compared with the TMS320C6678 at the same frequency, the average speedup is 8.26 times for the single-core single-precision 1-D FFT and 6.78 times for the dual-core single-precision 1-D FFT.
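The matrix Fourier step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain Python with no modelling of GC, AM, DDR, or multiple cores. The function names `dft` and `matrix_fourier_fft`, the row-major layout, and the naive O(n^2) kernel are assumptions made only for illustration. The sketch views the input as a rows × cols matrix, transforms each column, multiplies by the factor matrix in the same pass (loosely mirroring the merged step), and then transforms each row.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT; stands in for a small FFT kernel and serves as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def matrix_fourier_fft(x, rows, cols):
    """1-D DFT of length rows*cols computed as a 2-D transform (illustrative sketch).

    The input is viewed as a rows x cols matrix in row-major order,
    i.e. element (r, c) is x[r * cols + c].
    """
    n = rows * cols
    if len(x) != n:
        raise ValueError("len(x) must equal rows * cols")

    # Column transforms fused with the factor-matrix multiplication:
    # each column result is scaled by exp(-2*pi*i * c * k1 / n) before it is stored.
    a = [[0j] * cols for _ in range(rows)]
    for c in range(cols):
        col = dft([x[r * cols + c] for r in range(rows)])
        for k1 in range(rows):
            a[k1][c] = col[k1] * cmath.exp(-2j * cmath.pi * c * k1 / n)

    # Row transforms; output index k = k1 + rows * k2 (a transposed write-back).
    out = [0j] * n
    for k1 in range(rows):
        row = dft(a[k1])
        for k2 in range(cols):
            out[k1 + rows * k2] = row[k2]
    return out


if __name__ == "__main__":
    signal = [complex(i, -0.5 * i) for i in range(12)]
    direct = dft(signal)
    two_dim = matrix_fourier_fft(signal, 3, 4)
    print(max(abs(p - q) for p, q in zip(direct, two_dim)))
```

Fusing the scaling into the column pass means each column is touched once before the row pass, which is the property the abstract credits with reducing transfers between AM and GC. In this sketch it only saves a loop over the matrix.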
Supported by the National Natural Science Foundation of China under Grant No. 61572025.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, Z., Tian, X. (2019). A Parallel 1-D FFT Implementation Method for Multi-core Vector Processors. In: Xu, W., Xiao, L., Li, J., Zhu, Z. (eds) Computer Engineering and Technology. NCCET 2018. Communications in Computer and Information Science, vol 994. Springer, Singapore. https://doi.org/10.1007/978-981-13-5919-4_5
DOI: https://doi.org/10.1007/978-981-13-5919-4_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5918-7
Online ISBN: 978-981-13-5919-4
eBook Packages: Computer Science, Computer Science (R0)