Implementation and Optimization of Multi-dimensional Real FFT on ARMv8 Platform

Wang, Xiao; Jia, Haipeng; Li, Zhihao; Zhang, Yunquan

doi:10.1007/978-3-030-05054-2_27

Conference paper
First Online: 07 December 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11335)

Abstract

The Fourier transform is one of the most critical algorithms and is applied in a wide range of fields, such as signal processing and data compression. In real-world applications such as image compression (JPEG), the Fourier transform mostly processes real-valued input. Such transforms are called real DFTs (real discrete Fourier transforms) in this paper, and it is critical to optimize them for specific platforms. In this paper, we implement 1D and 2D real DFTs on the ARMv8 platform, the flagship architecture of ARM. The real DFT kinds implemented and optimized include R2HC, HC2R, DHT, DCT-I to DCT-IV, and DST-I to DST-IV, with special optimization for input sizes of the form \(2^{q}3^{n}5^{m}\). To achieve high performance, optimization is carried out in the following aspects: (1) reduction of the computational complexity of the real DFT; (2) implementation of a high-performance 1D complex DFT algorithm to support the real DFT; (3) for the 2D real DFT, a cache-aware blocking approach to improve cache performance. Experimental results show that, compared with FFTW 3.3.7: 1D-Float DFT gains a 1.52x speedup on average across all real DFT kinds, with a maximum speedup of 1.79x; 1D-Double DFT gains a 1.34x speedup on average across all real DFT kinds, with a maximum speedup of 1.61x; 2D-Float DFT gains a 1.41x speedup on average across all real DFT kinds, with a maximum speedup of 1.70x; 2D-Double DFT gains a 1.10x speedup across all real DFT kinds, with a maximum speedup of 1.25x.
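
Aspects (1) and (2) can be illustrated with a well-known textbook technique: a length-n real DFT (n even) can be obtained from a single n/2-point complex DFT by packing even-indexed samples into the real part and odd-indexed samples into the imaginary part. The sketch below is only an illustration of that general idea and is not taken from the paper; the function names `complex_dft` and `real_dft_via_half_size` are hypothetical, and the naive O(n²) complex DFT merely stands in for an optimized complex FFT kernel.

```python
import cmath


def complex_dft(z):
    """Naive O(n^2) complex DFT; a stand-in for an optimized complex FFT kernel."""
    n = len(z)
    return [
        sum(z[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def real_dft_via_half_size(x):
    """Forward DFT of a real, even-length sequence via one n/2-point complex DFT.

    Returns bins 0..n/2; the remaining bins follow from conjugate symmetry,
    X[n-k] = conj(X[k]).
    """
    n = len(x)
    if n == 0 or n % 2 != 0:
        raise ValueError("this sketch assumes a non-empty, even-length input")
    h = n // 2

    # Pack even samples into the real part and odd samples into the imaginary part.
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(h)]
    Z = complex_dft(z)

    out = []
    for k in range(h + 1):
        zk = Z[k % h]
        zc = Z[(h - k) % h].conjugate()
        even = 0.5 * (zk + zc)    # DFT of the even-indexed samples at bin k
        odd = -0.5j * (zk - zc)   # DFT of the odd-indexed samples at bin k
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        out.append(even + twiddle * odd)
    return out
```

For a real input of length 12 (a size of the form \(2^{q}3^{n}5^{m}\)), `real_dft_via_half_size` returns 7 complex bins that agree with the first 7 bins of a full 12-point complex DFT up to rounding error, while the complex transform it calls has only 6 points.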

Acknowledgments

This work is supported by the National Key Research and Development Program of China under Grant Nos. 2017YFB0202105 and 2016YFE0100300; the National Natural Science Foundation of China under Grant Nos. 61432018, 61521092, and 61502405; and the Key Technology Research and Development Programs of Guangdong Province under Grant No. 2015B010108006.

Author information

Authors and Affiliations

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xiao Wang, Haipeng Jia, Zhihao Li & Yunquan Zhang
School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
Xiao Wang & Zhihao Li

Corresponding author

Correspondence to Haipeng Jia.

Editor information

Editors and Affiliations

Rutgers University, Newark, NJ, USA
Jaideep Vaidya
Guangzhou University, Guangzhou, China
Jin Li

About this paper

Cite this paper

Wang, X., Jia, H., Li, Z., Zhang, Y. (2018). Implementation and Optimization of Multi-dimensional Real FFT on ARMv8 Platform. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11335. Springer, Cham. https://doi.org/10.1007/978-3-030-05054-2_27

DOI: https://doi.org/10.1007/978-3-030-05054-2_27
Published: 07 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05053-5
Online ISBN: 978-3-030-05054-2
eBook Packages: Computer Science, Computer Science (R0)