# Design and Digital Implementation of Fast and Recursive DCT II–IV Algorithms


## Abstract

Fast and recursive algorithms are derived from the proposed factorizations of discrete cosine transform (DCT) matrices, and signal flow graphs for the *n*-point DCT II and DCT IV algorithms are introduced. The proposed algorithms yield exactly the same results as standard DCT algorithms but are faster. The arithmetic complexity and stability of the algorithms are explored, and the improvements are compared with previously existing fast and stable DCT algorithms. A parallel hardware computing architecture for the DCT II algorithm is proposed. The architecture is first designed, simulated, and prototyped on a 40-nm Xilinx Virtex-6 FPGA and then mapped to custom integrated circuit technology using 0.18-µm CMOS standard cells from Austria Micro Systems. A performance trade-off exists between computational precision, chip area, clock speed, and power consumption; this trade-off is explored in both the FPGA and custom CMOS implementation spaces. An example FPGA implementation operates at clock frequencies in excess of 230 MHz for several values of system word size, leading to real-time throughput better than 230 million 16-point DCTs per second. The custom CMOS results are obtained from the synthesis and place-and-route steps of the design flow; physical silicon fabrication was not conducted due to prohibitive cost.
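As a rough illustration of what a recursive DCT II factorization looks like, the sketch below uses the textbook even/odd split: an orthonormal *n*-point DCT II reduces to an *n*/2-point DCT II on the scaled sums and an *n*/2-point DCT IV on the scaled differences of mirrored samples. This is not the factorization proposed in the paper, whose details are not given in the abstract. The function names are hypothetical, *n* is assumed to be a power of two, and the DCT IV half is evaluated with a dense matrix product for brevity rather than with a fast algorithm.

```python
import numpy as np


def dct2_matrix(n):
    """Orthonormal n-point DCT II matrix (reference definition)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthogonal
    return c


def dct4_matrix(n):
    """Orthonormal n-point DCT IV matrix (reference definition)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k + 1) * (2 * j + 1) / (4 * n))


def dct2_recursive(x):
    """Recursive orthonormal DCT II; len(x) is assumed to be a power of two."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n == 1:
        return x.copy()  # the 1-point orthonormal DCT II is the identity
    if n % 2:
        raise ValueError("length must be a power of two")
    half = n // 2
    head, tail = x[:half], x[half:][::-1]
    u = (head + tail) / np.sqrt(2.0)  # orthogonal butterfly: sums
    v = (head - tail) / np.sqrt(2.0)  # orthogonal butterfly: differences
    out = np.empty(n)
    out[0::2] = dct2_recursive(u)       # even-indexed outputs: half-size DCT II
    out[1::2] = dct4_matrix(half) @ v   # odd-indexed outputs: half-size DCT IV (dense here)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal(16)
y_recursive = dct2_recursive(x)
y_direct = dct2_matrix(16) @ x
print(np.allclose(y_recursive, y_direct))
```

Because every stage is an orthogonal butterfly followed by orthogonal half-size transforms, the recursive result agrees with the dense matrix product up to rounding error, which is the sense in which a fast algorithm gives the same results as the standard definition.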

## Keywords

Discrete cosine transforms · Fast algorithms · Recursive algorithms · Arithmetic complexity · Sparse and orthogonal factors · Signal flow graphs · Field-programmable gate array (FPGA) · Application-specific integrated circuits (ASIC)

## Notes

### Acknowledgements

The authors appreciate valuable discussions, help, and suggestions made by Jianhua Liu to improve the quality of the paper.
