Circuits, Systems, and Signal Processing, Volume 38, Issue 2, pp 529–555

Design and Digital Implementation of Fast and Recursive DCT II–IV Algorithms

  • Sirani M. Perera
  • Arjuna Madanayake
  • Nathan Dornback
  • Nilan Udayanga


Fast and recursive algorithms are stated using the proposed factorizations of discrete cosine transform (DCT) matrices. Signal flow graphs for the n-point DCT II and DCT IV algorithms are introduced. The proposed algorithms yield exactly the same results as standard DCT algorithms but are faster. The arithmetic complexity and stability of the algorithms are explored, and their improvements are compared with previously existing fast and stable DCT algorithms. A parallel hardware computing architecture for the DCT II algorithm is proposed. The architecture is first designed, simulated, and prototyped on a 40-nm Xilinx Virtex-6 FPGA and then mapped to custom integrated circuit technology using 0.18-\(\mu\)m CMOS standard cells from Austria Micro Systems. A performance trade-off exists between computational precision, chip area, clock speed, and power consumption; it is explored in both the FPGA and custom CMOS implementation spaces. An example FPGA implementation operates at clock frequencies in excess of 230 MHz for several values of system word size, leading to real-time throughput better than 230 million 16-point DCTs per second. The custom CMOS results are based on the synthesis and place-and-route steps of the design flow; physical silicon fabrication was not conducted because of its prohibitive cost.
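To make the idea of a recursive DCT II–IV structure concrete, the following is a minimal sketch of the classical even/odd split, in which an n-point DCT II is computed from one n/2-point DCT II and one n/2-point DCT IV after a butterfly stage. It is not the factorization proposed in the paper, which this abstract does not reproduce; the orthonormal scaling and the function names (`dct2_matrix`, `dct4_matrix`, `matvec`, `dct2_one_split`) are assumptions chosen for illustration, and the half-size transforms are applied as dense matrices rather than recursed.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II matrix (assumed scaling), as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(2.0 / n) * (1.0 / math.sqrt(2.0) if k == 0 else 1.0)
        rows.append([scale * math.cos(k * (2 * j + 1) * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def dct4_matrix(n):
    """Orthonormal DCT-IV matrix (assumed scaling), as a list of rows."""
    s = math.sqrt(2.0 / n)
    return [[s * math.cos((2 * k + 1) * (2 * j + 1) * math.pi / (4 * n))
             for j in range(n)] for k in range(n)]


def matvec(m, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def dct2_one_split(x):
    """One level of the even/odd split of an n-point DCT-II (n even).

    Butterfly: u = (lo + reversed hi) / sqrt(2), v = (lo - reversed hi) / sqrt(2).
    Even-indexed outputs are the n/2-point DCT-II of u,
    odd-indexed outputs are the n/2-point DCT-IV of v.
    """
    n = len(x)
    half = n // 2
    lo = x[:half]
    hi = x[half:][::-1]
    r = math.sqrt(2.0)
    u = [(a + b) / r for a, b in zip(lo, hi)]
    v = [(a - b) / r for a, b in zip(lo, hi)]
    y = [0.0] * n
    y[0::2] = matvec(dct2_matrix(half), u)
    y[1::2] = matvec(dct4_matrix(half), v)
    return y


if __name__ == "__main__":
    # Compare the split against the direct 16-point transform.
    x = [float((3 * i * i + 1) % 7) for i in range(16)]
    direct = matvec(dct2_matrix(16), x)
    split = dct2_one_split(x)
    print(max(abs(a - b) for a, b in zip(direct, split)))
```

Applying the same split again to the half-size DCT II, together with a separate fast algorithm for the DCT IV, is what turns this single step into a recursive algorithm; the sketch stops after one level so that the agreement with the direct transform is easy to check.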


Discrete cosine transforms; Fast algorithms; Recursive algorithms; Arithmetic complexity; Sparse and orthogonal factors; Signal flow graphs; Field-programmable gate array (FPGA); Application-specific integrated circuits (ASIC)



The authors appreciate valuable discussions, help, and suggestions made by Jianhua Liu to improve the quality of the paper.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Sirani M. Perera (1)
  • Arjuna Madanayake (2)
  • Nathan Dornback (2)
  • Nilan Udayanga (2)

  1. Department of Mathematics, Embry-Riddle Aeronautical University, Daytona Beach, USA
  2. Department of Electrical and Computer Engineering, University of Akron, Akron, USA
