Low-Power Custom Regular Processor Synthesis Flow

  • Roger Woods
  • Gayle Lightbody
  • Jonathan Spanier
  • Gareth Keane


In high-throughput signal processing applications that are both arithmetic- and data-dominated, customized processor solutions are attractive for their performance, particularly in applications such as video compression where the functionality is fixed or has been standardized. For high volumes, customized VLSI solutions offer the ultimate system performance in terms of both area and speed. For example, a processing performance of 100 GOP/s/cm²/W is achievable for computationally complex DSP algorithms in 0.35 μm standard cell CMOS technology. This gain arises because characteristics of the algorithm can often be exploited to realize a highly optimized implementation. A number of different design flows exist for this target domain. In this chapter, we focus on relatively regular signal processing kernels that are critical to the overall performance and cost of the global application. Examples include motion estimation, discrete orthogonal transforms, algebraic solvers, and Fourier transforms. The designs resulting from our tuned approach are characterized by high area utilization, high levels of locality (which conserves power), and efficient memory utilization.


Keywords: Discrete Cosine Transform, Design Flow, Discrete Cosine Transform Coefficient, Custom Processor, VLSI Architecture





Copyright information

© Springer Science+Business Media Dordrecht 2000

Authors and Affiliations

  • Roger Woods (1)
  • Gayle Lightbody (1)
  • Jonathan Spanier (1)
  • Gareth Keane (1)

  1. DSP and Telecommunications Group, Queen’s University of Belfast, UK
