Systolic Arrays

  • Yu Hen Hu
  • Sun-Yuan Kung


This chapter reviews the basic ideas of systolic arrays, their design methodologies, and the historical development of various hardware implementations. It then reviews two modern applications: motion estimation in video coding and baseband processing in wireless communication. The use of systolic arrays to accelerate deep neural networks is also discussed.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. University of Wisconsin - Madison, Department of Electrical and Computer Engineering, Madison, USA
  2. Princeton University, Department of Electrical Engineering, Princeton, USA
