Skip to main content

Systolic Arrays

  • Chapter
  • First Online:
Handbook of Signal Processing Systems

Abstract

This chapter reviews the basic ideas of systolic array, its design methodologies, and historical development of various hardware implementations. Two modern applications, namely, motion estimation of video coding and wireless communication baseband processing are reviewed. The application to accelerating deep neural networks is also discussed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 299.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 379.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 379.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Annaratone, M., Arnould, E., Gross, T., Kung, H.T., Lam, M., Menzilcioglu, O., and Webb, J.A.: The WARP computer: Architecture, implementation, and performance. IEEE Trans. Computers 36, 1523–1538 (1987)

    Article  Google Scholar 

  2. Arnould, E., Kung, H., et al.: A systolic array computer. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 10, pp. 232–235 (1985)

    Google Scholar 

  3. Borkar, S., Cohn, R., Cox, G., Gross, T., Kung, H.T., Lam, M., Levine, M., Moore, B., Moore, W., Peterson, C., Susman, J., Sutton, J., Urbanski, J., Webb, J.: Supporting systolic and memory communication in iwarp. In: Proc. 17th Intl. Symposium on Computer Architecture, pp. 71–80 (1990)

    Google Scholar 

  4. Broomhead, D., Harp, J., McWhirter, J., Palmer, K., Roberts, J.: A practical comparison of the systolic and wavefront array processing architectures. In: Proc. Intl. Conf. Acoustics, Speech, and Signal Processing, vol. 10, pp. 296–299 (1985)

    Google Scholar 

  5. Chen, Y.K., Kung, S.Y.: A systolic methodology with applications to full-search block matching architectures. J. of VLSI Signal Processing 19(1), 51–77 (1998)

    Article  Google Scholar 

  6. Foulser, D.E.: The Saxpy Matrix-1: A general-purpose systolic computer. IEEE Computer 20, 35–43 (1987)

    Article  Google Scholar 

  7. Gross, T., O’Hallaron, D.R.: iWarp: Anatomy of a Parallel Computing System. MIT Press, Boston, MA (1998)

    Google Scholar 

  8. Homewood, M., May, D., Shepherd, D., Shepherd, R.: The IMS T800 Transputer. IEEE Micro 7(5), 10–26 (1987)

    Article  Google Scholar 

  9. Hu, Y.H.: CORDIC-based VLSI architectures for digital signal processing. IEEE Signal Processing Magazine 9, 16–35 (1992)

    Article  Google Scholar 

  10. iWarp project. URL http://www.cs.cmu.edu/afs/cs/project/iwarp/archive/WWW-pages/iwarp.html

  11. Kittitornkun, S., Hu, Y.: Systolic full-search block matching motion estimation array structure. IEEE Trans. Circuits Syst. Video Technology 11, 248–251 (2001)

    Article  Google Scholar 

  12. Komarek, T., Pirsch, P.: Array architectures for block matching algorithms. IEEE Trans. Circuits Syst. 26(10), 1301–1308 (1989)

    Article  Google Scholar 

  13. Kung, H.T.: Why systolic array. IEEE Computers 15, 37–46 (1982)

    Article  Google Scholar 

  14. Kung, S.Y.: On supercomputing with systolic/wavefront array processors. Proc. IEEE 72, 1054–1066 (1984)

    Google Scholar 

  15. Kung, S.Y.: VLSI Array Processors. Prentice Hall, Englewood Cliffs, NJ (1988)

    Google Scholar 

  16. Kung, S.Y., Arun, K.S., Gal-Ezer, R.J., Bhaskar Rao, D.V.: Wavefront array processor: Language, architecture, and applications. IEEE Trans. Computer 31(11), 1054–1066 (1982)

    Article  Google Scholar 

  17. Jouppi, N. P., et al: In-Datacenter Performance Analysis of a Tensor Processing Unit. IEEE 44th International Symposium on Computer Architecture (ISCA), pp. 1–12, Toronto, Canada, (2017)

    Google Scholar 

  18. Lin, C.P., Tseng, P.C., Chiu, Y.T., Lin, S.S., Cheng, C.C., Fang, H.C., Chao, W.M., Chen, L.G.: A 5mW MPEG4 SP encoder with 2D bandwidth-sharing motion estimation for mobile applications. In: Proc. International Solid-State Circuits Conference, pp. 1626–1635. San Francisco, CA (2006)

    Google Scholar 

  19. Ni, L.M., McKinley, P.: A survey of wormhole routing techniques in direct networks. IEEE Computer 26, 62–76 (1993)

    Article  Google Scholar 

  20. Nicoud, J.D., Tyrrell, A.M.: The transputer T414 instruction set. IEEE Micro 9(3), 60–75 (1989)

    Article  Google Scholar 

  21. Ovtcharov, K., Ruwase, O., Kim, J.Y., Fowers, J., Strauss, K. and Chung, E.S.: Toward accelerating deep learning at scale using specialized hardware in the datacenter. IEEE Hot Chips 27 Symposium, 1–38 (2015)

    Google Scholar 

  22. Pan, S.B., Chae, S., Park, R.: VLSI architectures for block matching algorithm. IEEE Trans. Circuits Syst. Video Technol. 6(1), 67–73 (1996)

    Article  Google Scholar 

  23. Ramacher, U., Beichter, J., Raab, W., Anlauf, J., Bruels, N., Hachmann, U. and Wesseling, M.: Design of a 1st Generation Neurocomputer. VLSI Design of Neural Networks, Springer US. (1991)

    Google Scholar 

  24. Huttunen, H.: Deep neural networks: A signal processing perspective. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)

    Google Scholar 

  25. Seki, K., Kobori, T., Okello, J., Ikekawa, M.: A cordic-based reconfigrable systolic array processor for MIMO-OFDM wireless communications. In: Proc. IEEE Workshop on Signal Processing Systems, pp. 639–644. Shanghai, China (2007)

    Google Scholar 

  26. Taylor, R.: Signal processing with occam and the transputer. IEE Proceedings F: Communications, Radar and Signal Processing 131(6), 610–614 (1984)

    Google Scholar 

  27. Texas Instruments: TMS320C40 Digital Signal Processors (1996). URL http://focus.ti.com/docs/prod/folders/print/tms320c40.html

  28. Volder, J.E.: The CORDIC trigonometric computing technique. IRE Trans. on Electronic Computers EC-8(3), 330–334 (1959)

    Article  Google Scholar 

  29. Walther, J.S.: A unified algorithm for elementary functions. In: Spring Joint Computer Conf. (1971)

    Google Scholar 

  30. Whitby-Strevens, C.: Transputers-past, present and future. IEEE Micro 10(6), 16–19, 76–82 (1990)

    Article  Google Scholar 

  31. Yeo, H., Hu, Y.: A novel modular systolic array architecture for full-search block matching motion estimation. IEEE Trans. Circuits Syst. Video Technol. 5(5), 407–416 (1995)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yu Hen Hu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Hu, Y.H., Kung, SY. (2019). Systolic Arrays. In: Bhattacharyya, S., Deprettere, E., Leupers, R., Takala, J. (eds) Handbook of Signal Processing Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-91734-4_26

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-91734-4_26

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91733-7

  • Online ISBN: 978-3-319-91734-4

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics