Putting inner loops automatically in silicon

  • Chapter 3 VLSI Algorithms
VLSI Engineering

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 163)

Abstract

Many time-consuming inner loops are inherently regular and parallel. These are exactly the structures that are well suited for VLSI implementation. As a result, it will become increasingly common to have subroutines that are directly executable in silicon. Does this imply that in the near future many large computations can be carried out effectively by small computers equipped with silicon subroutines? This talk will present a simplified characterization of the silicon subroutine approach and discuss systolic architectures, a powerful method for implementing cost-effective silicon subroutines for computations such as pattern matching and error correction. CAD systems at CMU that have made it possible for us to design some rather complex chips, such as a programmable systolic chip, will also be briefly described.

The research was supported in part by the Office of Naval Research under Contracts N00014-76-C-0370, NR 044-422 and N00014-80-C-0236, NR 048-659, and in part by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 3597, monitored by the Air Force Avionics Laboratory under Contract F33615-81-K-1539.

References

  1. Barbacci, M.R. Instruction Set Processor Specifications (ISPS): The Notation and Its Application. IEEE Transactions on Computers C-30(1):24–40, January, 1981.

  2. Bentley, J.L. A Parallel Algorithm for Constructing Minimum Spanning Trees. Journal of Algorithms 1:51–59, 1980.

  3. Bentley, J.L. and Kung, H.T. A Tree Machine for Searching Problems. In Proceedings of 1979 International Conference on Parallel Processing, pages 257–266. IEEE, August, 1979. Also available as a CMU Computer Science Department technical report, August 1979.

  4. Blackmer, J., Kuekes, P. and Frank, G. A 200 MOPS Systolic Processor. In Proceedings of SPIE Symposium, Vol. 298, Real-Time Signal Processing IV. The Society of Photo-Optical Instrumentation Engineers, August, 1981.

  5. Bojanczyk, A., Brent, R.P. and Kung, H.T. Numerically Stable Solution of Dense Systems of Linear Equations Using Mesh-Connected Processors. Technical Report, Carnegie-Mellon University, Computer Science Department, May, 1981. The final version of the paper is to appear in SIAM Journal on Scientific and Statistical Computing.

  6. Brent, R.P. and Kung, H.T. Systolic VLSI Arrays for Polynomial GCD Computation. Technical Report, Carnegie-Mellon University, Computer Science Department, May, 1982.

  7. Bromley, K., Symanski, J.J., Speiser, J.M., and Whitehouse, H.J. Systolic Array Processor Developments. In Kung, H.T., Sproull, R.F., and Steele, G.L., Jr. (editors), VLSI Systems and Computations, pages 273–284. Computer Science Department, Carnegie-Mellon University, Computer Science Press, Inc., October, 1981.

  8. Cappello, P.R. and Steiglitz, K. Digital Signal Processing Applications of Systolic Algorithms. In Kung, H.T., Sproull, R.F., and Steele, G.L., Jr. (editors), VLSI Systems and Computations, pages 245–254. Computer Science Department, Carnegie-Mellon University, Computer Science Press, Inc., October, 1981.

  9. Chazelle, Bernard. Computational Geometry on a Systolic Chip. Technical Report, Carnegie-Mellon University, Computer Science Department, May, 1982.

  10. Cohen, D. Mathematical Approach to Computational Networks. Technical Report ISI/RR-78-73, University of Southern California, Information Sciences Institute, November, 1978.

  11. Fisher, A. Systolic Algorithms for Running Order Statistics in Signal and Image Processing. In Kung, H.T., Sproull, R.F., and Steele, G.L., Jr. (editors), VLSI Systems and Computations, pages 265–272. Computer Science Department, Carnegie-Mellon University, Computer Science Press, Inc., October, 1981.

  12. Fisher, A.L. and Kung, H.T. Synchronizing Large Systolic Arrays. In Proceedings of SPIE Symposium, Vol. 341, Real-Time Signal Processing V. The Society of Photo-Optical Instrumentation Engineers, May, 1982.

  13. Foster, M.J. and Kung, H.T. The Design of Special-Purpose VLSI Chips. Computer 13(1):26–40, January, 1980. Reprint of the paper appears in Digital MOS Integrated Circuits, edited by Elmasry, M.I., IEEE Press Selected Reprint Series, 1981, pp. 204–217. A preliminary version of the paper, entitled “Design of Special-Purpose VLSI Chips: Example and Opinions,” also appears in Proceedings of the 7th International Symposium on Computer Architecture, pp. 300–307, La Baule, France, May 1980.

  14. Foster, M.J. and Kung, H.T. Recognize Regular Languages With Programmable Building-Blocks. In Gray, J.P. (editor), VLSI 81, pages 75–84. Academic Press, August, 1981. The final version is to appear in Journal of Digital Systems.

  15. Gentleman, W.M. and Kung, H.T. Matrix Triangularization by Systolic Arrays. In Proceedings of SPIE Symposium, Vol. 298, Real-Time Signal Processing IV. The Society of Photo-Optical Instrumentation Engineers, August, 1981.

  16. Guibas, L.J. and Liang, F.M. Systolic Stacks, Queues, and Counters. In Proceedings of the Conference on Advanced Research in VLSI. Cambridge, Massachusetts, January, 1982.

  17. Guibas, L.J., Kung, H.T. and Thompson, C.D. Direct VLSI Implementation of Combinatorial Algorithms. In Proceedings of Conference on Very Large Scale Integration: Architecture, Design, Fabrication, pages 509–525. California Institute of Technology, January, 1979.

  18. Hong, J.-W. and Kung, H.T. I/O Complexity: The Red-Blue Pebble Game. In Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, pages 326–333. ACM SIGACT, May, 1981.

  19. Huffman, D.A. The Synthesis of Linear Sequential Coding Networks. In Cherry, C. (editor), Information Theory, pages 77–95. Academic Press, 1957.

  20. Kung, H.T. Let's Design Algorithms for VLSI Systems. In Proceedings of Conference on Very Large Scale Integration: Architecture, Design, Fabrication, pages 65–90. California Institute of Technology, January, 1979. Also available as a CMU Computer Science Department technical report, September 1979.

  21. Kung, H.T. Special-Purpose Devices for Signal and Image Processing: An Opportunity in VLSI. In Proceedings of the SPIE, Vol. 241, Real-Time Signal Processing III, pages 76–84. The Society of Photo-Optical Instrumentation Engineers, July, 1980.

  22. Kung, H.T., Ruane, L.M., and Yen, D.W.L. A Two-Level Pipelined Systolic Array for Convolutions. In Kung, H.T., Sproull, R.F., and Steele, G.L., Jr. (editors), VLSI Systems and Computations, pages 255–264. Computer Science Department, Carnegie-Mellon University, Computer Science Press, Inc., October, 1981.

  23. Kung, H.T. Use of VLSI in Algebraic Computation: Some Suggestions. In Wang, P.S. (editor), Proceedings of the 1981 ACM Symposium on Symbolic and Algebraic Computation, pages 218–222. ACM SIGSAM, August, 1981.

  24. Kung, H.T. Why Systolic Architectures? Computer Magazine 15(1):37–46, January, 1982.

  25. Kung, H.T. and Lehman, P.L. Systolic (VLSI) Arrays for Relational Database Operations. In Proceedings of ACM-SIGMOD 1980 International Conference on Management of Data, pages 105–116. ACM, May, 1980. Also available as a CMU Computer Science Department technical report, August 1979.

  26. Kung, H.T. and Leiserson, C.E. Systolic Arrays (for VLSI). In Duff, I. S. and Stewart, G. W. (editors), Sparse Matrix Proceedings 1978, pages 256–282. Society for Industrial and Applied Mathematics, 1979. A slightly different version appears in Introduction to VLSI Systems by C. A. Mead and L. A. Conway, Addison-Wesley, 1980, Section 8.3.

  27. Kung, H.T. and Picard, R.L. Hardware Pipelines for Multi-Dimensional Convolution and Resampling. In Proceedings of the 1981 IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, pages 273–278. IEEE Computer Society Press, November, 1981.

  28. Kung, H.T. and Song, S.W. A Systolic 2-D Convolution Chip. In Preston, K., Jr. and Uhr, L. (editors), Multicomputers and Image Processing: Algorithms and Programs, pages 373–384. 1982. An extended abstract appears in Proceedings of 1981 IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, November 11–13, 1981, pp. 159–160.

  29. Lehman, P.L. A Systolic (VLSI) Array for Processing Simple Relational Queries. In Kung, H.T., Sproull, R.F., and Steele, G.L., Jr. (editors), VLSI Systems and Computations, pages 285–295. Computer Science Department, Carnegie-Mellon University, Computer Science Press, Inc., October, 1981.

  30. Leiserson, C.E. Systolic Priority Queues. In Proceedings of Conference on Very Large Scale Integration: Architecture, Design, Fabrication, pages 199–214. California Institute of Technology, January, 1979. Also available as a CMU Computer Science Department technical report, April 1979.

  31. Leiserson, C.E. and Saxe, J.B. Optimizing Synchronous Systems. In Proceedings of the 22nd Annual Symposium on Foundations of Computer Science, pages 23–36. IEEE Computer Society, October, 1981.

  32. Liu, K.Y. Architecture for VLSI Design of Reed-Solomon Encoders. In Proceedings of the Second Caltech VLSI Conference. Caltech, January, 1981.

  33. Lyon, R.F. Two's Complement Pipeline Multipliers. IEEE Transactions on Communications COM-24(4):418–425, April, 1976.

  34. Mead, C.A. and Conway, L.A. Introduction to VLSI Systems. Addison-Wesley, Reading, Massachusetts, 1980.

  35. Mead, C.A., Pashley, R.D., Britton, L.D., Daimon, Y.T., and Sando, S.F. 128-Bit Multicomparator. IEEE Journal of Solid-State Circuits SC-11(5):692–695, October, 1976.

  36. Mukhopadhyay, A. Hardware Algorithms for Nonnumeric Computation. IEEE Transactions on Computers C-28(6):384–394, June, 1979.

  37. Noyce, R.N. Hardware Prospects and Limitations. In Dertouzos, M.L. and Moses, J. (editors), The Computer Age: A Twenty-Year View, pages 321–337. IEEE, 1979.

  38. Ottmann, T., Rosenberg, A.L. and Stockmeyer, L.J. A Dictionary Machine for VLSI. Technical Report RC 9060 (#39615), IBM Thomas J. Watson Research Center, Yorktown Heights, New York, 1981.

  39. Peterson, W.W. and Weldon, E.J., Jr. Error-Correcting Codes. MIT Press, Cambridge, Massachusetts, 1972.

  40. Savage, C. A Systolic Data Structure Chip for Connectivity Problems. In Kung, H.T., Sproull, R.F., and Steele, G.L., Jr. (editors), VLSI Systems and Computations, pages 296–300. Computer Science Department, Carnegie-Mellon University, Computer Science Press, Inc., October, 1981.

  41. Schirm IV, L. Multiplier-Accumulator Application Notes.

  42. Song, S.W. On a High-Performance VLSI Solution to Database Problems. PhD thesis, Carnegie-Mellon University, Computer Science Department, July, 1981. Also available as a CMU Computer Science Department technical report, August 1981.

  43. Sutherland, I.E. and Mead, C.A. Microelectronics and Computer Science. Scientific American 237(3):210–228, September, 1977.

  44. Swartzlander, E.E., Jr. and Gilbert, B.K. Arithmetic for Ultra-High-Speed Tomography. IEEE Transactions on Computers C-29(5):341–354, May, 1980.

  45. Symanski, J.J. Progress on a Systolic Processor Implementation. In Proceedings of SPIE Symposium, Vol. 341, Real-Time Signal Processing V. The Society of Photo-Optical Instrumentation Engineers, May, 1982.

  46. Todd, S. Algorithm and Hardware for a Merge Sort Using Multiple Processors. IBM Journal of Research and Development 22(5):509–517, September, 1978.

  47. Weiser, U. and Davis, A. A Wavefront Notation Tool for VLSI Array Design. In Kung, H.T., Sproull, R.F., and Steele, G.L., Jr. (editors), VLSI Systems and Computations, pages 226–234. Computer Science Department, Carnegie-Mellon University, Computer Science Press, Inc., October, 1981.

  48. Whiteside, R.A., Hibbard, P.G. and Ostlund, N.S. Systolic Algorithms for Monte Carlo Simulations. Draft, CMU Computer Science Department.

  49. Yen, D.W.L. and Kulkarni, A.V. The ESL Systolic Processor for Signal and Image Processing. In Proceedings of the 1981 IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, pages 265–272. November, 1981.

Editor information

Tosiyasu L. Kunii

Copyright information

© 1984 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kung, H.T. (1984). Putting inner loops automatically in silicon. In: Kunii, T.L. (eds) VLSI Engineering. Lecture Notes in Computer Science, vol 163. Springer, Tokyo. https://doi.org/10.1007/BFb0043449

  • DOI: https://doi.org/10.1007/BFb0043449

  • Publisher Name: Springer, Tokyo

  • Print ISBN: 978-4-431-70002-9

  • Online ISBN: 978-4-431-36817-5

  • eBook Packages: Springer Book Archive
