Putting inner loops automatically in silicon

Kung, H. T.

doi:10.1007/BFb0043449


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 163)


Abstract

Many of the time-consuming inner loops are inherently regular and parallel. These are exactly the structures that are well suited for VLSI implementation. As a result, it will become increasingly common to have subroutines that are directly executable in silicon. Does this imply that in the near future many large computations can be effectively carried out by small computers equipped with silicon subroutines? This talk will present a simplified characterization of the silicon subroutine approach and discuss systolic architectures, a powerful method for implementing cost-effective silicon subroutines for computations such as pattern matching and error correction. CAD systems at CMU that have made it possible for us to design some rather complex chips, such as a programmable systolic chip, will also be briefly described.
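
To make the systolic idea concrete, here is a minimal cycle-level software simulation of a linear systolic array for exact pattern matching. The organization is an illustrative assumption, not a design taken from the talk: each cell holds one pattern character permanently, text characters move one cell per cycle, and partial match results move one cell every two cycles. The function name systolic_match and the (window, flag) bookkeeping tuple are hypothetical and exist only for the simulation.

```python
from typing import List, Optional, Tuple

# A partial result travelling through the array: (window index, match-so-far).
# The window index is simulation bookkeeping only (hypothetical, not hardware).
Partial = Optional[Tuple[int, bool]]


def systolic_match(text: str, pattern: str) -> List[bool]:
    """Simulate, cycle by cycle, a linear systolic array that reports for every
    window position i whether text[i:i+len(pattern)] == pattern."""
    k, n = len(pattern), len(text)
    if k == 0 or k > n:
        return []
    windows = n - k + 1

    x_in: List[Optional[str]] = [None] * k  # text character at each cell this cycle
    y_in: List[Partial] = [None] * k        # partial result at each cell this cycle
    y_delay: List[Partial] = [None] * k     # extra register on each cell's result path
    results: List[bool] = [False] * windows
    finished, t = 0, 0

    while finished < windows:
        # Left boundary: text char t and a fresh result for window t enter cell 0.
        x_in[0] = text[t] if t < n else None
        y_in[0] = (t, True) if t < windows else None

        # Every cell fires in parallel: compare its stored pattern character
        # with the passing text character and AND the outcome into the result.
        y_out: List[Partial] = [None] * k
        for j in range(k):
            if y_in[j] is not None:
                i, ok = y_in[j]
                y_out[j] = (i, ok and x_in[j] == pattern[j])

        # A result leaving the last cell is complete.
        if y_out[k - 1] is not None:
            i, ok = y_out[k - 1]
            results[i] = ok
            finished += 1

        # Clock edge: text moves one cell per cycle; results pass through two
        # registers per cell, so they move one cell every two cycles.
        x_in = [None] + x_in[:-1]
        y_in = [None] + y_delay[:-1]
        y_delay = y_out
        t += 1

    return results
```

For example, systolic_match("abcab", "ab") returns [True, False, False, True]. In this sketch the result for window i meets text character i + j at cell j on cycle i + 2j, which is why the two streams must travel at different speeds. Each cell performs only one comparison and one AND per cycle and communicates only with its neighbours, and once the pipeline is full one completed result leaves the array every cycle.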

The research was supported in part by the Office of Naval Research under Contracts N00014-76-C-0370, NR 044-422 and N00014-80-C-0236, NR 048-659, and in part by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 3597, monitored by the Air Force Avionics Laboratory under Contract F33615-81-K-1539.


Author information

Authors and Affiliations

Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213, USA
H. T. Kung

Editor information

Tosiyasu L. Kunii

About this chapter

Cite this chapter

Kung, H.T. (1984). Putting inner loops automatically in silicon. In: Kunii, T.L. (ed) VLSI Engineering. Lecture Notes in Computer Science, vol 163. Springer, Tokyo. https://doi.org/10.1007/BFb0043449

DOI: https://doi.org/10.1007/BFb0043449
Published: 03 June 2005
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-70002-9
Online ISBN: 978-4-431-36817-5
eBook Packages: Springer Book Archive