Systolic Array Processor Developments

  • Keith Bromley
  • J. J. Symanski
  • J. M. Speiser
  • H. J. Whitehouse


The combination of systolic array processing techniques and VLSI fabrication promises to provide modularity in the implementation of matrix operations for signal processing, with throughput increasing linearly with the number of cells utilized. Achieving this, however, requires many design tradeoffs.
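To make the linear-scaling claim concrete, the following is a minimal simulation sketch of a linear systolic array computing a matrix-vector product y = Ax. It is a generic textbook arrangement, not the test-bed design described in this paper: the function name `systolic_matvec`, the one-cell-per-output-row layout, and the cycle-by-cycle shifting scheme are all assumptions made for illustration.

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = A x.

    Assumed layout (illustrative only): one cell per output row.
    Each cell keeps a running sum; the x values enter at cell 0 and
    move one cell to the right on every clock cycle.
    """
    n_cells = len(a)            # one cell per row of A
    n_cols = len(x)
    acc = [0.0] * n_cells       # accumulator stored inside each cell
    pipe = [None] * n_cells     # x value currently latched in each cell
    cycles = n_cells + n_cols - 1

    for t in range(cycles):
        # Shift the pipeline right by one cell and feed the next x value in.
        pipe = [x[t] if t < n_cols else None] + pipe[:-1]
        for i, xv in enumerate(pipe):
            if xv is not None:
                j = t - i       # column index of the x value now in cell i
                acc[i] += a[i][j] * xv   # one multiply-accumulate per cell

    return acc, cycles


if __name__ == "__main__":
    y, cycles = systolic_matvec([[1, 2], [3, 4], [5, 6]], [1, 1])
    print(y, cycles)
```

In this sketch every busy cell performs one multiply-accumulate per cycle, so adding cells (rows) adds work done in parallel while the cycle count grows only by one per extra cell.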

Several fundamental questions need to be addressed: What level of complexity (control) should the processor incorporate in order to perform complicated algorithms? Should the control for the processing element be combinatorial logic or a microprocessor? A systolic processing element will need a flexible architecture if it is to find broad application and be produced in quantities large enough to lower the unit cost to the point where large arrays can be constructed.

In order to have a timely marriage of algorithms and hardware, we must develop both concurrently so that each will affect the other. A brief description will be given of the hardware for a programmable, reconfigurable systolic array test-bed, implemented with presently available integrated circuits and capable of 32-bit floating-point arithmetic. While this hardware requires a small printed circuit board for each processor, in a few years one or two custom VLSI chips could be used instead, yielding a smaller, faster systolic array. The test-bed is flexible enough to allow experimentation with architecture and algorithms so that knowledgeable decisions can be made when it comes time to specify the architecture of a VLSI circuit for a particular set of applications.

The systolic array test-bed system is composed of a minicomputer system interfaced to the array of systolic processor elements (SPEs). The minicomputer system is an HP-1000, with the usual complement of printer, disk storage, keyboard-CRT, etc. The systolic array is housed in a cabinet approximately 28″ × 19″ × 21″. The interface circuitry uses a single 16-bit data path from the host HP-1000 to communicate data and commands to the array.
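One consequence of pairing 32-bit floating-point arithmetic with a 16-bit data path is that a single operand presumably needs more than one transfer. The sketch below illustrates the idea under stated assumptions only: it uses the IEEE 754 single-precision layout and sends the high word first, neither of which is specified in this abstract (the test-bed's actual number format and word order are not given here). The helper names `float_to_words` and `words_to_float` are hypothetical.

```python
import struct


def float_to_words(value):
    """Split a 32-bit float into two 16-bit words (high word first).

    Assumption: IEEE 754 single precision; the test-bed's real
    floating-point format is not stated in the source.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return (bits >> 16) & 0xFFFF, bits & 0xFFFF


def words_to_float(hi, lo):
    """Reassemble two 16-bit words into the 32-bit float they encode."""
    bits = ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


if __name__ == "__main__":
    hi, lo = float_to_words(-2.5)
    print(hex(hi), hex(lo), words_to_float(hi, lo))
```

Under these assumptions, each operand costs two transfers on the 16-bit path, which is the kind of host-to-array overhead a test-bed lets one measure before committing to a VLSI design.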

Commands and data are generated in the HP-1000 by the operator using interface programs written in FORTRAN. Algorithms can be conceived, put into a series of commands for the systolic array processor, and tested for validity. Data computed in the array can be read by the host HP-1000 and displayed for the operator.

The use of a general-purpose minicomputer as the driver for the systolic array gives great flexibility in developing algorithms. Through the use of interface routines, algorithms can be tried, evaluated, changed, and tried again in a few minutes. Also, in cases where the output must be manipulated and fed back into the array, the manipulation of the data can be done either in the host using the high-order language capability (for optimum flexibility), or in a dedicated microprocessor interfacing the systolic array to the host (for optimum speed).







Copyright information

© Carnegie-Mellon University 1981

Authors and Affiliations

  • Keith Bromley (1)
  • J. J. Symanski (1)
  • J. M. Speiser (1)
  • H. J. Whitehouse (1)
  1. Naval Ocean Systems Center, San Diego, USA
