Abstract
In scientific computation, loops are frequently used to compute large quantities of data organized in arrays. On a dataflow machine, the main challenge is to exploit fine-grain parallelism as fully as possible to speed up loop execution without incurring more storage overhead than necessary.
The main contributions of this paper include:
- The minimum storage requirement to support the maximum computation rate is analyzed, and a storage minimization scheme called limited balancing is introduced. The basic intuition is that, since the maximum computation rate is dominated by the critical cycles in the loop, we should not allocate extra storage beyond a certain limit bounded by the ratios of the critical cycles. In other words, all cycles should be balanced to have the same balancing ratio.
- The limited balancing problem is formulated as an integer linear programming problem, and an efficient solution is presented that reduces it to a network flow problem, the “minimum circulation flow” problem. This establishes a polynomial-time algorithm for the solution of the linear relaxation of the limited balancing problem.
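To make the notion of a critical cycle concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a marked-graph style model in which every node fires in one time unit and each arc carries a number of initial tokens; the ratio of a cycle is taken as its node count divided by its token count, and the cycle with the largest ratio bounds the computation rate. The names `simple_cycles` and `critical_cycle` and the four-arc example graph are hypothetical, and the brute-force enumeration is only suitable for very small graphs.

```python
from fractions import Fraction

def simple_cycles(arcs):
    """Return every simple cycle as a list of arc indices (tiny graphs only)."""
    out = {}
    for i, (u, v, _) in enumerate(arcs):
        out.setdefault(u, []).append(i)
        out.setdefault(v, [])
    cycles = []

    def dfs(start, node, path, visited):
        for i in out[node]:
            nxt = arcs[i][1]
            if nxt == start:
                cycles.append(path + [i])
            # Only visit nodes larger than the root so each cycle is found once.
            elif nxt > start and nxt not in visited:
                dfs(start, nxt, path + [i], visited | {nxt})

    for start in sorted(out):
        dfs(start, start, [], {start})
    return cycles

def critical_cycle(arcs):
    """Return (ratio, cycle) maximizing nodes-on-cycle / tokens-on-cycle."""
    best = None
    for cyc in simple_cycles(arcs):
        tokens = sum(arcs[i][2] for i in cyc)
        if tokens == 0:
            raise ValueError("token-free cycle: the loop would deadlock")
        ratio = Fraction(len(cyc), tokens)  # unit firing time per node (assumed)
        if best is None or ratio > best[0]:
            best = (ratio, cyc)
    return best

# Hypothetical loop body: arcs are (source, target, initial tokens).
arcs = [("A", "B", 0), ("B", "C", 0), ("C", "A", 1), ("B", "A", 1)]
ratio, cycle = critical_cycle(arcs)
rate = 1 / ratio  # iterations per time unit under the assumptions above
print(ratio, rate, [arcs[i][:2] for i in cycle])
```

In this example the cycle A, B, C has three nodes and one token, so it is critical and limits the loop to one iteration every three time units; the shorter cycle A, B could run faster but gains nothing by doing so.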
Our formal framework is developed under a FIFO dataflow model in which each arc in the dataflow graph is a FIFO queue of a certain size. We establish the maximum computation rate of a loop under the earliest firing schedule, and show that this rate is dominated by the critical cycles of the dataflow graph. We discuss how our results may be applied to both the static dataflow model and the dynamic dataflow model.
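The effect of FIFO sizes on the achievable rate can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: each node fires in one time unit, and a FIFO of capacity c holding d initial tokens is modeled by adding a reverse "acknowledgement" arc carrying c - d tokens (the free slots). A period is sustainable when no cycle has more nodes than period times tokens, which is checked here with a Bellman-Ford negative-cycle test. The search over capacities is plain brute force, standing in for the network flow solution, and the names `sustains_period` and `min_buffers` are hypothetical.

```python
from itertools import product

def sustains_period(nodes, arcs, period):
    """True if every cycle satisfies (#nodes) <= period * (#tokens).

    Arc weight is period*tokens - 1, so a violating cycle is a negative cycle.
    """
    dist = {n: 0 for n in nodes}  # virtual source at distance 0 to every node
    for _ in range(len(nodes)):
        changed = False
        for u, v, tokens in arcs:
            w = period * tokens - 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True  # converged: no negative cycle
    return False  # still relaxing after |nodes| passes: negative cycle

def min_buffers(nodes, data_arcs, period, max_cap=3):
    """Brute force: FIFO capacities of least total size sustaining `period`."""
    best = None
    for caps in product(range(1, max_cap + 1), repeat=len(data_arcs)):
        if any(c < d for c, (_, _, d) in zip(caps, data_arcs)):
            continue  # a queue must at least hold its initial tokens
        # Acknowledgement arc v -> u carries the free slots of FIFO u -> v.
        acks = [(v, u, c - d) for c, (u, v, d) in zip(caps, data_arcs)]
        if sustains_period(nodes, list(data_arcs) + acks, period):
            if best is None or sum(caps) < sum(best):
                best = caps
    return best

nodes = ["A", "B", "C"]
fork_join = [("A", "B", 0), ("B", "C", 0), ("A", "C", 0)]
with_feedback = fork_join + [("C", "A", 1)]  # loop-carried cycle of ratio 3

balanced = min_buffers(nodes, fork_join, 2)      # no critical data cycle
limited = min_buffers(nodes, with_feedback, 3)   # rate capped by the cycle
too_fast = min_buffers(nodes, with_feedback, 2)  # faster than the cycle allows
print(balanced, limited, too_fast)
```

Under these assumptions the fork-join graph alone needs a second slot on the short arc from A to C to run with period 2, whereas once the feedback arc caps the period at 3, one slot per arc is enough. This mirrors the intuition that storage beyond what the critical cycles permit is wasted.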
© 1992 Springer-Verlag Berlin Heidelberg
Cite this paper
Gao, G., Ning, Q. (1992). Loop storage optimization for dataflow machines. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1991. Lecture Notes in Computer Science, vol 589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0038676
DOI: https://doi.org/10.1007/BFb0038676
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55422-6
Online ISBN: 978-3-540-47063-2
eBook Packages: Springer Book Archive