Thread Migration in a Parallel Graph Reducer
To support high-level coordination, parallel functional languages need effective and automatic work distribution mechanisms. Many implementations distribute potential work, i.e. sparks or closures, but there is good evidence that the performance of certain classes of program can be improved if current work, i.e. threads, is also distributed. Migrating a thread incurs a significant execution cost and requires careful scheduling and an elaborate implementation.
This paper describes the design, implementation and performance of thread migration in the GUM runtime system underlying Glasgow parallel Haskell (GpH). Measurements of nontrivial programs on a high-latency cluster architecture show that thread migration can improve the performance of data-parallel and divide-and-conquer programs with low processor utilisation. Thread migration also reduces the variation in performance results obtained in separate executions of a program. Moreover, migration does not incur significant overheads if there are no migratable threads, or on a single processor. However, for programs that already exhibit good processor utilisation, migration may increase performance variability and very occasionally reduce performance.
Keywords: Black Hole · Processing Element · Processor Utilisation · Functional Language · Thread Pool
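To make the spark/thread distinction concrete, the following is a minimal sketch of how a GpH program exposes potential work as sparks using `par` and `pseq` from `Control.Parallel`; the `parFib` example is illustrative and not taken from the paper. A spark is merely a pointer to an unevaluated closure, so it is cheap to export; once an idle processing element turns a spark into a running thread, moving that work again requires the thread migration mechanism the paper describes.

```haskell
import Control.Parallel (par, pseq)

-- | `x `par` e` records x as a spark (potential work) and continues
-- with e. An idle processing element may convert the spark into a
-- thread; migrating it after that point is the subject of the paper.
parFib :: Int -> Int
parFib n
  | n < 2     = n
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = parFib (n - 1)   -- sparked: may be evaluated remotely
    y = parFib (n - 2)   -- evaluated by the local thread
```

Because sparks carry no execution state, distributing them is far cheaper than migrating a thread, which must ship its stack and reachable graph; this asymmetry is why migration pays off mainly when utilisation is low.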