Thread Migration in a Parallel Graph Reducer

  • André Rauber Du Bois
  • Hans-Wolfgang Loidl
  • Phil Trinder
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2670)


To support high-level coordination, parallel functional languages need effective and automatic work-distribution mechanisms. Many implementations distribute potential work, i.e. sparks or closures, but there is good evidence that the performance of certain classes of program can be improved if current work, i.e. threads, is also distributed. Migrating a thread incurs significant execution cost and requires careful scheduling and an elaborate implementation.

This paper describes the design, implementation and performance of thread migration in the GUM runtime system underlying Glasgow parallel Haskell (GpH). Measurements of non-trivial programs on a high-latency cluster architecture show that thread migration can improve the performance of data-parallel and divide-and-conquer programs with low processor utilisation. Thread migration also reduces the variation in performance between separate executions of a program. Moreover, migration does not incur significant overheads if there are no migratable threads, or on a single processor. However, for programs that already exhibit good processor utilisation, migration may increase performance variability and very occasionally reduce performance.
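The distinction between distributing potential work (sparks) and current work (threads) can be sketched as a toy model. The names and structure below are purely illustrative, not GUM's actual implementation: an idle processing element (PE) first tries to obtain a spark, which is cheap to export because it is only a pointer to a closure, and only falls back to migrating a running thread, which is expensive because the thread's stack and state must move with it.

```python
# Toy model of GUM-style work distribution (illustrative only: all names
# here are invented for this sketch, not GUM's actual code).
#
# Each processing element (PE) holds:
#   - sparks:  potential work (cheap to export: just a closure reference)
#   - threads: current work (expensive to export: stack and state must move)

from dataclasses import dataclass, field

@dataclass
class PE:
    pe_id: int
    sparks: list = field(default_factory=list)
    threads: list = field(default_factory=list)

def fish(idle: PE, busy: PE, migration: bool = True) -> str:
    """An idle PE 'fishes' for work: sparks first, then (optionally) a thread."""
    if busy.sparks:                      # cheap path: steal a spark
        idle.sparks.append(busy.sparks.pop())
        return "spark"
    if migration and busy.threads:       # fallback: migrate a running thread
        idle.threads.append(busy.threads.pop())
        return "thread"
    return "nothing"                     # no exportable work found

idle = PE(0)
busy = PE(1, sparks=[], threads=["t1", "t2"])  # sparks already turned into threads

# Without migration the idle PE stays idle; with migration it obtains a thread.
print(fish(idle, busy, migration=False))  # -> nothing
print(fish(idle, busy, migration=True))   # -> thread
```

This mirrors the trade-off in the abstract: when all sparks have already been turned into threads, a spark-only scheme leaves PEs idle, while migration keeps them busy at the price of moving thread state across a high-latency network.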


Keywords: black hole · processing element · processor utilisation · functional language · thread pool





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • André Rauber Du Bois (1)
  • Hans-Wolfgang Loidl (2)
  • Phil Trinder (1)
  1. School of Mathematical and Computer Sciences, Heriot-Watt University, Riccarton, Edinburgh, UK
  2. Institut für Informatik, Ludwig-Maximilians-Universität München, Munich, Germany
