The MuSE system: A flexible combination of on-stack execution and work-stealing

  • Markus Leberecht
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1586)


Executing subordinate activities by pushing return addresses onto the stack is the most efficient working mode for sequential programs. It is supported by all current processors, yet in most cases it is inappropriate for the parallel execution of independent threads of control. This paper describes an approach of dynamically switching between efficient on-stack execution of sequential threads and off-stack spawning of parallel activities. The presented method allows work-stealing to be incorporated into the scheduler, letting the system profit from its near-to-optimal load-balancing properties.

Key words

Multithreading · Work-Stealing · Scalable Coherent Interface · PC Cluster




  1. G. Acher, H. Hellwagner, W. Karl, and M. Leberecht. A PCI-SCI Bridge for Building a PC Cluster with Distributed Shared Memory. In Proceedings of the 6th International Workshop on SCI-Based High-Performance Low-Cost Computing, Santa Clara, CA, September 1996.
  2. R. D. Blumofe and C. E. Leiserson. Scheduling Multithreaded Computations by Work Stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS '94), pages 356–368, Santa Fe, NM, USA, November 1994.
  3. D. C. Cann. The Optimizing SISAL Compiler: Version 12.0. Technical Report UCRL-MA-110080, Lawrence Livermore National Laboratory, April 1992.
  4. P. Färber. Execution Architecture of the Multithreaded ADAM Prototype. PhD thesis, Eidgenössische Technische Hochschule, Zurich, Switzerland, 1996.
  5. S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy Threads: Implementing a Fast Parallel Call. Journal of Parallel and Distributed Computing, 37(1):5–20, 25 August 1996.
  6. H. Hellwagner, W. Karl, and M. Leberecht. Enabling a PC Cluster for High-Performance Computing. SPEEDUP Journal, June 1997.
  7. M. Ibel, K. E. Schauser, C. J. Scheiman, and M. Weis. High-Performance Cluster Computing Using Scalable Coherent Interface. In Proceedings of the 7th Workshop on Low-Cost/High-Performance Computing (SCIzzL-7), Santa Clara, USA, March 1997. SCIzzL.
  8. A. M. Mainwaring and D. E. Culler. Active Messages: Organization and Applications Programming Interface. Computer Science Division, University of California at Berkeley, 1995.
  9. J. Plevyak, V. Karamcheti, X. Zhang, and A. Chien. A Hybrid Execution Model for Fine-Grained Languages on Distributed Memory Multicomputers. In Proceedings of the 1995 ACM/IEEE Supercomputing Conference, San Diego, CA, December 1995. ACM/IEEE.
  10. S. Skedzielewski and J. Glauert. IF1—An Intermediate Form for Applicative Languages. Technical Report TR M-170, Lawrence Livermore National Laboratory, July 1985.
  11. IEEE Computer Society. IEEE Standard for Scalable Coherent Interface (SCI). The Institute of Electrical and Electronics Engineers, Inc., 345 East 47th Street, New York, NY 10017, USA, August 1993.
  12. K. Taura, S. Matsuoka, and A. Yonezawa. StackThreads: An Abstract Machine for Scheduling Fine-Grain Threads on Stock CPUs. In T. Ito and A. Yonezawa, editors, Proceedings of the International Workshop on the Theory and Practice of Parallel Programming, volume 907 of Lecture Notes in Computer Science, pages 121–136, Sendai, Japan, November 1994. Springer-Verlag.

Copyright information

© Springer-Verlag 1999

Authors and Affiliations

  • Markus Leberecht
  1. Institut für Informatik, Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM), Technische Universität München, Germany
