Skip to main content

Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2716))

Abstract

Barrier synchronization is an important and performance critical primitive in many parallel programming models, including the popular OpenMP model. In this paper, we compare the performance of several software implementations of barrier synchronization and introduce a new implementation, distributed counters with local sensor, which considerably reduces overhead on POWER3 and POWER4 SMP systems. Through experiments with the EPCC OpenMP benchmark, we demonstrate a 79% reduction in overhead on a 32-way POWER4 system and an 87% reduction in overhead on a 16-way POWER3 system when comparing with a fetch-and-add implementation. Since these improvements are primarily attributed to reduced L2 and L3 cache misses, we expect the relative performance of our implementation to increase with the number of processors in an SMP and as memory latencies lengthen relative to cache latencies.

Ph.D. candidate, research student visiting from the Technical University of Catalonia (UPC), Barcelona, Spain.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Message Passing Interface Forum. MPI: A message-passing interface standard, 1994.

    Google Scholar 

  2. V.S. Sunderam. PVM: A framework for parallel distributed computing. Concurrency, Practice and Experience, 2(4):315–339, December 1990.

    Article  Google Scholar 

  3. Arvind Krishnamurthy and Katherine A. Yelick. Optimizing parallel programs with explicit synchronization. In SIGPLAN Conference on Programming Language Design and Implementation, pages 196–204, 1995.

    Google Scholar 

  4. OpenMP Architecture Review Board. OpenMP specification FORTRAN version 2.0, 2000. http://www.openmp.org.

  5. OpenMP Architecture Review Board. OpenMP specification C/C++ version 2.0, 2002. http://www.openmp.org.

  6. Edinburgh Parallel Computing Center. OpenMP microbenchmarks, 1999. http://www.epcc.ed.ac.uk/research/openmpbench.

  7. John M. Mellor-Crummey and Michael L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. on Computer Systems, 9(1):21–65, February 1991.

    Article  Google Scholar 

  8. Dimitrios S. Nikolopoulos and Theodore S. Papatheodorou. A quantitative architectural evaluation of synchronization algorithms and disciplines on ccNUMA systems: The case of the SGI Origin2000. June 1999.

    Google Scholar 

  9. Steve Behling et al. The POWER4 processor introduction and tuning guide. Technical Report SG24-7041-00, International Technical Support Organization, November 2001. ISBN 0738423556.

    Google Scholar 

  10. J. M. Bull. Measuring synchronization and scheduling overheads in OpenMP. In First European Workshop on OpenMP, October 1999.

    Google Scholar 

  11. IBM Technical Disclosure Bulletin. Barrier Synchronization Using Fetch-and-Add and Broadcast. 34(8):33–34, 1992.

    Google Scholar 

  12. Rainer Kreuzburg. Method of synchronization, 2001. United States Patent, No. US 6,330,619.

    Google Scholar 

  13. Stefan Andersson et al. RS/6000 scientific and technical computing: POWER3 introduction and tuning guide. Technical Report SG24-5155-00, International Technical Support Organization, October 1998.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, G., Martínez, F., Tal, A., Blainey, B. (2003). Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor. In: Voss, M.J. (eds) OpenMP Shared Memory Parallel Programming. WOMPAT 2003. Lecture Notes in Computer Science, vol 2716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45009-2_7

Download citation

  • DOI: https://doi.org/10.1007/3-540-45009-2_7

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40435-4

  • Online ISBN: 978-3-540-45009-2

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics