Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor

Zhang, Guansong; Martínez, Francisco; Tal, Arie; Blainey, Bob

doi:10.1007/3-540-45009-2_7

Conference paper
First Online: 01 January 2003

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2716)

Abstract

Barrier synchronization is an important and performance-critical primitive in many parallel programming models, including the popular OpenMP model. In this paper, we compare the performance of several software implementations of barrier synchronization and introduce a new implementation, distributed counters with local sensor, which considerably reduces overhead on POWER3 and POWER4 SMP systems. Through experiments with the EPCC OpenMP benchmark, we demonstrate a 79% reduction in overhead on a 32-way POWER4 system and an 87% reduction in overhead on a 16-way POWER3 system, compared with a fetch-and-add implementation. Because these improvements are attributed primarily to reduced L2 and L3 cache misses, we expect the relative performance of our implementation to increase with the number of processors in an SMP and as memory latencies lengthen relative to cache latencies.
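
The abstract names the technique but does not spell out the algorithm. The sketch below is one plausible reading of "distributed counters with local sensor", not the authors' implementation: each thread announces its arrival in a counter slot that only it writes, and then spins on a sensor slot that only it polls, so no single shared word is hammered by every thread as in a fetch-and-add barrier. The class name `DistributedCounterBarrier`, the choice of thread 0 as master, the monotonically increasing episode number, and the `run_demo` helper are all assumptions made for illustration. Python lists only model the logical layout; the cache-line padding that would matter on real SMP hardware is not represented.

```python
import threading
import time


class DistributedCounterBarrier:
    """Busy-wait barrier with one arrival counter and one release sensor
    per thread. Assumed design, inferred from the technique's name only."""

    def __init__(self, n_threads):
        self.n = n_threads
        # In C each slot would be padded to its own cache line; a Python
        # list models the logical layout, not the memory layout.
        self.counters = [0] * n_threads  # slot t is written only by thread t
        self.sensors = [0] * n_threads   # slot t is polled only by thread t

    def wait(self, tid):
        episode = self.counters[tid] + 1
        self.counters[tid] = episode  # announce arrival in own slot
        if tid == 0:
            # Thread 0 acts as master: poll every other thread's counter...
            for t in range(1, self.n):
                while self.counters[t] < episode:
                    time.sleep(0)  # yield so the other threads can run
            # ...then release each thread through its private sensor.
            for t in range(self.n):
                self.sensors[t] = episode
        else:
            # Everyone else spins on its own local sensor only.
            while self.sensors[tid] < episode:
                time.sleep(0)


def run_demo(n_threads=4, rounds=50):
    """Return True if no thread left a barrier before all had arrived."""
    barrier = DistributedCounterBarrier(n_threads)
    arrived = [0] * rounds
    lock = threading.Lock()
    early_release = []

    def worker(tid):
        for r in range(rounds):
            with lock:
                arrived[r] += 1  # record arrival before entering the barrier
            barrier.wait(tid)
            if arrived[r] != n_threads:  # someone was let through too early
                early_release.append((tid, r))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return not early_release
```

Calling `run_demo(4, 50)` drives four threads through fifty barrier episodes and reports whether any thread was released early. Using an episode number that only ever grows avoids resetting the counters between episodes, which is a simplification chosen for this sketch rather than something the abstract states.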

Ph.D. candidate, research student visiting from the Technical University of Catalonia (UPC), Barcelona, Spain.

Author information

Authors and Affiliations

IBM Toronto Lab, Toronto, ON, L6G 1C7, Canada
Guansong Zhang, Francisco Martínez, Arie Tal & Bob Blainey

Editor information

Editors and Affiliations

Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King’s College Road, Toronto, Ontario, M5S 3G4, Canada
Michael J. Voss

About this paper

Cite this paper

Zhang, G., Martínez, F., Tal, A., Blainey, B. (2003). Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor. In: Voss, M.J. (eds) OpenMP Shared Memory Parallel Programming. WOMPAT 2003. Lecture Notes in Computer Science, vol 2716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45009-2_7

DOI: https://doi.org/10.1007/3-540-45009-2_7
Published: 27 May 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40435-4
Online ISBN: 978-3-540-45009-2
eBook Packages: Springer Book Archive
