Low-Overhead, High-Speed Multi-core Barrier Synchronization

Sartori, John; Kumar, Rakesh

doi:10.1007/978-3-642-11515-8_4

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5952)

Abstract

Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose chip multiprocessors (CMPs). While the nature of CMP applications demands low latency, achieving it with hardware-based barrier implementations can be prohibitively expensive on CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations.

In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.

Author information

Authors and Affiliations

Coordinated Science Laboratory, University of Illinois at Urbana-Champaign,
John Sartori & Rakesh Kumar

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station C0803, TX 78712-0240, Austin, USA
Yale N. Patt
Dipartimento di Ingegneria della Informazione, Università di Pisa, Via Diotisalvi 2, 56100, Pisa, Italy
Pierfrancesco Foglia
IBM T.J. Watson Research Center, 19 Skyline Drive, NY 10532, Hawthorne, USA
Evelyn Duesterwald
Hewlett-Packard, Cami de Can Graells 1-21, Sant Cugat del Vallés, 08174, Barcelona, Spain
Paolo Faraboschi
Computer Architecture Department, Technical University of Catalunya (UPC), c/Jordi Girona 1-3, 08034, Barcelona, Spain
Xavier Martorell

About this paper

Cite this paper

Sartori, J., Kumar, R. (2010). Low-Overhead, High-Speed Multi-core Barrier Synchronization. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_4

Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11514-1
Online ISBN: 978-3-642-11515-8
eBook Packages: Computer Science, Computer Science (R0)