In an asymmetric rendezvous system, such as an unfair synchronous queue or an elimination array, threads of two types, consumers and producers, show up and are each matched with a unique thread of the other type. Here we present new highly scalable, high-throughput asymmetric rendezvous systems that outperform prior synchronous queue and elimination array implementations under both symmetric and asymmetric workloads (more operations of one type than the other). Based on this rendezvous system, we also construct a highly scalable and competitive stack implementation.
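To make the matching semantics concrete, the following minimal sketch pairs one producer with one consumer using the standard `java.util.concurrent.SynchronousQueue` (unfair by default). It illustrates only the rendezvous interface, not the algorithms presented in this article; the class name `RendezvousSketch` and the method `handOff` are hypothetical names chosen for the example.

```java
import java.util.concurrent.SynchronousQueue;

// Illustrative only: uses the JDK's SynchronousQueue, not the article's algorithm.
class RendezvousSketch {

    // A producer thread offers `item`; the calling thread acts as the consumer.
    // Each side blocks until it is matched with a thread of the other type.
    static int handOff(int item) {
        // The no-argument constructor creates an unfair synchronous queue.
        SynchronousQueue<Integer> queue = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                queue.put(item); // blocks until a consumer takes the item
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        try {
            int received = queue.take(); // blocks until a producer arrives
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a partner", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(handOff(42));
    }
}
```

The queue holds no items: `put` and `take` complete only as a pair, which is the one-to-one matching described above.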
This reflects Java semantics, where arrays are of references to objects and not of objects themselves.
This is standard array semantics in Java, but not in C++.
Java benchmarks were run with the HotSpot Server JVM, build 1.7.0_05-b05. C++ benchmarks were compiled with Sun C++ 5.9 on the SPARC machine and with gcc 4.3.3 (-O3 optimization setting) on the Intel machine. In the C++ experiments we used the Hoard 3.8 memory allocator.
We removed all statistics counting from the code and used the latest JVM. Thus, the results we report are usually slightly better than those reported in the original papers. On the other hand, we fixed a bug in the benchmark of  that miscounted timed-out operations of the Java channel as successful operations; thus the results we report for it are sometimes lower.
We reduced the overhead due to memory allocation in the original implementations by caching objects popped from the stack and using them in future push operations.
Afek, Y., Korland, G., Natanzon, M., Shavit, N.: Scalable producer-consumer pools based on elimination-diffraction trees. In: Euro-Par 2010—Parallel Processing, vol. 6272 of LNCS, pp. 151–162. Springer, Berlin, Heidelberg (2010)
Andrews, G.R.: Concurrent Programming: Principles and Practice. Benjamin-Cummings Publishing Co, Redwood City (1991)
Berger, E.D., McKinley, K.S., Blumofe, R.D., Wilson, P.R.: Hoard: a scalable memory allocator for multithreaded applications. SIGARCH Comput. Archit. News 28(5), 117–128 (2000)
Fatourou, P., Kallimanis, N.D.: A highly-efficient wait-free universal construction. In: Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2011, pp. 325–334. ACM, New York, NY, USA (2011)
Fatourou, P., Kallimanis, N.D.: Revisiting the combining synchronization technique. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’12, pp. 257–266. ACM, New York, NY, USA (2012)
Hanson, D.R.: C Interfaces and Implementations: Techniques for Creating Reusable Software. Addison-Wesley Longman Publishing, Boston (1996)
Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Flat combining and the synchronization-parallelism tradeoff. In: Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2010, pp. 355–364. ACM, New York, NY, USA (2010)
Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Scalable flat-combining based synchronous queues. In: Proceedings of the 24th International Symposium on Distributed Computing (DISC 2010), vol. 6343 of LNCS, pp. 79–93. Springer, Berlin, Heidelberg (2010)
Hendler, D., Shavit, N., Yerushalmi, L.: A scalable lock-free stack algorithm. J. Parallel Distrib. Comput. 70(1), 1–12 (2010). doi: 10.1016/j.jpdc.2009.08.011
Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. (TOPLAS) 13, 124–149 (1991)
Herlihy, M.P., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. (TOPLAS) 12, 463–492 (1990)
Lea, D., Scherer, W.N. III, Scott, M.L.: java.util.concurrent. Exchanger source code. http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/Exchanger.java (2011)
Merritt, M., Taubenfeld, G.: Computing with infinitely many processes. In: Proceedings of the 14th International Symposium on Distributed Computing (DISC 2000), vol. 1914 of LNCS, pp. 164–178. Springer, Berlin, Heidelberg (2000)
Michael, M.M.: Hazard pointers: safe memory reclamation for lock-free objects. IEEE Trans. Parallel Distrib. Syst. 15(6), 491–504 (2004)
Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In: Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing, PODC ’96, pp. 267–275. ACM, New York, NY, USA (1996)
Moir, M., Nussbaum, D., Shalev, O., Shavit, N.: Using elimination to implement scalable and lock-free FIFO queues. In: Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2005, pp. 253–262. ACM, New York, NY, USA (2005)
Scherer, W.N. III, Lea, D., Scott, M.L.: Scalable synchronous queues. In: Proceedings of the 11th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2006, pp. 147–156. ACM, New York, NY, USA (2006)
Scherer, W.N. III, Lea, D., Scott, M.L.: A scalable elimination-based exchange channel. In: Workshop on Synchronization and Concurrency in Object-Oriented Languages (SCOOL 2005), October 2005 (2005)
Scherer, W.N. III, Scott, M.L.: Nonblocking concurrent data structures with condition synchronization. In: Proceedings of the 18th International Symposium on Distributed Computing (DISC 2004), vol. 3274 of LNCS, pp. 174–187. Springer, Berlin/Heidelberg (2004)
Shavit, N., Touitou, D.: Elimination trees and the construction of pools and stacks. Theory Comput. Syst. 30(6), 645–670 (1997). doi: 10.1007/s002240000072
Shavit, N., Zemach, A.: Diffracting trees. ACM Trans. Comput. Syst. (TOCS) 14, 385–428 (1996)
Shavit, N., Zemach, A.: Combining funnels: a dynamic approach to software combining. J. Parallel Distrib. Comput. 60(11), 1355–1387 (2000)
Tang, L., Mars, J., Vachharajani, N., Hundt, R., Soffa, M.L.: The impact of memory subsystem resource sharing on datacenter applications. In: Proceedings of the 38th Annual International Symposium on Computer Architecture, ISCA ’11. ACM, New York, NY, USA (2011)
Treiber, R.K.: Systems programming: coping with parallelism. Technical Report RJ5118, IBM Almaden Research Center (1986)
We are grateful to Hillel Avni, Nir Shavit and the anonymous reviewers, whose comments and suggestions helped to considerably improve this paper.
This work was supported by the Israel Science Foundation under grant 1386/11 and by machine donations from Intel and Oracle. Adam Morrison is supported by an IBM PhD Fellowship.
Cite this article
Afek, Y., Hakimi, M. & Morrison, A. Fast and scalable rendezvousing. Distrib. Comput. 26, 243–269 (2013). https://doi.org/10.1007/s00446-013-0185-0
- Ring Size
- Private Location
- False Match
- Free Node
- Thread Count