Scalable eventually consistent counters over unreliable networks
Abstract
Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network “likes”. Classic distributed counters, strongly consistent via linearisability or sequential consistency, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually Consistent Distributed Counters (ECDCs) and presents an implementation of the concept, Handoff Counters, that is scalable and works over unreliable networks. By giving up the total operation ordering in classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first Conflict-free Replicated Data Type (CRDT) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations.
Keywords
Conflict-free Replicated Data Types Distributed counters Eventual consistencyNotes
Acknowledgements
We would like to thank Marc Shapiro, the anonymous reviewers and the associate editor for the comments that helped improve the paper.
References
- 1.Ascó, A.: SyncFree: WP1 deliverable. In: Application and Environment Requirements. https://syncfree.lip6.fr/attachments/article/46/d1_1.pdf, (2015)
- 2.Aspnes, J., Herlihy, M., Shavit, N.: Counting networks. J. ACM 41(5), 1020–1048 (1994)MathSciNetCrossRefzbMATHGoogle Scholar
- 3.Attiya, H., Dolev, S., Welch, J.L.: Connection management without retaining information. In: Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, vol. 2, pp. 622–631. (1995)Google Scholar
- 4.Attiya, H., Rappoport, R.: The level of handshake required for establishing a connection. In: Tel, G., Vitnyi, P. (eds.) Distributed Algorithms. Lecture Notes in Computer Science, vol. 857, pp. 179–193. Springer, Berlin Heidelberg (1994)CrossRefGoogle Scholar
- 5.Brewer, E.A.: Towards robust distributed systems (abstract). In: Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, PODC ’00, p. 7, New York (2000)Google Scholar
- 6.Cerf, V., Kahn, R.: A protocol for packet network intercommunication. IEEE Trans. Commun. 22(5), 637–648 (1974)CrossRefGoogle Scholar
- 7.Chambi, S., Lemire, D., Kaser, O., Godin, R.: Better bitmap performance with roaring bitmaps. Softw. Pract. Exp. 46(5), 709–719 (2016)CrossRefGoogle Scholar
- 8.Davey, B.A., Priestley, H.A.: Introduction to lattices and order. Cambridge University Press, Cambridge (2002)CrossRefzbMATHGoogle Scholar
- 9.DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: amazon’s highly available key-value store. In: Proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, SOSP ’07, pp. 205–220, New York (2007)Google Scholar
- 10.Demers, A., Greene, D., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H., Swinehart, D., Terry, D.: Epidemic algorithms for replicated database maintenance. In: Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, PODC ’87, pp. 1–12, New York (1987)Google Scholar
- 11.Fekete, A., Lynch, N., Mansour, Y., Spinelli, J.: The impossibility of implementing reliable communication in the face of crashes. J. ACM 40(5), 1087–1107 (1993)MathSciNetCrossRefzbMATHGoogle Scholar
- 12.Gilbert, S., Lynch, N.: Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33(2), 51–59 (2002)CrossRefGoogle Scholar
- 13.Goodman, J.R., Vernon, M.K., Woest, P.J.: Efficient synchronization primitives for large-scale cache-coherent multiprocessors. In: Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-III, pp. 64–75, New York (1989)Google Scholar
- 14.Goudarzi, A.: Cassandra-4775: Counters 2.0. https://issues.apache.org/jira/browse/CASSANDRA-4775, (2012)
- 15.Gray, J., Lamport, L.: Consensus on transaction commit. ACM Trans. Database Syst. 31(1), 133–160 (2006)CrossRefGoogle Scholar
- 16.Helland, P.: Life beyond distributed transactions: an apostate’s opinion. In: CIDR, pp. 132–141. www.cidrdb.org, (2007)
- 17.Herlihy, M., Lim, B.-H., Shavit, N.: Scalable concurrent counting. ACM Trans. Comput. Syst. 13(4), 343–364 (1995)CrossRefGoogle Scholar
- 18.Herlihy, Maurice P., Wing, Jeannette M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990)CrossRefGoogle Scholar
- 19.Hernandez, J., Phillips, I.: Weibull mixture model to characterise end-to-end Internet delay at coarse time-scales. IEE Proc. Commun. 153(2), 295–304 (2006)CrossRefGoogle Scholar
- 20.Jesus, P., Baquero, C., Almeida, P.S.: Flow updating: fault-tolerant aggrega-tion for dynamic networks. J. Parallel Distrib. Comput. 78, 53–64 (2015)CrossRefGoogle Scholar
- 21.Klophaus, R.: Riak core: building distributed applications without shared state. In: ACM SIGPLAN Commercial Users of Functional Programming, CUFP ’10, p. 14:1, New York (2010)Google Scholar
- 22.Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)CrossRefGoogle Scholar
- 23.Lebresne, S.: Cassandra-2495: add a proper retry mechanism for counters in case of failed request. https://issues.apache.org/jira/browse/CASSANDRA-2495, (2011)
- 24.Lynch, N.A., Tuttle, M.R.: Hierarchical correctness proofs for distributed algorithms. In: Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, PODC ’87, pp. 137–151, New York (1987)Google Scholar
- 25.Parker Jr., D.S., Popek, G.J., Rudisin, G., Stoughton, A., Walker, B.J., Walton, E., Chow, J.M., Edwards, D., Kiser, S., Kline, C.: Detection of mutual inconsistency in distributed systems. IEEE Trans. Softw. Eng. 9(3), 240–247 (1983)CrossRefGoogle Scholar
- 26.Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: A comprehensive study of convergent and commutative replicated data types. In: Rapport de recherche 7506, Institut Nat. de la Recherche en Informatique et Automatique (INRIA), Rocquencourt, France, (2011)Google Scholar
- 27.Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: Conflict-free replicated data types. In: Proceedings of the 13th International Conference on Stabilization, Safety, and Security of Distributed Systems, SSS’11, pp. 386–400. Springer, Berlin (2011)Google Scholar
- 28.Shavit, N., Zemach, A.: Diffracting trees. ACM Trans. Comput. Syst. 14(4), 385–428 (1996)CrossRefGoogle Scholar
- 29.Stone, H.S.: Database applications of the fetch-and-add instruction. IEEE Trans. Comput. 33(7), 604–612 (1984)CrossRefGoogle Scholar
- 30.Terry, D.B., Demers, A.J., Petersen, K., Spreitzer, M., Theimer, M., Welch, B.W.: Session guarantees for weakly consistent replicated data. In: IEEE Computer Society and Proceedings of the Third International Conference on Parallel and Distributed Information Systems, PDIS ’94, pp. 140–149, Washington (1994)Google Scholar
- 31.Vogels, W.: Eventually consistent. Commun. ACM 52(1), 40–44 (2009)CrossRefGoogle Scholar
- 32.Wattenhofer, R., Widmayer, P.: The counting pyramid: an adaptive distributed counting scheme. J. Parallel Distrib. Comput. 64(4), 449–460 (2004)CrossRefzbMATHGoogle Scholar
- 33.Yew, P.-C., Tzeng, N.-F., Lawrie, D.H.: Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. 36(4), 388–395 (1987)Google Scholar