Skip to main content
Log in

Simple multi-party set reconciliation

  • Published:
Distributed Computing Aims and scope Submit manuscript

Abstract

Many distributed cloud-based services use multiple loosely consistent replicas of user information to avoid the high overhead of more tightly coupled synchronization. Periodically, the information must be synchronized, or reconciled. One can place this problem in the theoretical framework of set reconciliation: two parties \(A_1\) and \(A_2\) each hold a set of keys, named \(S_1\) and \(S_2\) respectively, and the goal is for both parties to obtain \(S_1 \cup S_2\). Typically, set reconciliation is interesting algorithmically when sets are large but the set difference \(|S_1-S_2|+|S_2-S_1|\) is small. In this setting the focus is on accomplishing reconciliation efficiently in terms of communication; ideally, the communication should depend on the size of the set difference, and not on the size of the sets. In this paper, we extend recent approaches using Invertible Bloom Lookup Tables (IBLTs) for set reconciliation to the multi-party setting. There are three or more parties \(A_1,A_2,\ldots ,A_n\) holding sets of keys \(S_1,S_2,\ldots ,S_n\) respectively, and the goal is for all parties to obtain \(\cup _i S_i\). While this could be done by pairwise reconciliations, we seek more effective methods. Our general approach can function even if the number of parties is not exactly known in advance, and with some additional cost can be used to determine which other parties hold missing keys. Our methodology uses network coding techniques in conjunction with IBLTs, allowing efficiency in network utilization along with efficiency obtained by passing messages of size \(O(|\cup _i S_i - \cap _i S_i|)\). By connecting reconciliation with network coding, we can provide efficient reconciliation methods for a number of natural distributed settings.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2

Similar content being viewed by others

Notes

  1. See also https://redd.it/2hchs0 for further discussion.

  2. Some preliminary results for multi-party settings, also using the IBLT framework but based on pairwise reconciliations, were provided to us by Goyal and Varghese [22]. Also, after the appearance of this work on the arxiv, multi-party set reconciliation using characteristic polynomials and repeated pairwise reconciliations was examined in [4]; their conclusion is that is while it is possible, it currently seems much less efficient than the methods considered here.

  3. To obtain structures where m / t is very close to 1, one must use irregular IBLTs, where different keys utilize a different number of hash functions. We simplify our description here and use regular IBLTs, whics use the same number of hash functions for each key. See [20, 34] for more discussion.

  4. We note that it is possible to show that \(H(x) = (a^x \bmod p) \bmod q\), where \(1<a<p-1\) is a random positive integer and \(p>q^2\), has the needed properties when all keys are smaller than q.

  5. For this problem we follow the standard notation for gossip problems and use n for the number of vertices.

  6. These experiments were performed by Marco Gentili, who we thank for allowing their use in this paper.

References

  1. Andresen, G.: \(O(1)\) Block Propagation. https://gist.github.com/gavinandresen/e20c3b5a1d4b97f79ac2

  2. Appleby, A.: MurmurHash2 Implementation. https://github.com/aappleby/smhasher/wiki/MurmurHash2

  3. Bailis, P., Kingsbury, K.: The network is reliable: an informal survey of real-world communications failures. Queue 12(7), 20:20–20:32 (2014)

    Google Scholar 

  4. Boral, A., Mitzenmacher, M.: Multi-party set reconciliation using characteristic polynomials. In: Proceedings of the 52nd Allerton Conference, pp. 1182–1187 (2014)

  5. Botelho, F., Wormald, N., Ziviani, N.: Cores of random \(r\)-partite hypergraphs. Inf. Process. Lett. 112(8), 314–319 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  6. Broder, A., Charikar, M., Frieze, A., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  7. Byers, J., Considine, J., Mitzenmacher, M., Rost, S.: Informed content delivery across adaptive overlay networks. IEEE/ACM Trans. Netw. 12(5), 767–780 (2004)

    Article  Google Scholar 

  8. Calder, B., Wang, J., Ogus, A., Nilakantan, N., Skjolsvold, A., McKelvie, S., Xu, Y., Srivastav, S., Wu, J., Simitci, H., et al.: Windows Azure Storage: a highly available cloud storage service with strong consistency. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 143–157 (2011)

  9. Chen, D., Konrad, C., Yi, Ke, Yu, W., Zhang, Q.: Robust set reconciliation. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 135–146 (2014)

  10. Chierichetti, F., Lattanzi, S., Panconesi, A.: Almost tight bounds for rumour spreading with conductance. In: Proceedings of the Forty-Second ACM Symposium on Theory of Computing, pp. 399–408 (2010)

  11. Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci. 55(3), 441–453 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  12. Deb, S., Médard, M., Choute, C.: Algebraic gossip: a network coding approach to optimal multiple rumor mongering. IEEE Trans. Inf. Theory 52(6), 2486–2507 (2006)

    Article  MATH  Google Scholar 

  13. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: Amazon’s highly available key-value store. In: Proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pp. 205–220 (2007)

  14. Demers, A., Greene, D., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H., Swinehart, D., Terry, D.: Epidemic algorithms for replicated database maintenance. In: Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, pp. 1–12 (1987)

  15. Dodis, Y., Ostrovsky, R., Reyzin, L., Smith, A.: Fuzzy extractors: how to generate strong keys from biometrics and other noisy data. SIAM J. Comput. 38(1), 97–139 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  16. Eppstein, D., Goodrich, M.: Straggler identification in round-trip data streams via Newton’s identities and invertible bloom filters. IEEE Trans. Knowl. Data Eng. 23(2), 297–306 (2011)

    Article  MATH  Google Scholar 

  17. Eppstein, D., Goodrich, M., Uyeda, F., Varghese, G.: What’s the difference? Efficient set reconciliation without prior context. In: Proceedings of the ACM SIGCOMM Conference, pp. 218–229 (2011)

  18. Giakkoupis, G.: Tight bounds for rumor spreading in graphs of a given conductance. In: Proceedings of the 28th International Symposium on Theoretical Aspects of Computer Science, pp. 57–68 (2011)

  19. Gabrys, R., Farnoud, F.: Reconciling similar sets of data. In: Proceedings of the International Symposium on Information Theory, pp. 2837–2341 (2014)

  20. Goodrich, M., Mitzenmacher, M.: Invertible bloom lookup tables. In: Proceedings of the 49th Annual Allerton Conference, pp. 792–799 (2011)

  21. Gorale, A.: Bitcoin in Bloom: How IBLTs Allow Bitcoin to Scale. Cryptocoin News, October 2, 2014. https://www.cryptocoinsnews.com/bitcoin-in-bloom-how-iblts-allow-bitcoin-scale

  22. Goyal, N., Varghese, G.: Personal communication (2013)

  23. Haeupler, B.: Analyzing network coding gossip made easy. In: Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, pp. 293–302 (2011)

  24. Hedetniemi, S., Hedetniemi, S., Liestman, A.: A survey of gossiping and broadcasting in communication networks. Networks 18(4), 319–349 (1988)

    Article  MathSciNet  MATH  Google Scholar 

  25. Huang, Z., Yi, Ke, Zhang, Q.: Randomized algorithms for tracking distributed count, frequencies, and ranks. In: Proceedings of the 31st Symposium on Principles of Database Systems, pp. 295–306 (2012)

  26. Juels, A., Sudan, M.: A fuzzy vault scheme. Des. Codes Crypt. 38(2), 237–257 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  27. Katti, S., Rahul, H., Hu, W., Katabi, D., Médard, M., Crowcroft, J.: XORs in the air: practical wireless network coding. ACM SIGCOMM Comput. Commun. Rev. 36(4), 243–254 (2006)

    Article  Google Scholar 

  28. Knödel, W.: New gossips and telephones. Discrete Math. 13(1), 95 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  29. Kostić, D., Snoeren, A., Vadhat, A., Braud, R., Killian, C., Anderson, J., Albrecht, J., Rodriguesz, A., Vandkieft, E.: High-bandwidth data dissemination for large-scale distributed systems. ACM Trans. Comput. Syst. 26(1), Article 3 (2008)

  30. Luby, M., Mitzenmacher, M., Shokrollahi, A., Spielman, D.: Efficient erasure correcting codes. IEEE Trans. Inf. Theory 47(2), 569–584 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  31. Minsky, Y., Trachtenberg, A., Zippel, R.: Set reconciliation with nearly optimal communication complexity. IEEE Trans. Inf. Theory 49(9), 2213–2218 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  32. Mitzenmacher, M., Varghese, G.: The complexity of object reconciliation, and open problems related to set difference and coding. In: Proceedings of the 50th Annual Allerton Conference, pp. 1126–1132 (2012)

  33. Molloy, M.: The pure literal rule threshold and cores in random hypergraphs. In: Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 672–681 (2004)

  34. Rink, M.: Mixed hypergraphs for linear-time construction of denser hashing-based data structures. In: SOFSEM 2013: Theory and Practice of Computer Science, pp. 356–368 (2013)

  35. Schwartz, J.T.: Fast probabilistic algorithms for verification of polynomial identities. J. ACM 27(4), 701–717 (1980)

    Article  MathSciNet  MATH  Google Scholar 

  36. Skachek, V., Rabbat, M.: Subspace synchronization: a network-coding approach to object reconciliation. In: Proceedings of the International Symposium on Information Theory, pp. 2301–2305 (2014)

  37. Skjegstad, M., Maseng, T.: Low complexity set reconciliation using Bloom filters. In: Proceedings of the 7th ACM ACM SIGACT/SIGMOBILE International Workshop on Foundations of Mobile Computing, pp. 33–41 (2011)

  38. Shah, D.: Gossip algorithms. Found. Trends Netw. 3(1), 1–125 (2008)

    Article  MATH  Google Scholar 

  39. Zippel, R.: Probabilistic algorithms for sparse polynomials. In: Proceedings on the International Symposium on Symbolic and Algebraic Computation, pp. 216–226 (1979)

Download references

Acknowledgements

The first author thanks George Varghese for suggesting the problem of multi-party set synchronization, and Marco Gentili for allowing us to report on his experiments in this paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Michael Mitzenmacher.

Additional information

Michael Mitzenmacher was supported in part by NSF Grants CCF-1563710, CCF-1535795, CCF-1320231, CNS-1228598, IIS-0964473, and CCF-0915922. Rasmus Pagh was supported by the Danish National Research Foundation under the Sapere Aude program.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mitzenmacher, M., Pagh, R. Simple multi-party set reconciliation. Distrib. Comput. 31, 441–453 (2018). https://doi.org/10.1007/s00446-017-0316-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00446-017-0316-0

Keywords

Navigation