Explicit and Efficient Hash Families Suffice for Cuckoo Hashing with a Stash

Abstract

It is shown that for cuckoo hashing with a stash as proposed by Kirsch et al. (Proc. 16th European Symposium on Algorithms (ESA), pp. 611–622, Springer, Berlin, 2008) families of very simple hash functions can be used, maintaining the favorable performance guarantees: with constant stash size s the probability of a rehash is \(O(1/n^{s+1})\), the lookup time and the deletion time are O(s) in the worst case, and the amortized expected insertion time is O(s) as well. Instead of the full randomness needed for the analysis of Kirsch et al. and of Kutzelnigg (Discrete Math. Theor. Comput. Sci., 12(3):81–102, 2010) (resp. Θ(log n)-wise independence for standard cuckoo hashing) the new approach even works with 2-wise independent hash families as building blocks. Both construction and analysis build upon the work of Dietzfelbinger and Woelfel (Proc. 35th ACM Symp. on Theory of Computing (STOC), pp. 629–638, 2003). The analysis, which can also be applied to the fully random case, utilizes a graph counting argument and is much simpler than previous proofs. The results can be generalized to situations where the stash size is non-constant.

Notes

  1. κ-wise independent families of hash functions are defined in Sect. 2.

  2. Personal communication with Mikkel Thorup, 2012.

  3. The notation “\(\exists T \subseteq S \colon\mathcal{A}_{T} \cap\mathrm{bad}_{T}\)” stands for the formally correct \(\bigcup_{T \subseteq S} (\mathcal{A}_{T} \cap\mathrm{bad}_{T})\). Generally, in slight abuse of notation, we will often use the name of an event “\(\mathcal{A}_{T}\)” (or “\(\mathcal{A}_{T} \cap\mathrm{bad}_{T}\)”) also for the statement “\(\mathcal{A}_{T}\) occurs” (or “\(\mathcal{A}_{T} \cap \mathrm{bad}_{T}\) occurs”).

  4. When the stash has non-constant size, this yields non-constant lookup time. One way to circumvent this is to organize the stash itself as a hash table, which introduces failure probabilities of other types. See [1] for a detailed discussion of this issue.

  5. http://www.boost.org.

  6. Source code available at: http://eiche.theoinf.tu-ilmenau.de/ch-stash/.

References

  1. Arbitman, Y.: Efficient dictionary data structures based on cuckoo hashing. Master’s thesis, Weizmann Institute of Science (2010)
  2. Carter, L., Wegman, M.N.: Universal classes of hash functions. J. Comput. Syst. Sci. 18(2), 143–154 (1979)
  3. Devroye, L., Morin, P.: Cuckoo hashing: Further analysis. Inf. Process. Lett. 86(4), 215–219 (2003)
  4. Diestel, R.: Graph Theory. Springer, Berlin (2005)
  5. Dietzfelbinger, M., Hagerup, T., Katajainen, J., Penttonen, M.: A reliable randomized algorithm for the closest-pair problem. J. Algorithms 25(1), 19–51 (1997)
  6. Dietzfelbinger, M., Rink, M.: Applications of a splitting trick. In: Proc. 36th International Colloquium on Automata, Languages and Programming (ICALP). LNCS, vol. 5555, pp. 354–365. Springer, Berlin (2009)
  7. Dietzfelbinger, M., Schellbach, U.: On risks of using cuckoo hashing with simple universal hash classes. In: Proc. 20th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 795–804 (2009)
  8. Dietzfelbinger, M., Weidling, C.: Balanced allocation and dictionaries with tightly packed constant size bins. Theor. Comput. Sci. 380(1–2), 47–68 (2007)
  9. Dietzfelbinger, M., Woelfel, P.: Almost random graphs with simple hash functions. In: Proc. 35th ACM Symp. on Theory of Computing (STOC), New York, NY, USA, pp. 629–638 (2003)
  10. Fotakis, D., Pagh, R., Sanders, P., Spirakis, P.G.: Space efficient hash tables with worst case constant access time. Theory Comput. Syst. 38(2), 229–248 (2005)
  11. Goodrich, M.T., Mitzenmacher, M.: Privacy-preserving access of outsourced data via oblivious RAM simulation. In: Proc. 38th International Colloquium on Automata, Languages and Programming (ICALP), pp. 576–587 (2011)
  12. Kirsch, A., Mitzenmacher, M., Wieder, U.: More robust hashing: cuckoo hashing with a stash. In: Proc. 16th European Symposium on Algorithms (ESA). LNCS, vol. 5193, pp. 611–622. Springer, Berlin (2008)
  13. Kirsch, A., Mitzenmacher, M., Wieder, U.: More robust hashing: cuckoo hashing with a stash. SIAM J. Comput. 39(4), 1543–1561 (2009)
  14. Klassen, T.Q., Woelfel, P.: Independence of tabulation-based hash classes. In: Proc. 10th Latin American Symposium on Theoretical Informatics (LATIN). LNCS, vol. 7256, pp. 506–517. Springer, Berlin (2012)
  15. Kutzelnigg, R.: A further analysis of cuckoo hashing with a stash and random graphs of excess r. Discrete Math. Theor. Comput. Sci. 12(3), 81–102 (2010)
  16. Mitzenmacher, M., Vadhan, S.P.: Why simple hash functions work: exploiting the entropy in a data stream. In: Proc. 19th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 746–755 (2008)
  17. Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51(2), 122–144 (2004)
  18. Pǎtraşcu, M., Thorup, M.: The power of simple tabulation hashing. J. ACM 59(3), 14 (2012)
  19. Siegel, A.: On universal classes of extremely random constant-time hash functions. SIAM J. Comput. 33(3), 505–543 (2004)
  20. Thorup, M., Zhang, Y.: Tabulation based 4-universal hashing with applications to second moment estimation. In: Proc. 15th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 615–624 (2004)
  21. Thorup, M., Zhang, Y.: Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM J. Comput. 41(2), 293–331 (2012)
  22. Wegman, M.N., Carter, L.: New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci. 22, 265–279 (1981)
  23. Woelfel, P.: Asymmetric balanced allocation with simple hash functions. In: Proc. 17th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 424–433 (2006)

Acknowledgements

We thank Pascal Klaue for implementing the algorithms and carrying out the experiments presented in Sect. 7. We thank the anonymous reviewers, whose suggestions helped a lot in improving the presentation of this work. We especially thank one reviewer who pointed out the extensions to non-constant stash size and κ-wise independent hash families.

Author information

Correspondence to Martin Aumüller.

Additional information

M. Dietzfelbinger was supported in part by DFG grant DI 412/10-2. P. Woelfel was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). A preliminary version of this paper appeared under the title “Explicit and Efficient Hash Functions Suffice for Cuckoo Hashing with a Stash” in Proceedings of the 20th Annual European Symposium on Algorithms, Ljubljana, Slovenia, September 2012, Lecture Notes in Computer Science 7501, Springer 2012.

Appendix: Excess, Stash Size, and Insertions

In this supplementary section, provided for the convenience of the reader, we clarify the connection between the stash size needed and the excess \(\mathrm{ex}(G(S,h_1,h_2))\) of the cuckoo graph \(G(S,h_1,h_2)\), as well as the role of insertion procedures. In particular, we prove Lemma 5. The central statements of this section can also be found in [13, 15].

A.1 The Excess of a Graph

For G a graph, ζ(G) denotes the number of connected components of G. The cyclomatic number γ(G), technically defined as “the dimension of the cycle space of G”, can be characterized by the following basic formula [4]:

$$ \gamma(G) = m - n + \zeta(G), $$
(7)

for n the number of nodes and m the number of edges of G. Note that acyclic graphs are characterized by the equation n=m+ζ(G) and hence by the equation γ(G)=0. The following lemma gives two helpful ways of viewing γ(G).

Lemma 13

  (a) Assume G′ is obtained from G by removing an edge e. If e is a cycle edge then γ(G′)=γ(G)−1, otherwise γ(G′)=γ(G).

  (b) If we remove edges from G sequentially, in an arbitrary order, and the resulting graph is acyclic, then γ(G) is the number of removed cycle edges, that is, edges that are on a cycle when removed.

  (c) γ(G) is the minimum number of edges one has to remove from G such that the resulting graph is acyclic.

Proof

(a) We have, using (7) twice:

$$\gamma\bigl(G'\bigr) = (m-1) - n + \zeta\bigl(G'\bigr) = \gamma(G) - \bigl(1 - \bigl(\zeta \bigl(G'\bigr)-\zeta(G)\bigr) \bigr). $$

We observe:

  • If e is a cycle edge in G, then ζ(G′)=ζ(G), and hence γ(G′)=γ(G)−1.

  • If e is not a cycle edge, then ζ(G′)=ζ(G)+1, and hence γ(G′)=γ(G).

(b) By what we just observed, to reduce the cyclomatic number from γ(G) to 0 the number of rounds in which an edge is removed that is on a cycle must be γ(G).

(c) By (b), if we start with G and iterate removing cycle edges, we obtain an acyclic graph, and the number of steps is γ(G). If we remove fewer than γ(G) edges (in any order), by (b) the resulting graph cannot be acyclic. □
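Formula (7) and Lemma 13(b) can be checked on small examples. The following Python sketch is an illustration only and is not taken from the paper's implementation; the function names and the representation of a graph as a node count plus a list of edges are assumptions made here. It computes γ(G) once via (7) and once by counting the edges that close a cycle while the graph is built edge by edge. Removing the edges in reverse order removes exactly those edges as cycle edges, so by Lemma 13(b) the two numbers agree.

```python
def _scan(n, edges):
    """Union-find pass over a multigraph on nodes 0..n-1.

    Returns (number of connected components, number of edges whose two
    endpoints were already connected when the edge was added).
    """
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    zeta, closing = n, 0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            closing += 1      # this edge closes a cycle
        else:
            parent[ru] = rv   # this edge merges two components
            zeta -= 1
    return zeta, closing


def cyclomatic_number(n, edges):
    """gamma(G) = m - n + zeta(G), as in formula (7)."""
    zeta, _ = _scan(n, edges)
    return len(edges) - n + zeta


def count_cycle_closing_edges(n, edges):
    """gamma(G) counted as in Lemma 13(b), read in reverse insertion order."""
    _, closing = _scan(n, edges)
    return closing
```

For a triangle on nodes 0, 1, 2 with a pendant edge to node 3 and an isolated node 4, both functions return 1: there are m=4 edges, n=5 nodes, and ζ(G)=2 components.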

We have defined the excess ex(G) of a graph G as the minimum number of edges one has to remove from G so that the remaining subgraph has only acyclic and unicyclic components. In [15] the characterization of this quantity given next was used as a definition; the same idea was used in [13] (without giving it a name).

For G a graph, let \(\zeta_{\mathrm{cyc}}(G)\) denote the number of cyclic components of G.

Lemma 14

In all graphs G the equation \(\mathrm{ex}(G)=\gamma(G)-\zeta_{\mathrm{cyc}}(G)\) is satisfied.

Proof

Assume G has n nodes and m edges.

“≤”: Starting with G, we iteratively remove cycle edges until each cyclic component has only one cycle left. The number of edges removed is at least ex(G). Call the resulting graph G′. Removing one cycle edge from each of the \(\zeta_{\mathrm{cyc}}(G)\) cyclic components of G′ will yield an acyclic graph. Lemma 13(b) tells us that together exactly γ(G) edges have been removed; hence \(\gamma(G)\geq\mathrm{ex}(G)+\zeta_{\mathrm{cyc}}(G)\).

“≥”: Choose a set \(E^{+}\) of ex(G) edges in G such that removing these edges leaves a graph G′ with only acyclic and unicyclic components. Now imagine that the edges in \(E^{+}\) are removed one by one in an arbitrary order. Let β denote the number of edges in \(E^{+}\) that are on a cycle when removed; the other ex(G)−β many were non-cycle edges when removed. Removing one cycle edge from each cyclic component of G′ will leave an acyclic graph. Counting the number of cycle edges we removed altogether, and applying Lemma 13(b) again, we see that \(\gamma(G)=\beta+\zeta_{\mathrm{cyc}}(G')\). Since removing a non-cycle edge from a graph can increase the number of cyclic components by at most 1, we have that \(\zeta_{\mathrm{cyc}}(G')\leq\zeta_{\mathrm{cyc}}(G)+(\mathrm{ex}(G)-\beta)\). Combining the inequalities yields \(\gamma(G)\leq\zeta_{\mathrm{cyc}}(G)+\mathrm{ex}(G)\). □
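Lemma 14 gives a direct way to compute the excess: apply (7) to each connected component separately, sum the results to get γ(G), and subtract the number of components that contain a cycle. The Python sketch below illustrates this under assumptions made here (nodes numbered 0..n−1, edges given as a list of pairs, the function name `excess`); it is not the code used for the experiments in the paper.

```python
from collections import Counter


def excess(n, edges):
    """ex(G) = gamma(G) - zeta_cyc(G) for a multigraph on nodes 0..n-1."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Merge the endpoints of every edge to obtain the components.
    for u, v in edges:
        parent[find(u)] = find(v)

    # Count nodes and edges per component, keyed by the component's root.
    nodes_in = Counter(find(v) for v in range(n))
    edges_in = Counter(find(u) for u, _ in edges)

    gamma = 0       # cyclomatic number of the whole graph
    zeta_cyc = 0    # number of cyclic components
    for root, n_c in nodes_in.items():
        gamma_c = edges_in[root] - n_c + 1   # formula (7) with zeta = 1
        gamma += gamma_c
        if gamma_c > 0:
            zeta_cyc += 1
    return gamma - zeta_cyc
```

As an example, take three parallel edges between nodes 0 and 1, a triangle on nodes 2, 3, 4, and an isolated node 5. The components have cyclomatic numbers 2, 1 and 0, so γ(G)=3, two components are cyclic, and the excess is 1: removing one of the parallel edges leaves only unicyclic and acyclic components.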

A.2 The Excess of the Cuckoo Graph and the Stash Size

The purpose of this section is to prove Lemma 5, which we recall here. We assume that \(h_1\) and \(h_2\) are given, and write G(S) for \(G(S,h_1,h_2)\), for S ⊆ U.

Lemma 5 ([13])

The keys from S can be stored in the two tables and a stash of size s using \(h_1, h_2\) if and only if ex(G(S))≤s.

Proof

“⇒”: Assume T is a subset of S of size at most s such that all keys from S′=S∖T can be stored in the two tables. Then all components of G(S′) must be acyclic or unicyclic. (Assume C is a component with γ(C)>1. Then by (7) the number of edges (keys) in C would be strictly larger than the number of nodes (table positions), which is impossible.) Since G(S′) is obtained from G(S) by removing the edges \((h_1(x),h_2(x))\), x∈T, we get ex(G(S))≤s.

“⇐”: Assume ex(G(S))≤s. Choose a subset T of S of size ex(G(S)) such that G(S∖T) has only acyclic and unicyclic components. From what is known about the behavior of standard cuckoo hashing, we can store S′=S∖T in the two tables using \(h_1\) and \(h_2\) (e.g., see [3, Sect. 4]). (This can also be proved directly. If one of the nodes touched by an edge \((h_1(x),h_2(x))\), x∈S′, has degree 1, we place x in the corresponding cell. Iterating this, we can place all keys except those that belong to cycle edges. Since G(S′) has only acyclic and unicyclic components, the cycle edges form isolated simple cycles, and clearly the keys that belong to such a cycle can be placed in the corresponding cells.) By assumption, the keys from T fit into the stash. □
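The direct argument in the “⇐” direction is constructive, and it can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the function name `place`, the input format (a dict mapping each key to its pair of hash values), and in particular the greedy rule used to pick the stash keys, which keeps an edge unless it would give some component a second cycle. The proof only needs some set T of size ex(G(S)) to exist and does not prescribe how to find it. After the stash is chosen, the code follows the proof: peel nodes of degree 1, then walk around the remaining simple cycles.

```python
from collections import defaultdict, deque


def place(keys_pos, stash_size):
    """keys_pos maps key -> (h1, h2). Returns (table1, table2, stash) or None."""
    parent, cyclic = {}, {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Step 1 (assumed greedy rule): stash a key if its edge would create
    # a component with more than one cycle.
    kept, stash = [], []
    for x, (i, j) in keys_pos.items():
        a, b = find((0, i)), find((1, j))
        if a != b and not (cyclic.get(a, False) and cyclic.get(b, False)):
            parent[a] = b
            cyclic[b] = cyclic.get(a, False) or cyclic.get(b, False)
            kept.append(x)
        elif a == b and not cyclic.get(a, False):
            cyclic[a] = True
            kept.append(x)
        else:
            stash.append(x)
    if len(stash) > stash_size:
        return None

    incident = defaultdict(set)  # node -> keys not yet placed
    for x in kept:
        i, j = keys_pos[x]
        incident[(0, i)].add(x)
        incident[(1, j)].add(x)
    cell = {}                    # node -> key stored there

    def assign(x, node):
        """Store x at node and return the other endpoint of its edge."""
        i, j = keys_pos[x]
        cell[node] = x
        incident[(0, i)].discard(x)
        incident[(1, j)].discard(x)
        return (1, j) if node == (0, i) else (0, i)

    # Step 2: repeatedly place a key at a node of degree 1.
    queue = deque(v for v, ks in incident.items() if len(ks) == 1)
    while queue:
        v = queue.popleft()
        if len(incident[v]) != 1:
            continue
        other = assign(next(iter(incident[v])), v)
        if len(incident[other]) == 1:
            queue.append(other)

    # Step 3: what remains are disjoint simple cycles; walk around each.
    for start in list(incident):
        v = start
        while incident[v]:
            v = assign(next(iter(incident[v])), v)

    table1 = {i: x for (side, i), x in cell.items() if side == 0}
    table2 = {j: x for (side, j), x in cell.items() if side == 1}
    return table1, table2, stash
```

A node is written as `(0, i)` for cell i of the first table and `(1, j)` for cell j of the second. With three keys that all hash to the pair (0, 0), the graph has excess 1, so `place` returns `None` for `stash_size=0` and succeeds for `stash_size=1`.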

Cite this article

Aumüller, M., Dietzfelbinger, M. & Woelfel, P. Explicit and Efficient Hash Families Suffice for Cuckoo Hashing with a Stash. Algorithmica 70, 428–456 (2014). https://doi.org/10.1007/s00453-013-9840-x

Keywords

  • Data structures
  • Hash table
  • Randomized algorithms
  • Hash functions
  • Cuckoo hashing
  • Random graphs