Abstract
It is shown that for cuckoo hashing with a stash as proposed by Kirsch et al. (Proc. 16th European Symposium on Algorithms (ESA), pp. 611–622, Springer, Berlin, 2008) families of very simple hash functions can be used, maintaining the favorable performance guarantees: with constant stash size s the probability of a rehash is \(O(1/n^{s+1})\), the lookup time and the deletion time are O(s) in the worst case, and the amortized expected insertion time is O(s) as well. Instead of the full randomness needed for the analysis of Kirsch et al. and of Kutzelnigg (Discrete Math. Theor. Comput. Sci., 12(3):81–102, 2010) (resp. Θ(log n)-wise independence for standard cuckoo hashing) the new approach even works with 2-wise independent hash families as building blocks. Both construction and analysis build upon the work of Dietzfelbinger and Woelfel (Proc. 35th ACM Symp. on Theory of Computing (STOC), pp. 629–638, 2003). The analysis, which can also be applied to the fully random case, utilizes a graph counting argument and is much simpler than previous proofs. The results can be generalized to situations where the stash size is nonconstant.
Notes
 1.
κ-wise independent families of hash functions are defined in Sect. 2.
 2.
Personal communication with Mikkel Thorup, 2012.
 3.
The notation “\(\exists T \subseteq S \colon\mathcal{A}_{T} \cap\mathrm{bad}_{T}\)” stands for the formally correct \(\bigcup_{T \subseteq S} (\mathcal{A}_{T} \cap\mathrm{bad}_{T})\). Generally, in slight abuse of notation, we will often use the name of an event “\(\mathcal{A}_{T}\)” (or “\(\mathcal{A}_{T} \cap\mathrm{bad}_{T}\)”) also for the statement “\(\mathcal{A}_{T}\) occurs” (or “\(\mathcal{A}_{T} \cap \mathrm{bad}_{T}\) occurs”).
 4.
When the stash has nonconstant size, this yields nonconstant lookup time. One way to circumvent this is to organize the stash itself as a hash table, which introduces failure probabilities of other types. See [1] for a detailed discussion of this issue.
 5.
 6.
Source code available at: http://eiche.theoinf.tuilmenau.de/chstash/.
References
 1.
Arbitman, Y.: Efficient dictionary data structures based on cuckoo hashing. Master’s thesis, Weizmann Institute of Science (2010)
 2.
Carter, L., Wegman, M.N.: Universal classes of hash functions. J. Comput. Syst. Sci. 18(2), 143–154 (1979)
 3.
Devroye, L., Morin, P.: Cuckoo hashing: Further analysis. Inf. Process. Lett. 86(4), 215–219 (2003)
 4.
Diestel, R.: Graph Theory. Springer, Berlin (2005)
 5.
Dietzfelbinger, M., Hagerup, T., Katajainen, J., Penttonen, M.: A reliable randomized algorithm for the closest-pair problem. J. Algorithms 25(1), 19–51 (1997)
 6.
Dietzfelbinger, M., Rink, M.: Applications of a splitting trick. In: Proc. 36th International Colloquium on Automata, Languages and Programming (ICALP). LNCS, vol. 5555, pp. 354–365. Springer, Berlin (2009)
 7.
Dietzfelbinger, M., Schellbach, U.: On risks of using cuckoo hashing with simple universal hash classes. In: Proc. 20th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 795–804 (2009)
 8.
Dietzfelbinger, M., Weidling, C.: Balanced allocation and dictionaries with tightly packed constant size bins. Theor. Comput. Sci. 380(1–2), 47–68 (2007)
 9.
Dietzfelbinger, M., Woelfel, P.: Almost random graphs with simple hash functions. In: Proc. 35th ACM Symp. on Theory of Computing (STOC), New York, NY, USA, pp. 629–638 (2003)
 10.
Fotakis, D., Pagh, R., Sanders, P., Spirakis, P.G.: Space efficient hash tables with worst case constant access time. Theory Comput. Syst. 38(2), 229–248 (2005)
 11.
Goodrich, M.T., Mitzenmacher, M.: Privacy-preserving access of outsourced data via oblivious RAM simulation. In: Proc. 38th International Colloquium on Automata, Languages and Programming (ICALP), pp. 576–587 (2011)
 12.
Kirsch, A., Mitzenmacher, M., Wieder, U.: More robust hashing: cuckoo hashing with a stash. In: Proc. 16th European Symposium on Algorithms (ESA). LNCS, vol. 5193, pp. 611–622. Springer, Berlin (2008)
 13.
Kirsch, A., Mitzenmacher, M., Wieder, U.: More robust hashing: cuckoo hashing with a stash. SIAM J. Comput. 39(4), 1543–1561 (2009)
 14.
Klassen, T.Q., Woelfel, P.: Independence of tabulation-based hash classes. In: Proc. 10th Latin American Symposium on Theoretical Informatics (LATIN). LNCS, vol. 7256, pp. 506–517. Springer, Berlin (2012)
 15.
Kutzelnigg, R.: A further analysis of cuckoo hashing with a stash and random graphs of excess r. Discrete Math. Theor. Comput. Sci. 12(3), 81–102 (2010)
 16.
Mitzenmacher, M., Vadhan, S.P.: Why simple hash functions work: exploiting the entropy in a data stream. In: Proc. 19th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 746–755 (2008)
 17.
Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51(2), 122–144 (2004)
 18.
Pǎtraşcu, M., Thorup, M.: The power of simple tabulation hashing. J. ACM 59(3), 14 (2012)
 19.
Siegel, A.: On universal classes of extremely random constant-time hash functions. SIAM J. Comput. 33(3), 505–543 (2004)
 20.
Thorup, M., Zhang, Y.: Tabulation based 4-universal hashing with applications to second moment estimation. In: Proc. 15th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 615–624 (2004)
 21.
Thorup, M., Zhang, Y.: Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM J. Comput. 41(2), 293–331 (2012)
 22.
Wegman, M.N., Carter, L.: New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci. 22, 265–279 (1981)
 23.
Woelfel, P.: Asymmetric balanced allocation with simple hash functions. In: Proc. 17th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 424–433 (2006)
Acknowledgements
We thank Pascal Klaue for implementing the algorithms and carrying out the experiments presented in Sect. 7. We thank the anonymous reviewers, whose suggestions helped a lot in improving the presentation of this work. We especially thank one reviewer who pointed out the extensions to nonconstant stash size and κ-wise independent hash families.
Additional information
M. Dietzfelbinger was supported in part by DFG grant DI 412/102. P. Woelfel was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). A preliminary version of this paper appeared under the title “Explicit and Efficient Hash Functions Suffice for Cuckoo Hashing with a Stash” in Proceedings of the 20th Annual European Symposium on Algorithms, Ljubljana, Slovenia, September 2012, Lecture Notes in Computer Science 7501, Springer 2012.
Appendix: Excess, Stash Size, and Insertions
In this supplementary section, provided for the convenience of the reader, we clarify the connection between the stash size needed and the excess \(\mathrm{ex}(G(S,h_1,h_2))\) of the cuckoo graph \(G(S,h_1,h_2)\), as well as the role of insertion procedures. In particular, we prove Lemma 5. The central statements of this section can also be found in [13, 15].
A.1 The Excess of a Graph
For G a graph, ζ(G) denotes the number of connected components of G. The cyclomatic number γ(G), technically defined as “the dimension of the cycle space of G”, can be characterized by the following basic formula [4]:
\[\gamma(G) = m - n + \zeta(G), \tag{7}\]
for n the number of nodes and m the number of edges of G. Note that acyclic graphs are characterized by the equation n=m+ζ(G) and hence by the equation γ(G)=0. The following lemma gives two helpful ways of viewing γ(G).
Lemma 13
(a) Assume G′ is obtained from G by removing an edge e. If e is a cycle edge then γ(G′)=γ(G)−1, otherwise γ(G′)=γ(G).
(b) If we remove edges from G sequentially, in an arbitrary order, and the resulting graph is acyclic, then γ(G) is the number of removed cycle edges, that is, edges that are on a cycle when removed.
(c) γ(G) is the minimum number of edges one has to remove from G such that the resulting graph is acyclic.
Proof
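(a) We have, using (7) twice:
\[\gamma(G') - \gamma(G) = \bigl((m-1) - n + \zeta(G')\bigr) - \bigl(m - n + \zeta(G)\bigr) = \zeta(G') - \zeta(G) - 1.\]
We observe:
 If e is a cycle edge in G, then ζ(G′)=ζ(G), and hence γ(G′)=γ(G)−1.
 If e is not a cycle edge, then ζ(G′)=ζ(G)+1, and hence γ(G′)=γ(G).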
(b) By what we just observed, to reduce the cyclomatic number from γ(G) to 0 the number of rounds in which an edge is removed that is on a cycle must be γ(G). (c) By (b), if we start with G and iterate removing cycle edges, we obtain an acyclic graph, and the number of steps is γ(G). If we remove fewer than γ(G) edges (in any order), by (b) the resulting graph cannot be acyclic. □
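As an illustration that is not part of the original argument, the identity γ(G) = m − n + ζ(G) and Lemma 13(b) can be checked on a small multigraph. The Python sketch below assumes that nodes are numbered 0, …, n−1 and that edges are given as a list of pairs (parallel edges allowed); the function names and the example graph are hypothetical.

```python
def count_components(n, edges):
    """Number of connected components of a multigraph on nodes 0..n-1."""
    parent = list(range(n))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


def cyclomatic_number(n, edges):
    """gamma(G) = m - n + zeta(G)."""
    return len(edges) - n + count_components(n, edges)


def removed_cycle_edges(n, edges):
    """Remove all edges one by one, in list order, and count those that
    lie on a cycle at the moment of removal. An edge lies on a cycle
    exactly if deleting it leaves the number of components unchanged."""
    remaining = list(edges)
    count = 0
    while remaining:
        before = count_components(n, remaining)
        remaining.pop(0)
        after = count_components(n, remaining)
        if after == before:
            count += 1
    return count


# Hypothetical example: a triangle on {0, 1, 2} with a doubled edge {0, 2},
# a pendant edge {2, 3}, and an isolated node 4.
EXAMPLE_EDGES = [(0, 1), (1, 2), (2, 0), (2, 3), (0, 2)]
```

Here m = 5, n = 5 and ζ(G) = 2, so γ(G) = 2. Removing all edges always ends in an acyclic graph, so by Lemma 13(b) the count returned by `removed_cycle_edges` should equal γ(G) for every removal order.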
We have defined the excess ex(G) of a graph G as the minimum number of edges one has to remove from G so that the remaining subgraph has only acyclic and unicyclic components. In [15] the characterization of this quantity given next was used as a definition; the same idea was used in [13] (without giving it a name).
For G a graph, let \(\zeta_{\mathrm{cyc}}(G)\) denote the number of cyclic components of G.
Lemma 14
In all graphs G the equation \(\mathrm{ex}(G)=\gamma(G)-\zeta_{\mathrm{cyc}}(G)\) is satisfied.
Proof
Assume G has n nodes and m edges.
“≤”: Starting with G, we iteratively remove cycle edges until each cyclic component has only one cycle left. Since the resulting graph has only acyclic and unicyclic components, the number of edges removed is at least ex(G). Call the resulting graph G′. Removing one cycle edge from each of the \(\zeta_{\mathrm{cyc}}(G)\) cyclic components of G′ will yield an acyclic graph. Lemma 13(b) tells us that altogether exactly γ(G) edges have been removed; hence \(\gamma(G)\ge\mathrm{ex}(G)+\zeta_{\mathrm{cyc}}(G)\).
“≥”: Choose a set \(E^{+}\) of ex(G) edges in G such that removing these edges leaves a graph G′ with only acyclic and unicyclic components. Now imagine that the edges in \(E^{+}\) are removed one by one in an arbitrary order. Let β denote the number of edges in \(E^{+}\) that are on a cycle when removed; the other ex(G)−β edges were noncycle edges when removed. Removing one cycle edge from each cyclic component of G′ will leave an acyclic graph. Counting the number of cycle edges we removed altogether, and applying Lemma 13(b) again, we see that \(\gamma(G)=\beta+\zeta_{\mathrm{cyc}}(G')\). Since removing a noncycle edge from a graph can increase the number of cyclic components by at most 1, we have \(\zeta_{\mathrm{cyc}}(G')\le\zeta_{\mathrm{cyc}}(G)+(\mathrm{ex}(G)-\beta)\). Combining this equation and this inequality yields \(\gamma(G)\le\zeta_{\mathrm{cyc}}(G)+\mathrm{ex}(G)\). □
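To make Lemma 14 concrete, the following Python sketch (an illustration under stated assumptions, not code from the paper) computes the excess in two ways: via the formula ex(G) = γ(G) − ζ_cyc(G), evaluated component by component, and by brute force directly from the definition. Nodes are assumed to be numbered 0, …, n−1; all function names and the example graph are hypothetical, and the brute-force variant is only feasible for very small graphs.

```python
from collections import Counter
from itertools import combinations


def component_stats(n, edges):
    """Return one pair (number of nodes, number of edges) per component."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    node_count = Counter(find(v) for v in range(n))
    edge_count = Counter(find(u) for u, _ in edges)
    return [(node_count[r], edge_count[r]) for r in node_count]


def excess_by_lemma(n, edges):
    """ex(G) = gamma(G) - zeta_cyc(G), using gamma(C) = m_C - n_C + 1
    for each connected component C."""
    stats = component_stats(n, edges)
    gamma = sum(m_c - n_c + 1 for n_c, m_c in stats)
    zeta_cyc = sum(1 for n_c, m_c in stats if m_c - n_c + 1 > 0)
    return gamma - zeta_cyc


def excess_by_definition(n, edges):
    """Smallest number of edges whose removal leaves only acyclic and
    unicyclic components (every component has at most as many edges
    as nodes). Exponential time; for tiny examples only."""
    for k in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if all(m_c <= n_c for n_c, m_c in component_stats(n, kept)):
                return k


# Hypothetical example: five edges on the nodes {0, 1, 2}, an isolated
# node 3, and a single edge {4, 5}.
EXAMPLE_EDGES = [(0, 1), (1, 2), (2, 0), (0, 1), (0, 2), (4, 5)]
```

In this example the component on {0, 1, 2} has cyclomatic number 3 and the other two components are acyclic, so γ(G) = 3, ζ_cyc(G) = 1, and both functions should report an excess of 2.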
A.2 The Excess of the Cuckoo Graph and the Stash Size
The purpose of this section is to prove Lemma 5, which we recall here. We assume that \(h_1\) and \(h_2\) are given, and write G(S) for \(G(S,h_1,h_2)\), for S⊆U.
Lemma 5 ([13])
The keys from S can be stored in the two tables and a stash of size s using \(h_1,h_2\) if and only if ex(G(S))≤s.
Proof
“⇒”: Assume T is a subset of S of size at most s such that all keys from S′=S−T can be stored in the two tables. Then all components of G(S′) must be acyclic or unicyclic. (Assume C is a component with γ(C)>1. Then by (7) the number of edges (keys) in C would be strictly larger than the number of nodes (table positions), which is impossible.) Since G(S′) is obtained from G(S) by removing the edges \((h_1(x),h_2(x))\), x∈T, we get ex(G(S))≤s.
“⇐”: Assume ex(G(S))≤s. Choose a subset T of S of size ex(G(S)) such that G(S−T) has only acyclic and unicyclic components. From what is known about the behavior of standard cuckoo hashing, we can store S′=S−T in the two tables using \(h_1\) and \(h_2\) (e.g., see [3, Sect. 4]). (This can also be proved directly. If one of the nodes touched by an edge \((h_1(x),h_2(x))\), x∈S′, has degree 1, we place x in the corresponding cell. Iterating this, we can place all keys except those that belong to cycle edges. Since G(S′) has only acyclic and unicyclic components, the cycle edges form isolated simple cycles, and clearly the keys that belong to such a cycle can be placed in the corresponding cells.) By assumption, the keys from T fit into the stash. □
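The direct placement argument in the “⇐” direction can be sketched in Python as follows. This is a minimal offline illustration, not the insertion procedure analyzed in the paper. It assumes each key comes with a precomputed pair of cell indices (standing in for the two hash values), and all names are hypothetical. The stash is chosen greedily: a key is kept as long as every component of the kept edges has at most one cycle. The sketch relies on the assumption that such edge sets form a matroid (the bicircular matroid), so that the greedy choice stashes exactly ex(G(S)) keys.

```python
from collections import defaultdict, deque


def place_with_stash(positions, m):
    """positions: dict key -> (cell in table 1, cell in table 2), cells in range(m).
    Returns (table1, table2, stash)."""
    # Node i is cell i of table 1; node m + j is cell j of table 2.
    edge = {x: (i, m + j) for x, (i, j) in positions.items()}

    # Step 1: greedily keep keys while all components stay acyclic or unicyclic.
    parent = list(range(2 * m))
    cyclic = [False] * (2 * m)

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    kept, stash = [], []
    for x, (u, v) in edge.items():
        ru, rv = find(u), find(v)
        if ru != rv and not (cyclic[ru] and cyclic[rv]):
            parent[ru] = rv
            cyclic[rv] = cyclic[ru] or cyclic[rv]
            kept.append(x)
        elif ru == rv and not cyclic[ru]:
            cyclic[ru] = True  # x closes the single cycle of its component
            kept.append(x)
        else:
            stash.append(x)

    # Step 2: repeatedly place a key in an incident cell of degree 1.
    incident = defaultdict(set)
    for x in kept:
        for node in edge[x]:
            incident[node].add(x)
    cell_of = {}

    def assign(x, node):
        cell_of[x] = node
        for w in edge[x]:
            incident[w].discard(x)

    queue = deque(node for node in list(incident) if len(incident[node]) == 1)
    while queue:
        node = queue.popleft()
        if len(incident[node]) != 1:
            continue
        (x,) = incident[node]
        u, v = edge[x]
        other = v if node == u else u
        assign(x, node)
        if len(incident[other]) == 1:
            queue.append(other)

    # Step 3: the unplaced keys form isolated simple cycles; walk around each.
    for x in kept:
        node = edge[x][0]
        while x not in cell_of:
            u, v = edge[x]
            other = v if node == u else u
            assign(x, node)
            if not incident[other]:
                break
            x, node = next(iter(incident[other])), other

    table1, table2 = [None] * m, [None] * m
    for x, node in cell_of.items():
        if node < m:
            table1[node] = x
        else:
            table2[node - m] = x
    return table1, table2, stash


# Hypothetical example with m = 3 cells per table.
EXAMPLE = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (1, 0),
           "e": (0, 0), "f": (2, 2), "g": (1, 1)}
```

In the example, six keys share a component with four cells, so the excess is 2 and two keys end up in the stash; every other key is stored in one of its two admissible cells.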
Cite this article
Aumüller, M., Dietzfelbinger, M. & Woelfel, P. Explicit and Efficient Hash Families Suffice for Cuckoo Hashing with a Stash. Algorithmica 70, 428–456 (2014). https://doi.org/10.1007/s004530139840x
Keywords
 Data structures
 Hash table
 Randomized algorithms
 Hash functions
 Cuckoo hashing
 Random graphs