Abstract
A minimal perfect hash function (MPHF) is a (data structure providing a) bijective map from a set S of n keys to the set of the first n natural numbers. In the static case (i.e., when the set S is known in advance), there is a wide spectrum of solutions available, offering different trade-offs in terms of construction time, access time and size of the data structure. MPHFs have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable \(\varOmega (n \log n)\) lower bound on the number of bits required to store the function. Recently, it was observed that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, list of URLs of web graphs, etc. MPHFs that preserve the intrinsic order of the keys are called monotone (MMPHF). The problem of building MMPHFs is more recent and less studied (for example, no lower bounds are known) but once more there is a wide spectrum of solutions available, by now. In this paper, we survey some of the most practical techniques and tools for the construction of MPHFs and MMPHFs.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
I write \(\left( \begin{array}{l}X \\ t \end{array}\right) \) for the set of subsets of X of cardinality t.
- 2.
The sub-hypergraph induced by \(X \subseteq V\) is \((X,E_X)\) where \(E_X = E \cap {\left( \begin{array}{l}X \\ t \end{array}\right) }\).
- 3.
The degree of a vertex is the number of hyperedges including it.
- 4.
Keeping only the (at most) n non-zero \(a^*_i\) and storing their indices in an array on which a rank/select structure is provided.
- 5.
In [20] a variant is also discussed that directly produces a MPHF, but its construction time is no longer linear in expectation.
- 6.
In order to highlight better the differences between the various approaches, in this section I consider the long-keys scenario.
- 7.
The last bucket may, of course, be smaller than b.
- 8.
If \(N_{\gamma }\) is a sequence of 1s, \(N_{\gamma }^+\) will not be added to the set.
- 9.
If \({\text {exit}}(x)\) is the root, the algorithm will return \(i=-1\), so it is still true that \(x[:i+1]\) is the name of the exit node.
References
Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(O(1)\) worst case access time. J. Assoc. Comput. Mach. 31, 538–544 (1984)
Fredman, M.L., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM J. Algebr. Discret. Methods 5, 61–68 (1984)
Fox, E.A., Chen, Q.F., Daoud, A.M., Heath, L.S.: Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Sys. 9, 281–308 (1991)
Majewski, B.S., Wormald, N.C., Havas, G., Czech, Z.J.: A family of perfect hashing methods. Comput. J. 39, 547–554 (1996)
Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: Searching a sorted table with \(O(1)\) accesses. In: Proceedings of the 20th Annual ACM-SIAM Symposium On Discrete Mathematics (SODA), pp. 785–794, New York, ACM Press (2009)
Boldi, P., Vigna, S.: The WebGraph framework i: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601, Manhattan, USA, ACM Press (2004)
Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Srinivasan, S., Ramamritham, K., Kumar, A., Ravindra, M.P., Bertino, E., Kumar, R. (eds.) Proceedings of the 20th International Conference on World Wide Web, pp. 587–596. ACM (2011)
Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co. Inc., Boston (1999)
Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practise of monotone minimal perfect hashing. In: Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 132–144. SIAM (2009)
Knuth, D.E.: The Art of Computer Programming. Addison-Wesley, Boston (1973)
Mitzenmacher, M., Vadhan, S.: Why simple hash functions work: exploiting the entropy in a data stream. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pp. 746–755. Society for Industrial and Applied Mathematics, Philadelphia (2008)
Jacobson, G.: Space-efficient static trees and graphs. In: 30th Annual Symposium on Foundations of Computer Science (FOCS 1989), pp. 549–554. IEEE Computer Society Press, Research Triangle Park, North Carolina (1989)
Patrascu, M.: Succincter. In: 49th Annual IEEE Symposium on Foundations of Computer Science, pp. 305–313. IEEE Computer Society (2008)
Vigna, S.: Broadword implementation of rank/select queries. In: McGeoch, C.C. (ed.) WEA 2008. LNCS, vol. 5038, pp. 154–168. Springer, Heidelberg (2008)
Gog, S., Petri, M.: Optimized succinct data structures for massive data. Software: Practice and Experience (2014). To appear
Bloom, B.H.: Space-time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970)
Mehlhorn, K.: Data Structures and Algorithms 1: Sorting and Searching. EATCS monographs on theoretical computer science, vol. 1. Springer, Heidelberg (1984)
Hagerup, T., Tholey, T.: Efficient minimal perfect hashing in nearly minimal space. In: Ferreira, A., Reichel, H. (eds.) STACS 2001. LNCS, vol. 2010, pp. 317–326. Springer, Heidelberg (2001)
Chazelle, B., Kilian, J., Rubinfeld, R., Tal, A.: The Bloomier filter: an efficient data structure for static support lookup tables. In: Munro, J.I. (ed.) Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pp. 30–39. SIAM (2004)
Belazzougui, D., Botelho, F.C., Dietzfelbinger, M.: Hash, displace, and compress. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 682–693. Springer, Heidelberg (2009)
Molloy, M.: Cores in random hypergraphs and Boolean formulas. Random Struct. Algorithms 27, 124–135 (2005)
Dietzfelbinger, M.: Design strategies for minimal perfect hash functions. In: Hromkovič, J., Královič, R., Nunkesser, M., Widmayer, P. (eds.) SAGA 2007. LNCS, vol. 4665, pp. 2–17. Springer, Heidelberg (2007)
Fredriksson, K., Nikitin, F.: Simple compression code supporting random access and fast string matching. In: Demetrescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 203–216. Springer, Heidelberg (2007)
Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practice of monotone minimal perfect hashing. ACM J. Exp. Algorithmic 16, 3.2:1–3.2:26 (2011)
Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part I. LNCS, vol. 6346, pp. 427–438. Springer, Heidelberg (2010)
Acknowledgements
I want to thank Sebastiano Vigna for his comments and insightful suggestions. This paper is partially funded by the Google Focused Award “Web Algorithmics for Large-Scale Data Analysis”.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boldi, P. (2015). Minimal and Monotone Minimal Perfect Hash Functions. In: Italiano, G., Pighizzini, G., Sannella, D. (eds) Mathematical Foundations of Computer Science 2015. MFCS 2015. Lecture Notes in Computer Science(), vol 9234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48057-1_1
Download citation
DOI: https://doi.org/10.1007/978-3-662-48057-1_1
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48056-4
Online ISBN: 978-3-662-48057-1
eBook Packages: Computer ScienceComputer Science (R0)