Heuristic Hash Functions

Mailund, Thomas

doi:10.1007/978-1-4842-4066-3_6

Abstract

The main topic of this book is implementing hash tables; hash functions are only a secondary concern. This is why you have so far assumed a priori that hash keys are uniformly distributed. In reality, this is unlikely to be the case; real data are rarely random samples from the space of possible data values. In this chapter, you will learn about commonly used heuristic hash functions. In the next chapter, you will see an approach to achieving stronger probabilistic guarantees.

Notes

1. For all but two of the tables, that of size 64 and that of size 67, this means that the load is higher than 1, so this obviously will only work for chained hashing. The purpose of the examples in this chapter, however, is merely to show how keys are distributed over bins with tables of different sizes, so don’t worry about conflict resolution and load.
2. When I say deterministic here, I mean that a hash function should always produce the same output on the same input. There are plenty of randomized hash functions, in the sense that they use random numbers as part of their construction. You fix these random numbers when you use the function to hash application keys. You can change from one hash function to another by picking new random numbers, but you can’t change them at arbitrary times if you want your function to consistently give you the same output for the same input. Universal hashing, which will be discussed in the next chapter, uses random numbers to create deterministic hash functions.
3. The simplest hash function I have seen was used to hash ASCII strings and only used the first character. For standard ASCII, there are only 128 characters (they use 7 bits per character), while for Extended ASCII there are 256. That is not the bad part, however. If you hash common words, such as variable names in a program, then they do not use the full set of ASCII characters. Using only the first character of a string is a very poor hash function.
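The point of note 2 can be illustrated with a small sketch. It is not the construction used in the book and it is not universal hashing; the function name make_hash, the seed parameter, and the multiply-and-add scheme are all assumptions chosen for illustration. The random numbers are drawn once, when the hash function is created, and are never changed afterward, so the resulting function is deterministic.

```python
import random


def make_hash(seed: int, bits: int = 32):
    """Build a string hash whose random parameters are fixed at creation.

    Hypothetical illustration only: the multiply-and-add scheme is an
    assumption, not the book's hash function.
    """
    rng = random.Random(seed)
    mask = (1 << bits) - 1
    a = rng.getrandbits(bits) | 1  # random odd multiplier, drawn once
    b = rng.getrandbits(bits)      # random starting value, drawn once

    def h(key: str) -> int:
        value = b
        for byte in key.encode("utf-8"):
            # a and b never change here, so equal keys give equal hashes
            value = (value * a + byte) & mask
        return value

    return h


h1 = make_hash(seed=1)
h2 = make_hash(seed=2)  # new random numbers give a different function

print(h1("node") == h1("node"))  # True: same function, same input
print(h1("node"), h2("node"))    # two functions, each consistent on its own
```

Switching from h1 to h2 corresponds to picking new random numbers. Doing so while a table is in use would require rehashing every stored key, since keys hashed with h1 would no longer be found under h2.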
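To make note 3 concrete, here is a minimal sketch of a first-character hash applied to tables of size 64 and 67, the two sizes mentioned in note 1. The key list is a made-up sample of identifier-like words, and the helper names first_char_hash and bin_counts are assumptions introduced for this example.

```python
from collections import Counter


def first_char_hash(key: str) -> int:
    """Hash a non-empty ASCII string using only its first character."""
    return ord(key[0])


def bin_counts(keys, table_size, hash_fn):
    """Count how many keys land in each bin of a table with table_size bins."""
    return Counter(hash_fn(key) % table_size for key in keys)


# Hypothetical sample of identifier-like keys (not taken from the book).
keys = ["count", "counter", "index", "item", "items", "length",
        "name", "node", "next", "size", "sum", "total"]

for table_size in (64, 67):
    counts = bin_counts(keys, table_size, first_char_hash)
    print(table_size, "bins used:", len(counts),
          "largest bin:", max(counts.values()))
```

With this sample, the twelve keys share only six distinct first letters, so at most six bins are ever used regardless of the table size. A hash function that reads the whole string has the chance to separate keys such as "item" and "index"; this one cannot.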

Author information

Authors and Affiliations

Aarhus N, Denmark
Thomas Mailund

About this chapter

Cite this chapter

Mailund, T. (2019). Heuristic Hash Functions. In: The Joys of Hashing. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4066-3_6

DOI: https://doi.org/10.1007/978-1-4842-4066-3_6
Published: 10 February 2019
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4065-6
Online ISBN: 978-1-4842-4066-3
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)
