Hash functions and table sizes of the form 2^p

When calculating the hash table index from the key's hash code, why do we avoid taking the remainder (modulo) when the size of the bucket array is a power of 2?

2 answers


When computing the hash, you want to keep as much information as you can, spread well across the full range of bits: for example, unsigned 32-bit integers are usually fine unless you have a very large number (more than about 3 billion) of items to store in the hash table.

What you are really interested in is converting the hash code into a bucket index. When the number of buckets n is a power of two, all you have to do is a bitwise AND between the hash code h and (n - 1), and the result equals h mod n.
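A minimal Python sketch of the masking trick, assuming non-negative integer hash codes (the helper name `bucket_index` is hypothetical). It checks that `h & (n - 1)` agrees with `h % n` when n is a power of two:

```python
def bucket_index(h: int, n: int) -> int:
    """Map a non-negative hash code h to one of n buckets.

    Assumes n is a power of two, so n - 1 is a mask of the low log2(n) bits.
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    return h & (n - 1)


n = 16
# For every non-negative h, masking gives the same result as the modulo.
for h in (0, 5, 16, 37, 1023, 0xDEADBEEF):
    assert bucket_index(h, n) == h % n

print(bucket_index(0xDEADBEEF, n))  # → 15 (the low 4 bits, 0xF)
```

In languages with signed integers and a truncating `%` (C, Java), the two can differ for negative hash codes, which is one more reason to treat the hash code as unsigned.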



This can be bad because the AND simply throws away the high-order bits of the hash code. Whether that matters depends on other things. On the one hand, it is very fast, since AND is much cheaper than division (and that is the usual reason for choosing a power-of-2 number of buckets); on the other hand, weak hash functions can have poor entropy in the low-order bits: that is, the low-order bits change very little when the data being hashed changes.
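The following Python sketch illustrates the low-bit entropy problem with a deliberately weak, hypothetical hash (`weak_hash`) whose low 16 bits are always zero. Masking alone sends every key to the same bucket; folding the high bits into the low bits before masking (an assumed remedy, similar in spirit to the spreading step in Java 8's `HashMap.hash()`) keeps the cheap AND while recovering the spread:

```python
def weak_hash(k: int) -> int:
    # Hypothetical weak hash: all the information ends up in the high bits,
    # and the low 16 bits of every hash code are zero.
    return (k << 16) & 0xFFFFFFFF


def mask_index(h: int, n: int) -> int:
    # Plain power-of-two bucketing: keep only the low log2(n) bits.
    return h & (n - 1)


def mixed_index(h: int, n: int) -> int:
    # Fold the high 16 bits into the low 16 bits first, so the bucket
    # depends on more than just the (possibly constant) low-order bits.
    return (h ^ (h >> 16)) & (n - 1)


n = 64
keys = range(1000)
plain = {mask_index(weak_hash(k), n) for k in keys}
mixed = {mixed_index(weak_hash(k), n) for k in keys}
print(len(plain), len(mixed))  # → 1 64
```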

Suppose the table size is m = 2^p and k is a key. Then k mod m gives only the last p bits of the binary representation of k. So if you insert many keys that share the same last p bits, the hash function h(k) = k mod m performs very badly, since all of those keys are hashed to the same slot in the table. So avoid powers of 2 for m.
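A small Python sketch of this argument, using the division method h(k) = k mod m with p = 4. The example keys (multiples of 16) and the prime size 17 used for comparison are illustrative choices, not taken from the answer:

```python
p = 4
m = 2 ** p  # power-of-two table size, m = 16


def slot(k: int, m: int) -> int:
    # Division-method hash: h(k) = k mod m.
    return k % m


k = 0b1011_0110  # 182
# k mod 2^p keeps exactly the last p bits of k.
assert slot(k, m) == k & (m - 1) == 0b0110

# Keys that share their last p bits (here all multiples of 16) collide.
keys = [16 * i for i in range(20)]
print(sorted({slot(key, m) for key in keys}))  # → [0]

# With a prime table size the same keys spread over every slot.
print(len({slot(key, 17) for key in keys}))  # → 17
```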
