Hashing using the division method
For the hash function h(k) = k mod m, I understand that m = 2^n will always give the n least significant bits of k. I also understand that m = 2^p - 1, when k is a string converted to an integer using radix 2^p, will give the same hash value for every permutation of the characters in k. But why is "a prime not too close to an exact power of 2" a good choice? What if I choose 2^p - 2 or 2^p - 3? Why are these choices considered bad?
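To make the first claim concrete, here is a minimal Python sketch. The helper `h` and the sample keys are hypothetical names and values chosen only for illustration; the point is that with m = 2^n, keys that differ only in their high bits all fall into the same slot.

```python
def h(k: int, m: int) -> int:
    """Division-method hash: h(k) = k mod m."""
    return k % m


n = 4
m = 2 ** n  # table size that is an exact power of 2

# Three keys that share the same low 4 bits (0110) but differ above them.
keys = [0b1010_0110, 0b0001_0110, 0b1111_0110]

slots = [h(k, m) for k in keys]
low_bits = [k & (m - 1) for k in keys]  # mask that keeps only the low n bits

print(slots)     # every key lands in the same slot
print(low_bits)  # identical to slots: k mod 2^n is just the low n bits
```

The high bits of the key never influence the result, so any pattern in the low bits carries straight through to the table.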
Below is the text from CLRS:
"A prime not too close to an exact power of 2 is often a good choice for m. For example, suppose we wish to allocate a hash table, with collisions resolved by chaining, to hold roughly n = 2000 character strings, where a character has 8 bits. We don't mind examining an average of 3 elements in an unsuccessful search, and so we allocate a hash table of size m = 701. We could choose m = 701 because it is a prime near 2000/3 but not near any power of 2."
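The numbers in the CLRS example can be checked with a short Python sketch. The helpers `is_prime` and `power_of_two_gap` are hypothetical names introduced here, not anything from the book; they only confirm that 701 is prime, sits near 2000/3, and is far from both neighbouring powers of 2.

```python
def is_prime(x: int) -> bool:
    """Trial division; fine for table-size-scale numbers."""
    if x < 2:
        return False
    i = 2
    while i * i <= x:
        if x % i == 0:
            return False
        i += 1
    return True


def power_of_two_gap(m: int) -> int:
    """Distance from m to the nearest power of 2 (m >= 1)."""
    lower = 1 << (m.bit_length() - 1)  # largest power of 2 <= m
    upper = lower << 1                 # smallest power of 2 > m
    return min(m - lower, upper - m)


n_keys = 2000
m = 701

print(is_prime(m))          # is the table size prime?
print(n_keys / m)           # average chain length (load factor)
print(power_of_two_gap(m))  # distance to the nearer of 512 and 1024
```

With 2000 keys the load factor comes out a little under 3, which matches the "average of 3 elements in an unsuccessful search" in the quote.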
Let's assume we are working with radix $2^p$.
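$2^p - 1$ case:
Why is it a bad idea to use $2^p - 1$? Let's see. Write the key in radix $2^p$ with digits $a_i$:
$$k = \sum_i a_i \, 2^{ip}$$
Since $2^p \equiv 1 \pmod{2^p - 1}$, every power $2^{ip}$ reduces to 1, so modulo $2^p - 1$ we just get
$$k = \sum_i a_i \, 2^{ip} \equiv \sum_i a_i \pmod{2^p - 1}$$
Therefore, since addition is commutative, we can permute the digits and get the same hash.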
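A small Python sketch of this collision, assuming 8-bit characters (p = 8, so m = 255) and a hypothetical helper `string_to_key` that reads the string as a radix-2^p number with the first character most significant:

```python
from itertools import permutations


def string_to_key(s: str, p: int = 8) -> int:
    """Interpret s as a radix-2^p integer (assumes every ord(ch) < 2^p)."""
    k = 0
    for ch in s:
        k = (k << p) + ord(ch)
    return k


def h(k: int, m: int) -> int:
    """Division-method hash: h(k) = k mod m."""
    return k % m


p = 8
m = 2 ** p - 1  # 255

# Hash every permutation of the same three characters.
hashes = {
    "".join(perm): h(string_to_key("".join(perm), p), m)
    for perm in permutations("abc")
}

# The derivation predicts the hash is just the digit sum mod m.
digit_sum_hash = sum(ord(ch) for ch in "abc") % m

print(hashes)
print(digit_sum_hash)
```

All six orderings of "abc" produce the same value, and it equals the sum of the character codes reduced mod 255, as the derivation predicts.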
$2^p - b$ case:
Quoting from CLRS:
A prime not too close to an exact power of 2 is often a good choice for m.
Here $2^p \equiv b \pmod{2^p - b}$, so
$$k = \sum_i a_i \, 2^{ip} \equiv \sum_i a_i \, b^i \pmod{2^p - b}$$
So changing the least significant digit by one changes the hash by one, and changing the second least significant digit by one changes the hash by only b (by two when b = 2). To change the hash substantially, we would need to change the more significant digits. So for small b we face a problem similar to the case where m is a power of 2: the hash depends mostly on the distribution of the least significant digits.
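The same congruence can be checked numerically. This is only a sketch with assumed values (p = 8, b = 3, so m = 253) and hypothetical helper names; the digit list holds a_0 first, that is, least significant digit first.

```python
def key_from_digits(digits: list[int], p: int) -> int:
    """Build k = sum(a_i * 2^(i*p)); digits[0] is the least significant."""
    return sum(a << (i * p) for i, a in enumerate(digits))


def hash_direct(digits: list[int], p: int, b: int) -> int:
    """h(k) = k mod (2^p - b), computed on the full key."""
    return key_from_digits(digits, p) % (2 ** p - b)


def hash_via_powers_of_b(digits: list[int], p: int, b: int) -> int:
    """Same hash using the reduction 2^p = b (mod 2^p - b)."""
    return sum(a * b ** i for i, a in enumerate(digits)) % (2 ** p - b)


p, b = 8, 3
base = [10, 20, 30]

h0 = hash_direct(base, p, b)
h_low = hash_direct([11, 20, 30], p, b)   # bump a_0 by one
h_next = hash_direct([10, 21, 30], p, b)  # bump a_1 by one

print(h0, hash_via_powers_of_b(base, p, b))  # the two computations agree
print(h_low - h0)   # shift caused by a_0: one slot
print(h_next - h0)  # shift caused by a_1: b slots
```

With a small b, nearby keys move the hash by only a slot or a few slots, so clustered inputs stay clustered in the table.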