How to choose the modulus value in the Rabin-Karp algorithm?

I have a question about choosing q and d in the Rabin-Karp algorithm for string search. The algorithm uses the values ​​q as a modulus and d as a hash function.

If I choose q as the amount of power 2 and d = q-1 or d = q + 1

  • How do these options affect false hits?

  • What templates will be considered as false hits in the case d = q + 1 and in the case d = q-1?

+3


source to share


1 answer


X mod (2 ^ n +1) can be described as a variable sum / difference, for example.

X = 0x11223344, n = 8, X mod 257 == 0x44 - 0x33 + 0x22 - 0x11 mod 257. The problem here is that any duplicate email will cancel itself - not very practical - of course n shouldn't be 8.

X mod (2 ^ n -1) will be the sum of all n bit patterns, for example.



X = 0x11223344, n = 8, X mod 255 == (0x44 + 0x33 + 0x22 + 0x11) mod 255

The problem is that stripping off bytes: Hash ('ab') = Hash ('ba').

+1


source







All Articles