How many hash functions are required in Bundle Min Hashing for logo recognition?

Regarding the paper "Bundle Min-Hashing for Logo Recognition":

Suppose we have the bundles {2,5,18,444,678} and {2,5,79,368,841}, and the dictionary size is 1M. If we have only 1 sketch per bundle, we only need 1 hash function that deterministically maps the 1M integers to values uniformly distributed in [0,1]. The hash function must use the same fixed seed on every call. For 4 sketches, we only need the same hash function with 4 different fixed seeds. Is this reasoning correct?
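To make the idea concrete, here is a minimal sketch of what I have in mind. It is only an illustration under my own assumptions: the names `seeded_hash`, `min_hash` and `sketches` are hypothetical, the seeds 0 to 3 are arbitrary, and keyed BLAKE2b from Python's standard `hashlib` stands in for whatever hash function the paper actually uses.

```python
import hashlib

def seeded_hash(word_id: int, seed: int) -> float:
    """Deterministically map a visual-word ID to a value in [0, 1].

    Assumption: keyed BLAKE2b is used as a stand-in seeded hash;
    the seed is passed as the key, so each seed gives a different mapping.
    """
    digest = hashlib.blake2b(
        word_id.to_bytes(8, "little"),
        digest_size=8,
        key=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") / 2**64

def min_hash(bundle, seed: int) -> int:
    """Return the word of the bundle with the smallest hash value for this seed."""
    return min(bundle, key=lambda w: seeded_hash(w, seed))

def sketches(bundle, seeds=(0, 1, 2, 3)):
    """One min-hash word per seed, i.e. 4 sketches for 4 fixed seeds."""
    return [min_hash(bundle, s) for s in seeds]

if __name__ == "__main__":
    a = {2, 5, 18, 444, 678}
    b = {2, 5, 79, 368, 841}
    print(sketches(a))
    print(sketches(b))
```

The same seeds are reused for every bundle, so two identical bundles always produce identical sketches, regardless of the order in which their words are stored.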

Or can we simply pick a random element of the set (bundle) as the min-hash word, since the min-hash corresponds to a random permutation of the set?

Is there any link to an implementation of the hash functions required in the paper?

Can MurmurHash3 do the job?
