Using hash as bucket index, modulo vs bitmask
I am looking at hash tables where some data is hashed and used for bucket index.
Some libraries use a modulo hash with the size of a bucket, while others use a bit mask. Where only the bits used by the bucket mask are used (ensuring that the range is not exceeded).
bitwise:
index = h->hash_func(key) & h->hash_mask;
modulo:
index = h->hash_func(key) % h->bucket_tot;
While there are obvious differences between the two, such as the bitmask bucket size limits, providing hashing gives good distribution on least significant bits, modulo speed ... etc.
Are there any good reasons to choose one by one?
(I will probably try and evaluate my own use case, but curious as to what is already known on this matter).
Note, this is just a key: storing values (dictionary / hash / associative array) and not security.
An example of dynamically resizing, concatenating a hash table using a bitmask:
- https://github.com/amadvance/tommyds/blob/master/tommyds/tommyhashdyn.c
- https://gitorious.org/ghelp/glib/source/glib/ghash.c
An example of using modulo:
source to share
You mentioned the "bucket" index, so I am assuming you mean hash tables with a separate chain as conflict resolution, in which case there is no reason to use the modulo or bitmask "stronger" you talked about (which BTW isn't as obvious as you said).
In some languages, in particular, the Java / JVM based array index is a positive 32-bit positive integer, so the maximum array size for a bitmask is 2 ^ 30, which may be insufficient and a strong reason to use inactivity size and size tables are two with which you can come close to 2 ^ 31-1 (the maximum possible signed 32-bit integer). But since you were using C ++ syntax, this shouldn't bother you.
Also, if you didn't just mean a single chain, some open address conflict resolution algorithms require the table size to meet certain conditions, for example, if you are implementing double hashing, the table size should be simple. In this case, you obviously should only use modulo to get the starting index on the table.
source to share