Function to spread consecutive numbers "uniformly" over the space of possible values

I need to store a bunch of entities in the Google AppEngine (or you can think of any other hash table) under keys that I have to derive from sequential input.

As an example, let's say I only use keys that are one decimal digit long. Then I need to store one object under key "0", one under key "1", one under key "2", and so on.

The problem is that if I use this increasing sequence directly as keys, all objects will be stored physically very close to each other, which can cause serious performance problems. Details here. For a generic hash table, you can picture the records not being spread evenly across all buckets but clustered in just a few of them, which also leads to poor lookup performance.

So I'm looking for some function to "re-spread" my values more evenly across the space of available values.

To stay with the example of single-digit keys, I could simply create a table containing a random permutation of all possible values, e.g. [5,9,2,4,1,8,0,6,3,7], and index into it. Then, when I store records 0, 1 and 2, which would otherwise sit next to each other, I assign them keys 5, 9 and 2 instead, which are spread more widely across servers or hash buckets.
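The single-digit case can be sketched in a few lines. This only illustrates the table lookup described above; the names `PERMUTATION`, `INVERSE` and `spread_key` are made up for the example.

```python
# Fixed random permutation of the ten single-digit keys (the table from the question).
PERMUTATION = [5, 9, 2, 4, 1, 8, 0, 6, 3, 7]


def spread_key(serial: int) -> int:
    """Map a serial number 0..9 to its scattered key via table lookup."""
    return PERMUTATION[serial]


# Inverse table, so a scattered key can be mapped back to its serial number.
INVERSE = [0] * len(PERMUTATION)
for serial, key in enumerate(PERMUTATION):
    INVERSE[key] = serial

print([spread_key(n) for n in range(3)])  # prints [5, 9, 2]
```

Because the table is a permutation, no two serial numbers can end up with the same key. The catch is the table size: it needs one entry per possible value.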

But I need a way to do this for 156-bit numbers, in which case a table holding a random permutation of all values is not feasible.

I have two requirements:

  • Every possible 156-bit number must map to exactly one value of its own (values up to 160 bits long are fine). No collisions allowed.
  • It should be cheap to compute.

I found one way to do this: just "encrypt" my value with SHACAL-1 or some other 160-bit block cipher. But this seems like too much computational effort for what I am trying to achieve. Maybe there is some kind of pseudo-random function that I could seed with my value? Are those guaranteed to be collision-free?
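To make the "encrypt it" idea concrete, here is a rough sketch. It is not SHACAL-1: as a stand-in it uses a toy balanced Feistel network over 156 bits (two 78-bit halves) with SHA-256 as the round function, because any Feistel network is invertible and therefore collision-free. The round keys, the number of rounds and the names `scramble` / `unscramble` are assumptions made for the example, and nothing here says anything about its speed or cryptographic strength.

```python
import hashlib

HALF_BITS = 78                      # a 156-bit value is split into two 78-bit halves
HALF_MASK = (1 << HALF_BITS) - 1
# Hypothetical round keys; any fixed byte strings would do for this sketch.
ROUND_KEYS = [b"round-0", b"round-1", b"round-2", b"round-3"]


def _round(half: int, key: bytes) -> int:
    """Round function: hash one half together with the round key, keep 78 bits."""
    digest = hashlib.sha256(key + half.to_bytes(10, "big")).digest()
    return int.from_bytes(digest, "big") & HALF_MASK


def scramble(value: int) -> int:
    """Map a 156-bit value to another 156-bit value, one-to-one."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for key in ROUND_KEYS:
        left, right = right, left ^ _round(right, key)
    return (left << HALF_BITS) | right


def unscramble(value: int) -> int:
    """Inverse of scramble: undo the rounds in reverse order."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for key in reversed(ROUND_KEYS):
        left, right = right ^ _round(left, key), left
    return (left << HALF_BITS) | right
```

Since `unscramble` recovers every input, two different inputs can never produce the same output, which is exactly the no-collision requirement. A plain hash or PRNG seeded with the value gives no such guarantee by itself.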



1 answer

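You can use a mapping based on the discrete logarithm problem, i.e. modular exponentiation x -> g^x mod p for a prime p and a primitive root g, which gives you a perfect deterministic permutation of all positions in your array. However, the permutation is one-way: given the new position of an element, you cannot get its original position back without resorting to brute force (or re-running the permutation in the forward direction).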
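A tiny sketch of this idea, assuming the prime p = 11 and the primitive root g = 2 (both picked only to keep the output readable; a real key space would need a prime of suitable size, and the function name `permute` is made up):

```python
P = 11   # small prime, for illustration only
G = 2    # 2 is a primitive root modulo 11


def permute(position: int) -> int:
    """Map a position in 1..P-1 to a distinct value in 1..P-1 via G**position mod P."""
    if not 1 <= position < P:
        raise ValueError("position must be in 1..P-1")
    return pow(G, position, P)   # built-in three-argument pow does modular exponentiation


print([permute(i) for i in range(1, P)])
# prints [2, 4, 8, 5, 10, 9, 7, 3, 6, 1]
```

Every value from 1 to 10 appears exactly once, so there are no collisions. Note that the domain is 1..p-1 rather than all values below a power of two, so the inputs have to be shifted into that range.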


If you don't mind the extra space, you can store <value, original index> pairs and place them completely at random (using some PRNG function), retrying in case of a collision (or keeping track of the slots already in use). The pairs are then distributed evenly. Retrieving the i-th element takes O(N), where N is the number of slots. This is the price you pay for this algorithm.
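A minimal sketch of the random-placement idea, using an in-memory list as the slot space. The function names, the fixed seed and the list-based storage are assumptions for the example only.

```python
import random


def place_randomly(values, num_slots, seed=0):
    """Put each (value, original_index) pair into a random free slot."""
    if len(values) > num_slots:
        raise ValueError("more values than slots")
    rng = random.Random(seed)          # seeded PRNG, so the layout is reproducible
    slots = [None] * num_slots
    for index, value in enumerate(values):
        while True:                    # retry until a free slot is hit
            slot = rng.randrange(num_slots)
            if slots[slot] is None:
                slots[slot] = (value, index)
                break
    return slots


def find_by_index(slots, index):
    """Linear scan for the pair with the given original index: O(N) in the slot count."""
    for entry in slots:
        if entry is not None and entry[1] == index:
            return entry[0]
    raise KeyError(index)


slots = place_randomly(["a", "b", "c"], num_slots=16)
print(find_by_index(slots, 1))  # prints b
```

The scan in `find_by_index` is the O(N) retrieval cost mentioned above: the original index no longer tells you where the element lives.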


Take just a few randomly chosen bits of your 156-bit values and use them to form, say, a 12-bit unsigned index. Use this index to select the k-th bucket of your final space (the space is divided into 2^12 buckets). Values will only tend to cluster if they share the same 12 selected bits, which is very unlikely if you pick the bit positions carefully. Use the remaining 156 - 12 = 144 bits as the offset inside the bucket.
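The bucket/offset split could look roughly like this. Which bit positions to select is left open above; this sketch assumes the 12 lowest positions, since those are the ones that change fastest for serial input. All names are hypothetical.

```python
KEY_BITS = 156
BUCKET_BITS = 12

# Assumed choice: the 12 lowest bit positions select the bucket.
BUCKET_POSITIONS = list(range(BUCKET_BITS))
# All remaining positions form the offset inside the bucket.
OFFSET_POSITIONS = [p for p in range(KEY_BITS) if p not in BUCKET_POSITIONS]


def gather_bits(value: int, positions) -> int:
    """Collect the bits of value at the given positions into a compact integer."""
    result = 0
    for out_bit, pos in enumerate(positions):
        result |= ((value >> pos) & 1) << out_bit
    return result


def split_key(value: int):
    """Return (bucket index, offset inside the bucket) for a 156-bit value."""
    return gather_bits(value, BUCKET_POSITIONS), gather_bits(value, OFFSET_POSITIONS)


print(split_key(5))        # prints (5, 0)
print(split_key(1 << 12))  # prints (0, 1)
```

With this choice, 4096 consecutive serial numbers land in 4096 different buckets. The pair (bucket, offset) still identifies the original value uniquely, because every bit ends up in exactly one of the two parts.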


Create a fixed random permutation of the bit positions of your 156-bit values.
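A sketch of such a bit permutation, assuming the shuffled bit order is generated once from a fixed seed and then reused for every key (the seed value and the function names are made up for the example):

```python
import random

KEY_BITS = 156

# Fixed random order of bit positions; the seed is an arbitrary assumption.
_rng = random.Random(12345)
BIT_ORDER = list(range(KEY_BITS))
_rng.shuffle(BIT_ORDER)


def permute_bits(value: int) -> int:
    """Move bit i of value to position BIT_ORDER[i]."""
    result = 0
    for src, dst in enumerate(BIT_ORDER):
        result |= ((value >> src) & 1) << dst
    return result


def unpermute_bits(value: int) -> int:
    """Inverse mapping: move bit BIT_ORDER[i] back to position i."""
    result = 0
    for src, dst in enumerate(BIT_ORDER):
        result |= ((value >> dst) & 1) << src
    return result
```

Moving bits around is a one-to-one mapping, so collisions are impossible and the result still fits in 156 bits. The fast-changing low bits of a counter get scattered over the whole key, although values with few set bits still map to keys with few set bits.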


