How random are PHP's pseudo-random values (4 bytes)?

I tested the randomness of values generated in PHP, with the idea of using a 32-bit hexadecimal value to represent a unique state over a given time interval.

I wrote this simple test script:

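$checks = []; // every hash generated so far
$i = 0;       // number of values generated before the duplicate

while (true) {
    // CRC-32 of 4 random bytes, as an 8-character hex string
    $hash = hash('crc32b', openssl_random_pseudo_bytes(4));

    echo $hash . PHP_EOL;

    // Strict comparison (third argument): with the default loose comparison,
    // numeric-looking strings such as '0e123456' and '0e654321' compare equal
    if (in_array($hash, $checks, true)) {
        echo 'Copy: ' . $i . PHP_EOL;
        break;
    }

    $i++;

    $checks[] = $hash;
}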

      

Surprisingly (to me), this script produces a duplicate in fewer than 100,000 iterations, and sometimes in as few as 1,000.

My question is: am I doing something wrong here? Out of 4 billion possibilities, duplicates this frequent seem far too unlikely.



1 answer


No, this is not surprising, and there is nothing wrong with the random number generator. This is the birthday problem. With only 23 people in a room, the probability that two of them share a birthday is about 50%. This seems counterintuitive until you realize that there are 253 possible pairs among 23 people, so you get 253 chances for two people to have the same birthday.
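The figures for 23 people can be checked with a short calculation. This is a minimal sketch that assumes 365 equally likely birthdays; the function names are illustrative, not part of any library.

```php
<?php
// Probability that at least two of $people share a birthday,
// assuming $days equally likely birthdays (a simplifying assumption).
function birthdayCollisionProbability(int $people, int $days = 365): float
{
    $allDistinct = 1.0;
    for ($k = 0; $k < $people; $k++) {
        // The (k+1)-th person must avoid the k birthdays already taken.
        $allDistinct *= ($days - $k) / $days;
    }
    return 1.0 - $allDistinct;
}

// Number of distinct pairs that can be formed from $people.
function pairCount(int $people): int
{
    return intdiv($people * ($people - 1), 2);
}

printf("pairs: %d\n", pairCount(23));
printf("probability: %.4f\n", birthdayCollisionProbability(23));
```

For 23 people this reports 253 pairs and a probability just over one half.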

You are doing the same thing here. You are not waiting to see one specific 32-bit value. Instead, you are looking for a match between any two of the values you have generated so far, which gives you a much better chance. At step 100,000, for example, you have a 1 in 43,000 chance of matching one of the numbers generated so far, as opposed to a 1 in 4,300,000,000 chance of matching one specific number. On the way to 100,000, you have accumulated a great many of those chances.
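The difference between the per-step chance and the accumulated chance can be made concrete. The sketch below assumes uniformly distributed 32-bit values; `perStepChance` and `cumulativeChance` are hypothetical helper names chosen for this illustration.

```php
<?php
const SPACE = 4294967296.0; // 2^32 possible values, kept as a float

// Chance that the next value matches one of $seen distinct earlier values.
function perStepChance(int $seen, float $space = SPACE): float
{
    return $seen / $space;
}

// Chance that at least one duplicate has appeared among $n values.
// The product of "no match" probabilities is summed in log space
// to avoid rounding drift over many small factors.
function cumulativeChance(int $n, float $space = SPACE): float
{
    $logAllDistinct = 0.0;
    for ($k = 1; $k < $n; $k++) {
        $logAllDistinct += log1p(-$k / $space);
    }
    return 1.0 - exp($logAllDistinct);
}

printf("per step at 100,000: 1 in %.0f\n", 1 / perStepChance(100000));
printf("cumulative by 100,000: %.3f\n", cumulativeChance(100000));
```

The per-step chance at step 100,000 is roughly 1 in 43,000, while the accumulated chance of having seen at least one duplicate by then is well over one half.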



See fooobar.com/questions/209117 / ... for the calculation for a 32-bit value. At about 93,000 values the expected number of matching pairs reaches one, and on average the first hit comes after only about 82,000 values.
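The scale of these numbers follows from standard birthday-problem approximations for N equally likely values. The sketch below is not taken from the linked answer; it assumes the usual large-N formulas, and the function and array key names are illustrative.

```php
<?php
// Standard large-N approximations for the birthday problem.
function birthdayThresholds(float $space): array
{
    return [
        // Count at which the expected number of matching pairs is about 1.
        'one_expected_pair' => sqrt(2 * $space),
        // Count at which a duplicate is about 50% likely.
        'median' => sqrt(2 * log(2) * $space),
        // Approximate average count until the first duplicate.
        'mean' => sqrt(M_PI * $space / 2),
    ];
}

foreach (birthdayThresholds(4294967296.0) as $name => $value) {
    printf("%-18s %.0f\n", $name, $value);
}
```

For 2^32 values, all three figures fall between roughly 77,000 and 93,000, which is consistent with duplicates showing up before 100,000 iterations in most runs.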

By the way, running CRC-32 over a four-byte random value makes no difference. The result is the same as without it. All you are doing is mapping each 32-bit number uniquely (one-to-one and onto) to another 32-bit number.
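Since the CRC step adds nothing, the experiment can be run on the raw random values directly. This is a minimal sketch, assuming PHP 7 or later for `random_bytes`; `drawsUntilDuplicate` is a hypothetical name. Using the values as array keys keeps each lookup fast and sidesteps `in_array`'s default loose comparison, under which numeric-looking hex strings such as '0e123456' and '0e654321' compare equal.

```php
<?php
// Draw random 32-bit values until one repeats, and return how many
// distinct values had been seen before the repeat.
function drawsUntilDuplicate(): int
{
    $seen = [];
    while (true) {
        // Interpret 4 random bytes as an unsigned 32-bit big-endian integer.
        $value = unpack('N', random_bytes(4))[1];
        if (isset($seen[$value])) {
            return count($seen);
        }
        $seen[$value] = true;
    }
}

echo drawsUntilDuplicate(), PHP_EOL;
```

Repeated runs should print counts scattered around the tens of thousands, with occasional runs well above 100,000.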
