What are the chances of collision in the sum of 2 hash codes?

What is the chance of collision if a new hash code is generated by adding 2 other hash codes in Java

For example:

Integer reportHashCode = reportFields.hashCode() + reportId.hashCode();

      

Suppose the Java hash is 32 bits and we can ignore the normal collision in the hash itself.

+3


source to share


2 answers


I would like to XOR

add here instead of because xor has 50-50% distribution 1

and 0

.



+2


source


How about what we learn? The following program will simulate this for you. Note that the two terms for the sum are randomly generated, so both have approximately the full integer range of probability. In fact, the two hash codes you are summing may not have a flat distribution over the whole integer space. The program can be adjusted to simulate this.

package hashcode;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class HashCode {
    // Number of test cases
    private static final int TEST_CASES = 10_000_000;
    public static void main(String[] args) {
        // Random number generator
        Random rand = new Random();
        // Map from integers (result hash codes) to a list of addend pairs that formed those hash codes
        Map<Integer, Set<Pair>> hashCodesToComposites = new HashMap<>();
        // Number of collissions
        int collisions = 0;
        // Running simulations
        for (int i = 0; i < TEST_CASES; ++i) {
            if (TEST_CASES / 4 == i) {
                System.out.println("25 %");
            }
            if (TEST_CASES / 2 == i) {
                System.out.println("50 %");
            }
            if ((TEST_CASES * 3) / 4 == i) {
                System.out.println("75 %");
            }
            // Generating addends as random integers
            int first = rand.nextInt();
            int second = rand.nextInt();
            // The pair; its hash code is the sum of the above
            Pair pair = new Pair(first, second);
            // Did it occur before?
            if (hashCodesToComposites.containsKey(pair.hashCode())) {
                // Getting the set of addend pairs that created this hash code
                Set<Pair> composites = hashCodesToComposites.get(pair.hashCode());
                // Checking if by any chance the two random numbers happened to be the same (almost negligible)
                if (!composites.contains(pair)) {
                    // Actual collision from different numbers
                    collisions++;
                    // Adding to the set of composites
                    composites.add(pair);
                } // Same numbers; doesn't count as collision
            } else {
                // First occurrence of this hash code
                Set<Pair> composites = new HashSet<>();
                composites.add(pair);
                hashCodesToComposites.put(pair.hashCode(), composites);
            }
        }
        // Results
        System.out.println("Test cases: " + TEST_CASES);
        System.out.println("Collisions: " + collisions);
        System.out.println("Probability: " + ((double) collisions / (double) TEST_CASES));
    }
    private static class Pair {
        final int first;
        final int second;
        final int hashCode;
        Pair(int first, int second) {
            this.first = first;
            this.second = second;
            hashCode = first + second;
        }
        @Override
        public int hashCode() {
            return hashCode;
        }
        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            final Pair other = (Pair) obj;
            return (this.first == other.first && this.second == other.second) || (this.first == other.second && this.second == other.first);
        }
    }
}

      

The result is usually 0.00115. This means that the probability of collisions is approximately 0.115%. I ran below to see what the chances of colliding only random integers are.



package hashcode;

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class HashCode2 {
    // Number of test cases
    private static final int TEST_CASES = 10_000_000;
    public static void main(String[] args) {
        // Random number generator
        Random rand = new Random();
        Set<Integer> hashCodes = new HashSet<>();
        // Number of collissions
        int collisions = 0;
        // Running simulations
        for (int i = 0; i < TEST_CASES; ++i) {
            if (TEST_CASES / 4 == i) {
                System.out.println("25 %");
            }
            if (TEST_CASES / 2 == i) {
                System.out.println("50 %");
            }
            if ((TEST_CASES * 3) / 4 == i) {
                System.out.println("75 %");
            }
            int next = rand.nextInt();
            if (hashCodes.contains(next)) {
                collisions++;
            } else {
                hashCodes.add(next);
            }
        }
        // Results
        System.out.println("Test cases: " + TEST_CASES);
        System.out.println("Collisions: " + collisions);
        System.out.println("Probability: " + ((double) collisions / (double) TEST_CASES));
    }
}

      

The likelihood is actually about the same. It is only slightly lower, but is still rounded to 0.115%. Finally, I tried the first program again, but with xor in the hashCode method Pair

instead of the sum. Result? The same way again.

So in the end, you can expect to be very close to the same collision rate as the random integers for the sum of the two hash codes and xor, provided both hash codes summed / xor 'ed have a good distribution.

0


source







All Articles