What are the chances of collision in the sum of 2 hash codes?
How about what we learn? The following program will simulate this for you. Note that the two terms for the sum are randomly generated, so both have approximately the full integer range of probability. In fact, the two hash codes you are summing may not have a flat distribution over the whole integer space. The program can be adjusted to simulate this.
package hashcode;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;
public class HashCode {
// Number of test cases
private static final int TEST_CASES = 10_000_000;
public static void main(String[] args) {
// Random number generator
Random rand = new Random();
// Map from integers (result hash codes) to a list of addend pairs that formed those hash codes
Map<Integer, Set<Pair>> hashCodesToComposites = new HashMap<>();
// Number of collissions
int collisions = 0;
// Running simulations
for (int i = 0; i < TEST_CASES; ++i) {
if (TEST_CASES / 4 == i) {
System.out.println("25 %");
}
if (TEST_CASES / 2 == i) {
System.out.println("50 %");
}
if ((TEST_CASES * 3) / 4 == i) {
System.out.println("75 %");
}
// Generating addends as random integers
int first = rand.nextInt();
int second = rand.nextInt();
// The pair; its hash code is the sum of the above
Pair pair = new Pair(first, second);
// Did it occur before?
if (hashCodesToComposites.containsKey(pair.hashCode())) {
// Getting the set of addend pairs that created this hash code
Set<Pair> composites = hashCodesToComposites.get(pair.hashCode());
// Checking if by any chance the two random numbers happened to be the same (almost negligible)
if (!composites.contains(pair)) {
// Actual collision from different numbers
collisions++;
// Adding to the set of composites
composites.add(pair);
} // Same numbers; doesn't count as collision
} else {
// First occurrence of this hash code
Set<Pair> composites = new HashSet<>();
composites.add(pair);
hashCodesToComposites.put(pair.hashCode(), composites);
}
}
// Results
System.out.println("Test cases: " + TEST_CASES);
System.out.println("Collisions: " + collisions);
System.out.println("Probability: " + ((double) collisions / (double) TEST_CASES));
}
private static class Pair {
final int first;
final int second;
final int hashCode;
Pair(int first, int second) {
this.first = first;
this.second = second;
hashCode = first + second;
}
@Override
public int hashCode() {
return hashCode;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
final Pair other = (Pair) obj;
return (this.first == other.first && this.second == other.second) || (this.first == other.second && this.second == other.first);
}
}
}
The result is usually 0.00115. This means that the probability of collisions is approximately 0.115%. I ran below to see what the chances of colliding only random integers are.
package hashcode;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
public class HashCode2 {
// Number of test cases
private static final int TEST_CASES = 10_000_000;
public static void main(String[] args) {
// Random number generator
Random rand = new Random();
Set<Integer> hashCodes = new HashSet<>();
// Number of collissions
int collisions = 0;
// Running simulations
for (int i = 0; i < TEST_CASES; ++i) {
if (TEST_CASES / 4 == i) {
System.out.println("25 %");
}
if (TEST_CASES / 2 == i) {
System.out.println("50 %");
}
if ((TEST_CASES * 3) / 4 == i) {
System.out.println("75 %");
}
int next = rand.nextInt();
if (hashCodes.contains(next)) {
collisions++;
} else {
hashCodes.add(next);
}
}
// Results
System.out.println("Test cases: " + TEST_CASES);
System.out.println("Collisions: " + collisions);
System.out.println("Probability: " + ((double) collisions / (double) TEST_CASES));
}
}
The likelihood is actually about the same. It is only slightly lower, but is still rounded to 0.115%. Finally, I tried the first program again, but with xor in the hashCode method Pair
instead of the sum. Result? The same way again.
So in the end, you can expect to be very close to the same collision rate as the random integers for the sum of the two hash codes and xor, provided both hash codes summed / xor 'ed have a good distribution.
source to share