Why are these hash codes equal?

This test fails:

var hashCode = new 
{
    CustomerId = 3354,
    ServiceId = 3,
    CmsThematicId = (int?)605,
    StartDate = (DateTime?)new DateTime(2013, 1, 5),
    EndDate = (DateTime?)new DateTime(2013, 1, 6)
}.GetHashCode();
var hashCode2 = new
{
    CustomerId = 1210,
    ServiceId = 3,
    CmsThematicId = (int?)591,
    StartDate = (DateTime?)new DateTime(2013, 3, 31),
    EndDate = (DateTime?)new DateTime(2013, 4, 1)
}.GetHashCode();
Assert.AreNotEqual(hashCode, hashCode2);

      

Can you tell me why?

+3




4 answers


Jim suggested (in chat) that I store my parameter sets: when I display them I pick one that is not yet used, and when someone logs in I flag it as used. But generating every parameter set up front is a big PITA.

So my solution is to generate an int64 hash like this:



const long i = -1521134295;
// Same multiply-and-add chain as the compiler-generated GetHashCode, but accumulated in a long.
return -i * (-i * (-i * (-i * -117147284 + customerId.GetHashCode()) + serviceId.GetHashCode())
    + cmsThematicId.GetHashCode()) + startDate.GetHashCode();

      

I removed the end date because its value depends on serviceId and startDate, so I should not have added it to the hash in the first place. I copied the formula from the decompiled generated class. I got no collisions when testing 300,000 different combinations.

0




It is surprising that you found this collision.

Anonymous classes have a compiler-generated GetHashCode() method that produces a hash code by combining the hash codes of all properties.

The calculation is basically:

  public override int GetHashCode()
  {
    return        -1521134295 * 
                ( -1521134295 * 
                ( -1521134295 * 
                ( -1521134295 * 
                ( -1521134295 * 
                   1170354300 + 
                  CustomerId.GetHashCode()) +
                  ServiceId.GetHashCode()) + 
                  CmsThematicId.GetHashCode()) + 
                  StartDate.GetHashCode()) + 
                  EndDate.GetHashCode();
  }

      

If you change the value of any field, the hash code changes. The fact that you found two different sets of values that produce the same hash code is a coincidence.



Please note that hash codes are not necessarily unique. They cannot be guaranteed to be unique, because there can be more possible objects than hash codes (although that takes a lot of objects). Good hash codes provide a random distribution of values.

NOTE: The above is from .NET 4. Other versions of .NET may differ, and Mono is different.

If you want to compare two objects for equality, use .Equals(). For anonymous objects, it compares each field. An even better option is to use an NUnit constraint that compares each field and tells you which field differs. I posted such a constraint here:

fooobar.com/questions/141580 / ...
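To illustrate the point about .Equals(), here is a minimal sketch that compares the two anonymous objects from the question by value instead of by hash code. The class and method names are hypothetical and exist only for this example; the property values are the ones from the question.

```csharp
using System;

// Hypothetical demo class; the names are illustrative, not from the question.
public static class AnonymousEqualityDemo
{
    // Compares the two objects from the question property by property.
    public static bool QuestionObjectsAreEqual()
    {
        var first = new
        {
            CustomerId = 3354,
            ServiceId = 3,
            CmsThematicId = (int?)605,
            StartDate = (DateTime?)new DateTime(2013, 1, 5),
            EndDate = (DateTime?)new DateTime(2013, 1, 6)
        };
        var second = new
        {
            CustomerId = 1210,
            ServiceId = 3,
            CmsThematicId = (int?)591,
            StartDate = (DateTime?)new DateTime(2013, 3, 31),
            EndDate = (DateTime?)new DateTime(2013, 4, 1)
        };
        // Equals on an anonymous type checks every property, so a hash
        // collision cannot make these two objects look equal.
        return first.Equals(second);
    }

    // Objects with identical property values are equal and share a hash code.
    public static bool IdenticalValuesAreEqual()
    {
        var a = new { CustomerId = 3354, ServiceId = 3 };
        var b = new { CustomerId = 3354, ServiceId = 3 };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }
}
```

The guarantee only runs one way: equal objects must have equal hash codes, but equal hash codes say nothing about equality. A test that wants to show two objects differ can therefore assert on Equals rather than on GetHashCode.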

+3




Have you run into this while processing quite a lot of data?

Welcome to the wonderful world of hash codes. The hash code is not a "unique identifier". It can't be. There are essentially an infinite number of possible different instances of this anonymous type, but only 2^32 possible hash codes. Therefore, it is guaranteed that if you create enough of these objects, you will see some duplicates. In fact, once you randomly generate about 77,000 of these objects, the odds are roughly 50% that two of them will have the same hash code.

See Birthdays, Random Numbers and Hash Codes, and the linked Wikipedia article, for more information.

As for why some people see duplicates while others don't, they probably ran the program on different versions of .NET. The algorithm for generating hash codes is not guaranteed to remain the same across versions or platforms:

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
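As a rough check on that estimate, the standard birthday-problem approximation can be written out. It assumes the hash codes are uniformly distributed over N = 2^32 values, which a real GetHashCode only approximates; n is the number of objects generated and p(n) the probability of at least one collision.

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right), \qquad N = 2^{32}

p(n) = \tfrac{1}{2}
\;\Longleftrightarrow\;
n \approx \sqrt{2N \ln 2} \approx 7.7 \times 10^{4}
```

Under this approximation the 50% mark falls near 77,000 objects, far below the 2^32 possible hash codes.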

+1




Your test is invalid.

Since hashes are not guaranteed to be unique (see other answers for a good explanation), you should not check for uniqueness of hashes.

When writing your own GetHashCode() method, it is better to test that random input produces a uniform distribution of hash codes than to test for uniqueness. Just make sure you use enough random input to get a meaningful test.

The MSDN documentation for GetHashCode states the following:

For best performance, the hash function should generate a random distribution for all inputs.

This is all relative, of course. A GetHashCode() method used to put 100 objects into a dictionary need not be nearly as random as one used to put 10,000,000 objects into a dictionary.
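A distribution check along these lines could look like the following sketch. Everything here is hypothetical and for illustration only: the helper names, the two-int hash signature, and the idea of comparing the fullest bucket against the expected average are assumptions, not an established test recipe.

```csharp
using System;

// Hypothetical helper for illustration; not part of any framework.
public static class HashDistributionCheck
{
    // Hashes sampleCount pairs of random ints, drops each hash into one of
    // bucketCount buckets, and returns fullest bucket / expected average.
    // A value near 1.0 suggests an even spread; a large value suggests clumping.
    public static double WorstBucketRatio(
        Func<int, int, int> hash, int sampleCount, int bucketCount, int seed)
    {
        var random = new Random(seed);
        var buckets = new int[bucketCount];
        for (int n = 0; n < sampleCount; n++)
        {
            int code = hash(random.Next(), random.Next());
            // Clear the sign bit so the remainder is a valid array index.
            buckets[(code & int.MaxValue) % bucketCount]++;
        }

        int fullest = 0;
        foreach (int count in buckets)
        {
            fullest = Math.Max(fullest, count);
        }
        return fullest / ((double)sampleCount / bucketCount);
    }

    // Example hash under test: a multiply-and-add chain in the same style
    // as the compiler-generated code (the seed value 17 is arbitrary).
    public static int CombineHash(int a, int b)
    {
        unchecked
        {
            int h = 17;
            h = h * -1521134295 + a;
            h = h * -1521134295 + b;
            return h;
        }
    }
}
```

A call such as HashDistributionCheck.WorstBucketRatio(HashDistributionCheck.CombineHash, 100000, 16, 42) should return a value close to 1, while a degenerate hash that always returns the same number yields a ratio equal to the bucket count. The threshold a test accepts is a judgment call that depends on the sample size.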

+1

