C# String.GetHashCode() returns the same value for different strings

My application runs as a Windows service, and I attach VS2013 to the process for debugging. I compute a hash code for the contents of image files to check for differences, using the following method (in a static class):

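using System.IO;
using System.Text;
using System.Windows.Forms; // for ToolTipIcon

static class FileUtils
{
    public static int GetFileHash(string filePath)
    {
        // Logger is the application's own logging class.
        Logger.WriteLog(ToolTipIcon.Info, "Calculating hash code for {0}", filePath);

        // Decodes the file's bytes as UTF-16 text and hashes the resulting string.
        using (var sr = new StreamReader(filePath, Encoding.Unicode))
        {
            return sr.ReadToEnd().GetHashCode();
        }
    }
}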

This works great in production. However, this method always returns 2074746262 for two different images. I tried to reproduce this in a WinForms app with the same code and images and could not. Is there something about process debugging in VS2013 that causes this behavior? I replaced one of the images with a completely different image, and it still happens.



4 answers


First of all, you need to know that you are using GetHashCode incorrectly, for two reasons:

  • Hash codes are not unique; they are merely well distributed. There are a finite number of hash codes (a hash code is a 32-bit int, so there are 2^32 possible values) and an infinite number of binary strings, so it is impossible to create a unique hash code for every string.

  • The details of the hash code algorithm are explicitly undocumented and can change for reasons outside your control. In particular, this is not the first time I've seen it reported that string.GetHashCode() changes behavior when running under the debugger:

    string.GetHashCode() returns different values in debug vs release, how to avoid it?
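The first point is the pigeonhole principle. A minimal sketch of it, using a deliberately tiny, hypothetical 8-bit hash (TinyHash is an illustrative name, not string.GetHashCode): with only 256 possible values, any 257 distinct strings must contain at least two that collide. A 32-bit hash such as GetHashCode has the same property, just with 2^32 buckets instead of 256.

```csharp
using System;
using System.Collections.Generic;

static class PigeonholeDemo
{
    // Hypothetical 8-bit hash for illustration only:
    // sum of the UTF-16 code units, truncated to a byte.
    public static byte TinyHash(string s)
    {
        int sum = 0;
        foreach (char c in s) sum += c;
        return (byte)sum; // keeps only the low 8 bits, so there are just 256 possible values
    }

    // Hashes the 257 distinct strings "0" .. "256" and returns the first pair that shares a hash.
    public static (string First, string Second) FindCollision()
    {
        var seen = new Dictionary<byte, string>();
        for (int i = 0; i <= 256; i++)
        {
            string s = i.ToString();
            byte h = TinyHash(s);
            if (seen.TryGetValue(h, out string earlier))
                return (earlier, s);
            seen[h] = s;
        }
        // Unreachable: 257 distinct strings cannot all map to different values out of 256.
        throw new InvalidOperationException("No collision found");
    }
}
```

With this particular toy hash the collision shows up long before the 257th string: "11" and "20" both sum to 98.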




Having said that, it does seem a little unusual that two different binary strings would hash to the same value in the same runtime depending only on whether a debugger is attached. Aside from the general advice not to trust GetHashCode for this, my next guess is that you are not hashing what you think you are hashing. I would dump the binary data to disk myself before hashing it and confirm that you really do have different binary strings.
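A minimal sketch of that check, assuming you have the two image paths at hand. It compares the raw bytes directly, and also compares the strings produced by the same StreamReader/Encoding.Unicode decoding the question uses, since those strings are what GetHashCode actually sees. FileCheck, BytesDiffer and DecodedTextDiffers are hypothetical names.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class FileCheck
{
    // True if the two files differ anywhere in their raw bytes.
    public static bool BytesDiffer(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);
        return !a.SequenceEqual(b);
    }

    // Decodes each file the same way as GetFileHash (StreamReader + Encoding.Unicode)
    // and reports whether the resulting strings, i.e. the values being hashed, differ.
    public static bool DecodedTextDiffers(string pathA, string pathB)
    {
        string textA, textB;
        using (var sr = new StreamReader(pathA, Encoding.Unicode)) textA = sr.ReadToEnd();
        using (var sr = new StreamReader(pathB, Encoding.Unicode)) textB = sr.ReadToEnd();
        return !string.Equals(textA, textB, StringComparison.Ordinal);
    }
}
```

If BytesDiffer returns true but DecodedTextDiffers returns false, the problem lies in how the files are being read, not in the hash.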



The documentation explicitly warns about this. Don't rely on String.GetHashCode to be unique; your assumption is wrong.



If two string objects are equal, the GetHashCode method returns identical values. However, there is no unique hash code value for each unique string value. Different strings can return the same hash code.
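The contract quoted above only works in one direction: different hash codes prove the strings differ, but equal hash codes prove nothing. A minimal sketch of using a hash correctly, as a quick filter followed by a real equality check (HashContract and AreEqual are hypothetical names):

```csharp
using System;

static class HashContract
{
    // Different hash codes guarantee the strings differ, so we can return early.
    // Equal hash codes are only a hint; the ordinal comparison makes the final decision.
    public static bool AreEqual(string a, string b)
    {
        if (a.GetHashCode() != b.GetHashCode())
            return false; // definitely different

        // Same hash: the contents must still be compared.
        return string.Equals(a, b, StringComparison.Ordinal);
    }
}
```

This is essentially what Dictionary<TKey, TValue> and HashSet<T> do: the hash code picks a bucket, and Equals decides whether two keys really match.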



Instead of GetHashCode, which definitely won't be unique across all images, use MD5 or a similar hash algorithm from this link:

https://msdn.microsoft.com/en-us/library/s02tk69a%28v=vs.110%29.aspx
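A minimal sketch of that approach, assuming the goal is only to detect changed files. It hashes the file's raw bytes (no text decoding) with System.Security.Cryptography.MD5 and returns an uppercase hex string. ContentHash and Md5Hex are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class ContentHash
{
    // Computes the MD5 digest of a file's raw bytes and returns it as an uppercase hex string.
    public static string Md5Hex(string filePath)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(filePath))
        {
            byte[] digest = md5.ComputeHash(stream);
            // BitConverter.ToString yields "90-01-50-..."; strip the dashes.
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

Two files with the same MD5 hex string can be treated as unchanged for change detection. MD5 is not collision-resistant against deliberate attacks, so if that matters, SHA256.Create() can be swapped in with no other changes.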



Using GetHashCode to check for uniqueness will never work; there is no guarantee that different objects will produce different hash codes.
