Is the int value of String.hashCode() unique?

I ran into this problem a few days ago. I have tens of millions of words, all strings, and I have decided to store them in a database with an index to keep them unique. I don't want to compare the original words to enforce uniqueness. Can I rely on hashCode() returning a unique value for each string, and will that value change if I run on a different laptop, at a different time, or under other conditions?



3 answers


The following is how the JVM computes the hash code of a string. The value is computed purely from the individual characters and their positions in the String; nothing about the JVM or the machine it runs on changes the hash code.

This is also one of the reasons why the String class is declared final (it cannot be subclassed, which helps guarantee immutability), so that no one can change its behavior.

Below is the description:

public int hashCode()

Returns the hash code for this string. The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the i-th character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
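As a rough illustration of the formula above, the sketch below evaluates it with Horner's rule and prints the result next to the built-in String.hashCode(). The class name HashFormula, the helper manualHash, and the sample words are made up for this example; they are not part of the JDK.

```java
class HashFormula {
    // Hypothetical helper: evaluates s[0]*31^(n-1) + ... + s[n-1]
    // using Horner's rule, in plain int arithmetic.
    static int manualHash(String s) {
        int h = 0; // the empty string hashes to zero
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps around silently
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "hello", "database"};
        for (String s : samples) {
            // Both columns should agree for every sample.
            System.out.println("\"" + s + "\": manual=" + manualHash(s)
                    + ", built-in=" + s.hashCode());
        }
    }
}
```

Because the loop uses only the characters and their order, the same string should give the same value on any machine.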



Unique, no. By their nature, hash values are not guaranteed to be unique.

Any system with an arbitrarily large number of possible inputs and a limited number of outputs will have collisions.

Thus, you will not be able to use a unique database key to store them if it is based only on a hash code. However, you can use a non-unique key to store them.
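One way to picture the non-unique key idea is the in-memory sketch below: the hash code only narrows the search to a bucket, and the original words are still compared inside that bucket. The class HashBuckets and its methods are hypothetical stand-ins for a database table with a non-unique index on the hash column, not a real database API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashBuckets {
    // Stand-in for a non-unique index: hash code -> all words sharing it.
    private final Map<Integer, List<String>> buckets = new HashMap<>();

    // Returns true if the word was new, false if it was already stored.
    boolean add(String word) {
        List<String> bucket =
                buckets.computeIfAbsent(word.hashCode(), k -> new ArrayList<>());
        if (bucket.contains(word)) { // compares the actual strings
            return false;
        }
        bucket.add(word);
        return true;
    }

    // Total number of distinct words stored across all buckets.
    int size() {
        int total = 0;
        for (List<String> bucket : buckets.values()) {
            total += bucket.size();
        }
        return total;
    }

    public static void main(String[] args) {
        HashBuckets store = new HashBuckets();
        System.out.println(store.add("hello")); // new word
        System.out.println(store.add("hello")); // duplicate, rejected
        System.out.println(store.size());
    }
}
```

The string comparison happens only among words that share a hash code, so it should usually touch very few entries.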

In answer to your second question, about whether different versions of Java will generate different hash codes for the same string: no.

If the Java implementation follows the Oracle documentation (otherwise it is not a Java implementation), the result will be consistent across all implementations. The Oracle docs for String.hashCode specify a fixed formula for calculating the hash:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

      

You might want to check that this still holds if you are working across widely separated versions of Java (like 1.2 vs 8), but it has been this way since at least 1.5.



No.

A string in Java can have up to 2,147,483,647 (2^31 - 1) characters, and each character can vary, so the number of possible strings is enormous. An int, however, only covers the range -2,147,483,648 to 2,147,483,647, so uniqueness is not possible. The hash code of a string is calculated with this formula:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Example:

If you create two strings, "FB" and "Ea", their hash codes will be the same.
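The collision can be checked by hand: 'F' is 70 and 'B' is 66, so 70*31 + 66 = 2236; 'E' is 69 and 'a' is 97, so 69*31 + 97 = 2236. A minimal sketch that confirms this follows; the class CollisionDemo and the helper collide are made up for the example.

```java
class CollisionDemo {
    // Hypothetical helper: true when two different strings share a hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("FB".hashCode());      // 2236
        System.out.println("Ea".hashCode());      // 2236
        System.out.println(collide("FB", "Ea"));  // true
    }
}
```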
