Comparing strings using hashcode () and length ()

In java (or any other PL that uses the same hashcode function) can it be said that:

Two strings of equal length have the same hashcode if and only if they are equal

let's say the hashcode function will be

s [0] * 31 ^ (n-1) + s [1] * 31 ^ (n-2) + ... + s [n-1]

example of use:

for comparing two massive sets of strings, it is theoretically faster to use option 1

, not option 2

because the hashcode computation will be done once when the String will cache its value:

OPTION 1:

 for(String s1 : collection1){
   for(String s2 : collection2){
     if((s1.hashCode() == s2.hashCode()) && (s1.length()==s2.length()){
        System.out.println("matched");
     }
   }
 }

      

OPTION 2

 for(String s1 : collection1){
   for(String s2 : collection2){
     if(s1.equals(s2)){
        System.out.println("matched");
     }
   }
 }

      

UPDATE:

after @tobias_k's comment I realized this is not correct, so I change the question.

is there a maximum length M for a string that for any two strings equal, their hashcode length will be the same if and only if they are equal

+3


source to share


4 answers


Two strings of equal length have the same hashcode if and only if they are equal

Of course not. Take 100 strings. There are many more 100 strings than different digits int

, so there must be a lot of collisions.

is there a maximum length M for a string such that for any two strings of equal length their hashcode will be the same if and only if they are equal

If there is such a length M, then it is at most 1 (and therefore not very useful), as shown by examples of hash code collisions even for strings of length 2 in Eren's and KDP .




To speed up comparison, you can compare the hash code first and then compare with equals

only if the hash code is the same.

for(String s1 : collection1){
    for(String s2 : collection2){
        if (s1.hashCode() == s2.hashCode() && s1.equals(s2)) {
            System.out.println("matched");
        }
    }
}

      

(Note: I haven't profiled if this is actually faster than just using it equals

.)

You can also put all rows from collection1

to Set

and then check if the rows from are collection2

in this set. This will basically do the same: compare the hashcode first, and then use equals

if it finds records with the same hash.

Set<String> setFromCollection1 = new HashSet<>(collection1);
for (String s : collection2) {
    if (setFromCollection1.contains(s)) {
        System.out.println("matched");
    }
}

      

+4


source


No, it’s wrong.

For example:

System.out.println ("ab " + "ab".hashCode());
System.out.println ("bC " + "bC".hashCode());

      



Outputs:

ab 3105
bC 3105

      

Equal hashCode does not mean equal strings, even for strings of the same length.

+4


source


Two strings of the same length have the same hash code if and only if they are equal - NOT NECESSARY.

check that "FB" and "Ea" are the same length and have the same hashcode, but they are not equal.

    String s = new String("FB");
    String s1 = new String("Ea");
    System.out.println(s.hashCode()); //2236
    System.out.println(s1.hashCode()); //2236
    System.out.println(s.hashCode()==s1.hashCode()); //true
    System.out.println(s.equals(s1)); //false

      

+1


source


If you are looking for speed and you will only match once then below option is best and it is used to implement map in java as well

if (value1.hashCode() == value2.hashCode() && value1.equals(value2)) {
            System.out.println("matched!");
        }

      

but if you want to make multiple matches then you should look for the best algorithm for matching because Java implementation is nave http://www.javacodegeeks.com/2010/09/string-performance-exact-string.html post has good a summary of the string matching algorithm performance.

+1


source







All Articles