Comparing strings using hashcode () and length ()
In java (or any other PL that uses the same hashcode function) can it be said that:
Two strings of equal length have the same hashcode if and only if they are equal
let's say the hashcode function will be
s [0] * 31 ^ (n-1) + s [1] * 31 ^ (n-2) + ... + s [n-1]
example of use:
for comparing two massive sets of strings, it is theoretically faster to use option 1
, not option 2
because the hashcode computation will be done once when the String will cache its value:
OPTION 1:
for(String s1 : collection1){
for(String s2 : collection2){
if((s1.hashCode() == s2.hashCode()) && (s1.length()==s2.length()){
System.out.println("matched");
}
}
}
OPTION 2
for(String s1 : collection1){
for(String s2 : collection2){
if(s1.equals(s2)){
System.out.println("matched");
}
}
}
UPDATE:
after @tobias_k's comment I realized this is not correct, so I change the question.
is there a maximum length M for a string that for any two strings equal, their hashcode length will be the same if and only if they are equal
source to share
Two strings of equal length have the same hashcode if and only if they are equal
Of course not. Take 100 strings. There are many more 100 strings than different digits int
, so there must be a lot of collisions.
is there a maximum length M for a string such that for any two strings of equal length their hashcode will be the same if and only if they are equal
If there is such a length M, then it is at most 1 (and therefore not very useful), as shown by examples of hash code collisions even for strings of length 2 in Eren's and KDP .
To speed up comparison, you can compare the hash code first and then compare with equals
only if the hash code is the same.
for(String s1 : collection1){
for(String s2 : collection2){
if (s1.hashCode() == s2.hashCode() && s1.equals(s2)) {
System.out.println("matched");
}
}
}
(Note: I haven't profiled if this is actually faster than just using it equals
.)
You can also put all rows from collection1
to Set
and then check if the rows from are collection2
in this set. This will basically do the same: compare the hashcode first, and then use equals
if it finds records with the same hash.
Set<String> setFromCollection1 = new HashSet<>(collection1);
for (String s : collection2) {
if (setFromCollection1.contains(s)) {
System.out.println("matched");
}
}
source to share
Two strings of the same length have the same hash code if and only if they are equal - NOT NECESSARY.
check that "FB" and "Ea" are the same length and have the same hashcode, but they are not equal.
String s = new String("FB");
String s1 = new String("Ea");
System.out.println(s.hashCode()); //2236
System.out.println(s1.hashCode()); //2236
System.out.println(s.hashCode()==s1.hashCode()); //true
System.out.println(s.equals(s1)); //false
source to share
If you are looking for speed and you will only match once then below option is best and it is used to implement map in java as well
if (value1.hashCode() == value2.hashCode() && value1.equals(value2)) {
System.out.println("matched!");
}
but if you want to make multiple matches then you should look for the best algorithm for matching because Java implementation is nave http://www.javacodegeeks.com/2010/09/string-performance-exact-string.html post has good a summary of the string matching algorithm performance.
source to share