Java String hashCode() collisions

Question

I don't know much about hash codes. I found this code that prints collisions.

Can you please tell me what collisions are and how to reduce them? Why should we use hash codes?

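import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// The original snippet omitted the enclosing class; this name is arbitrary.
public class HashCollisionTest
{
    // Maps a string to a bucket index in [0, limit). Taking the remainder before
    // Math.abs is safe: the remainder can never be Integer.MIN_VALUE.
    public static int getHash(String str, int limit)
    {
        return Math.abs(str.hashCode() % limit);
    }

    public static void main(String[] args)
    {
        int hashLimit = 10000;    // number of buckets
        int stringsLimit = 10000; // number of random strings to generate
        String[] arr = new String[hashLimit];
        List<String> test = new ArrayList<String>();
        Random r = new Random(2); // fixed seed, so every run is repeatable
        for ( int i = 0 ; i < stringsLimit ; i++ )
        {
            StringBuilder buf = new StringBuilder();
            for ( int j = 0 ; j < 10 ; j++ )
            {
                // random character in the range '#' (35) to '^' (94)
                char c = (char)(35+60*r.nextDouble());
                buf.append(c);
            }
            test.add(buf.toString());
        }
        // A collision here means two different strings landing in the same bucket,
        // which happens far more often than two strings sharing a full 32-bit hashCode().
        int collisions = 0;
        for ( String curStr : test )
        {
            int hashCode = getHash(curStr,hashLimit);
            if ( arr[hashCode] != null && !arr[hashCode].equals(curStr) )
            {
                System.out.println("collision of ["+arr[hashCode]+"] ("+arr[hashCode].hashCode()+" = "+hashCode+") with ["+curStr+"] ("+curStr.hashCode()+" = "+hashCode+")");
                collisions++;
            }
            else
            {
                arr[hashCode] = curStr;
            }
        }
        System.out.println("Collisions: "+collisions);
    }
}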

+3

java hashcode collision

crtn Mar 30 12 at 16:11

3 answers

To answer the other part of your question: to reduce the chance of collisions, implement a hashing algorithm that spreads hash codes evenly across the range of possible inputs.
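Even a perfectly even distribution cannot remove collisions once the number of keys is close to the number of buckets. As a rough sketch (the class and method names are made up for illustration, and the model assumes every key picks a bucket uniformly at random), this estimates how many bucket collisions to expect for the limits used in the question's code:

```java
// Sketch: expected bucket collisions for an ideal, uniformly distributed hash.
class ExpectedCollisions {
    // n keys are thrown uniformly at random into m buckets.
    // Expected occupied buckets = m * (1 - (1 - 1/m)^n); every key that lands
    // in an already occupied bucket counts as one collision.
    static double expectedCollisions(int n, int m) {
        double emptyProbability = Math.pow(1.0 - 1.0 / m, n);
        double occupied = m * (1.0 - emptyProbability);
        return n - occupied;
    }

    public static void main(String[] args) {
        // Same limits as the question's code: 10000 strings, 10000 buckets.
        System.out.printf("%.0f%n", expectedCollisions(10000, 10000));
    }
}
```

Under that assumption a bit over a third of the strings collide, so a large count from the question's program does not by itself mean String.hashCode() is poor. Using more buckets than keys is the usual way to bring the number down.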

For example, suppose you have implemented a naive hashCode() method for hashing MyString instances:

public class MyString {
  private final char[] arr;

  // Constructor and other methods.

  public int hashCode() {
    return arr.length == 0 ? 0 : (int) arr[0];
  }
}

This example uses only the first character to generate the hash code. Therefore, if you hash the strings "apple", "anaconda" and "anecdote", they will all have the same hash value. A better hashCode() would take every character in the array into account, which should reduce the chance of collisions.
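A minimal sketch of the difference, using static helper methods on a hypothetical MyStringHashes class rather than the MyString class itself. The second method uses the multiply-by-31 polynomial that java.lang.String documents for its own hashCode(); other mixing schemes would work too.

```java
// Sketch: first-character hash versus a hash that mixes in every character.
class MyStringHashes {
    // Naive version from the answer: only the first character matters.
    static int firstCharHash(char[] arr) {
        return arr.length == 0 ? 0 : (int) arr[0];
    }

    // Polynomial hash over all characters: h = 31 * h + c for each character.
    static int allCharsHash(char[] arr) {
        int h = 0;
        for (char c : arr) {
            h = 31 * h + c;
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"apple", "anaconda", "anecdote"}) {
            char[] chars = s.toCharArray();
            System.out.println(s + ": first-char=" + firstCharHash(chars)
                    + " all-chars=" + allCharsHash(chars));
        }
    }
}
```

The first-character hash gives 97 (the code of 'a') for all three words, while the all-characters version returns the same value as calling hashCode() on the corresponding String.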

+2

Adamski Mar 30 12 at 16:19

We have a "collision" if two non-equal objects have the same hash code. This can be a problem, for example, when both objects are used as keys in a HashMap: they end up in the same bucket, and the map has to fall back on equals() to tell them apart.

0

Andreas_D Mar 30 12 at 16:13

Jon Skeet · Accepted Answer · 2012-03-30T16:13:58+0000

Can you please tell me what collisions are and how to reduce them?

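Collisions are when two unequal objects have the same hash code. They are a fact of life: there are only 2^32 possible int hash codes but far more possible strings, so some unequal strings must share one. You have to deal with them.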
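A small illustration of such a collision, as a sketch with a made-up class name: the strings "Aa" and "BB" are unequal but have the same String.hashCode(), and a HashMap still keeps them apart because it checks equals() after matching the hash.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: two unequal strings with the same hash code, both usable as map keys.
class CollisionDemo {
    // Builds a map whose two keys collide on hashCode().
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 65 * 31 + 97 = 2112
        System.out.println("BB".hashCode()); // 66 * 31 + 66 = 2112
        Map<String, Integer> map = build();
        // Both entries survive: the collision costs an extra equals() check, not correctness.
        System.out.println(map.get("Aa") + " " + map.get("BB"));
    }
}
```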

Why should we use hash codes?

Because they speed up key lookups. A hash table can use a hash code to very quickly narrow the set of possible key matches down to a very small set (often only one), after which you need to check for actual key equality.
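The two-step lookup described above can be sketched as a toy chained hash table. TinyHashTable and its members are hypothetical names for illustration; this is not how java.util.HashMap is implemented in detail, only the same general idea.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the hash code picks a bucket, then equals() finds the exact key.
class TinyHashTable {
    private static final class Entry {
        final String key;
        int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();

    TinyHashTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: the hash code narrows the search to a single bucket.
    private List<Entry> bucketFor(String key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void put(String key, int value) {
        List<Entry> bucket = bucketFor(key);
        for (Entry e : bucket) {
            if (e.key.equals(key)) { e.value = value; return; } // overwrite existing key
        }
        bucket.add(new Entry(key, value));
    }

    // Step 2: within the bucket, only equals() decides whether the key matches.
    Integer get(String key) {
        for (Entry e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key not present
    }
}
```

With a well-spread hash each bucket holds only a handful of entries, so a lookup touches a few keys instead of scanning every stored key.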

You shouldn't assume that two equal hash codes mean the objects they were derived from are equal. The reverse does hold, though: assuming a correct implementation, if two objects produce different hash codes, then they are not equal.
