Why does a HashMap resize in the case of collisions or in the worst case?

I am asking this question only with regard to Java versions prior to 1.7. I am using reflection to find out the current capacity of a HashMap. In the program below, I put 12 unique Person objects into a single HashMap bucket (they all share the same hash code). Then I put a 13th unique person into either the same bucket or a different one (using the same or a different hash code). In both cases, after adding this 13th element, the HashMap resizes to 32 buckets. I understand that, with a load factor of 0.75 and an initial capacity of 16, the HashMap doubles its capacity on the 13th element. But there are still empty buckets, and only 2 buckets are actually in use by these 13 entries.

My questions:

1) Do I understand this correctly, or am I mistaken? Is this the expected behavior of HashMap?

2) If all of this is correct, then even with 11 or 12 buckets still free, why does the HashMap need to double its capacity for the 13th element? Isn't that unnecessary overhead, since resizing a HashMap is costly? The 13th entry could simply go into whichever free bucket its hash code selects.

import java.lang.reflect.Field;
import java.util.HashMap;

public class HashMapTest {
    public static void main(String[] args) throws NoSuchFieldException,
            SecurityException, IllegalArgumentException, IllegalAccessException {
        HashMap<Person, String> hm = new HashMap<Person, String>();
        for (int i = 1; i <= 12; i++) {
            // 12 entries that all hash to the same bucket (chained in a linked list)
            hm.put(new Person(), "1");
        }
        System.out.println("Number of Buckets in HashMap : "+bucketCount(hm));
        System.out.println("Number of Entry in HashMap :  " + hm.size());
        System.out.println("**********************************");
        // 13th entry goes to a different bucket (age != 0 gives hash code 1)
        hm.put(new Person(2), "2");
        System.out.println("Number of Buckets in HashMap : "+bucketCount(hm));
        System.out.println("Number of Entry in HashMap :  " + hm.size());
    }
    public static int bucketCount(HashMap<Person, String> h)
            throws NoSuchFieldException, SecurityException,
            IllegalArgumentException, IllegalAccessException {
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(h);
        return table == null ? 0 : table.length;
    }
}

class Person {
    int age = 0;
    Person() {
    }
    Person(int a) {
        age = a;
    }
    @Override
    public boolean equals(Object obj) {
        // deliberately never equal: every Person behaves as a unique key
        return false;
    }
    @Override
    public int hashCode() {
        // a non-default age hashes to 1, the default age (0) hashes to 0,
        // so at most two buckets are ever targeted
        if (age != 0) {
            return 1;
        } else {
            return age;
        }
    }
}

      

OUTPUT

Number of Buckets in HashMap : 16
Number of Entry in HashMap :  12
**********************************
Number of Buckets in HashMap : 32
Number of Entry in HashMap :  13

      

3 answers


  • Yes, and that is the expected behavior.
  • A HashMap doesn't care how many buckets are in use. It only knows that the load factor has been reached, and that the likelihood of collisions is therefore becoming too great, so the map must be resized. Even though there have already been a lot of collisions, resizing the map can actually fix that. Not in your case, since you chose identical hash codes on purpose, but in a more realistic case the hash codes should have a much better distribution. A HashMap cannot do anything to make itself efficient if you deliberately choose bad hash codes, and there is no point in adding complexity to handle an edge case that should never happen and that the HashMap could not fix anyway. (A sketch of the threshold arithmetic behind this decision follows below.)
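To make the trigger concrete: the resize decision compares the entry count against an internal threshold of capacity * load factor = 16 * 0.75 = 12, so the 13th put crosses it. As an illustration only (mirroring the bucketCount() helper in the question, and assuming the private field is named "threshold" as in the pre-1.8 sources), you could read that value via reflection:

public static int resizeThreshold(HashMap<?, ?> h)
        throws NoSuchFieldException, IllegalAccessException {
    // Illustrative sketch: reads HashMap's private "threshold" field.
    // With the defaults this should report 12 (16 buckets * 0.75 load factor);
    // the 13th put pushes size past it and the table doubles.
    Field thresholdField = HashMap.class.getDeclaredField("threshold");
    thresholdField.setAccessible(true);
    return thresholdField.getInt(h);
}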




Yes, the behavior you are observing is the expected behavior.

The HashMap implementation assumes the keys have a sensible hashCode. Your hashCode is supposed to distribute the keys as evenly as possible among the available buckets. If it doesn't (as in your example, where all keys share the same hashCode), you will get poor performance.

Assuming an even distribution, it makes sense to double the size of the HashMap after you pass the load factor. The map doesn't check how many buckets are actually empty (since it can't know whether new entries will land in empty or occupied buckets); it just checks the average number of entries per bucket. Once that number exceeds the load factor, the number of buckets is doubled.
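
For reference, the growth check in the pre-1.8 put path boils down to something like the following (a simplified sketch, not the verbatim JDK source): the only thing compared against the threshold is the total entry count, never the number of occupied buckets.

void addEntry(int hash, K key, V value, int bucketIndex) {
    // prepend the new entry to the linked list in its bucket
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)          // threshold = capacity * loadFactor
        resize(2 * table.length);     // double the bucket array
}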



There is also one small aspect here: when you resize the internal array (from 16 to 32 buckets), you also "touch" every entry. Let me explain:

When there are 16 buckets (the internal array has size 16), only the last 4 bits of the hash decide where an entry goes; think of %, but internally it is actually (n - 1) & hash, where n is the number of buckets.

As the internal array grows, one more bit is taken into account when deciding where an entry goes: where it used to be 4 bits, it is now 5 bits. This means all entries are re-hashed and may now move to different buckets; so resizing also serves to spread the existing entries out.
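
A tiny self-contained example of that index calculation (the hash value 22 is just an arbitrary illustration):

public class IndexDemo {
    public static void main(String[] args) {
        int hash = 22;                        // binary 10110
        System.out.println(hash & (16 - 1));  // 6  -> only the last 4 bits are used
        System.out.println(hash & (32 - 1));  // 22 -> the 5th bit now participates,
                                              //       so this entry lands in a different bucket
    }
}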

If you really want to fill in all the "gaps" before growing, you can use a load factor of 1 instead of the default 0.75; but that has implications, as described in the documentation of the HashMap constructors.
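
For example, delaying growth that way is just a constructor argument (shown here with the Person key type from the question); the trade-off is longer bucket chains on average:

// Resizing is then postponed until size reaches the full capacity (16).
HashMap<Person, String> hm = new HashMap<Person, String>(16, 1.0f);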
