How and when Rehashing is executed in a HashMap
I have a confusion about Hashing and Rehashing. below is my understanding, please correct me if i am wrong.
as per the picture, bucket is actually an array of the Entry class that stores the element as a linked list. each new key-value pair whose key has the same Entry Array bucket hash will be stored as an Entry object in the bucket that stores those hash elements. if the key has a new hash that is not currently present in the forge of the Entry Array, then a new bucket will be added with the appropriate hash.
now the question is when does the actual rename take place?
case 1: lets assume I have an Entry array having a hashcode of 1,2 3 with a new item added at the time the Entry Array Bucket reached 12 its OK, but as soon as a new item appears whose hashcode is 13 (assuming I add an element to the row hashcode 1, then 2, etc.), a new Map / Array Entry Bucket will be created (please specify which one) and the new capacity will be 32, now the Entry array can contain various 32 buckets.
case 2: the amount of dispenser in the bucket, the default capacity for HashMap 16 means it can store 16 items in it, the dispenser in one bucket or anyway. due to a load factor of .75, as soon as the 13th element is added, a new bucket array with 32 will be created in it, that is, now the total node input in all link lists can be 32.
I think case 2 is correct. please customize the Re Hashing process, it is better if this diagram will be used.
source to share
Repeat play increases the number of buckets available depending on the number of recordings currently stored in HashMap
.
This occurs when the implementation HashMap
determines the number of buckets should be increased to maintain the expected performance O(1)
and insert performance.
You are correct about the default load factor of .75 and how this will cause replay HashMap
when adding a 13th entry.
However, it is not true that default capacity of HashMap 16 means it can store 16 element in it
. Any bucket can store multiple entries. However, to maintain the desired performance, the average number of writes per bucket should be small. This is the reason why we have a load factor and the reason why we should use the correct one hashCode()
, which distributes the keys as evenly as possible over the buckets.
source to share
Replay is the process of recalculating the hash of already saved entries (key-value pairs) to move them to a different, larger hashmap size when the load factor threshold is reached.
When the number of elements on the map crosses the limit of the load factor at this time, the hashmap doubles its capacity, and the hash code is recalculated by the already stored elements to evenly distribute the key-value pairs in the new codes.
Why is rewriting required?
After doubling the capacity, what to do with the key-value pairs already present in the buckets?
If we keep the existing key-value pairs as they are, then doubling the capacity may not help, because the O (1) complexity will only be achieved if the elements are evenly distributed across all buckets.
So, for every existing hashcode key-value pair, it is computed again with the increased capacity of the hashmap as a parameter, which causes it to put the item in the same bucket or in a different bucket.
A re-redirection is performed to distribute the entries on a new length hash scheme such that the overlap time and time position complexity remains O (1).
Note:
Hashmap maintains O (1) complexity when inserting data and retrieving data from a hashmap, but for the 13th key-value pair, the post request will no longer be O (1), since once the map realizes that the 13th item has entered, that is, 75% of the card is full.
It will double the capacity (array) first and then go to Rehash. Rehashing requires recalculating the hashcode of the already placed 12 key-value pairs again and placing them in a new index, which takes time.
you should read this it might help
http://javabypatel.blogspot.co.il/2015/10/what-is-load-factor-and-rehashing-in-hashmap.html
source to share