How does a single instance in each bucket give the best performance in a Java HashMap?

I read in a book that a hash function is most efficient if it returns a unique hash value for each individual object. If the hashCode() method on a class gives a unique hash value for each distinct object, and I want to store n different instances of that class in a HashMap, then there will be n buckets to hold the n instances. The time complexity will be O(n). So how does a single entry (instance) per hash value give the best performance? Is it related to the bucket data structure?
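For example, something like this (a made-up class, just to make the scenario concrete):

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical class whose hashCode() is unique per instance
    class Item {
        private static int counter = 0;
        private final int id = counter++;

        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).id == id;
        }
    }

    public class Question {
        public static void main(String[] args) {
            Map<Item, String> map = new HashMap<>();
            for (int i = 0; i < 5; i++) {
                map.put(new Item(), "value-" + i);   // n distinct instances, n distinct hash values
            }
        }
    }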





2 answers


You seem to think that having n buckets for n elements means the time complexity will be O(n), which is not true.

Consider another example: suppose you have an ArrayList with n elements. How long does get(index) take? O(1), right?
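For instance (a trivial sketch, class name made up):

    import java.util.ArrayList;
    import java.util.List;

    public class ListLookup {
        public static void main(String[] args) {
            List<String> list = new ArrayList<>();
            list.add("a");
            list.add("b");
            list.add("c");

            // get(index) jumps straight to the backing array slot: O(1),
            // regardless of how many elements the list holds.
            System.out.println(list.get(1));   // b
        }
    }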

Now think about HashMap: the index from the ArrayList example plays the role that the hashCode plays for a map. When we insert into a HashMap, the hash code (the index) is used to find the location of the item, i.e. the bucket. If there is a single entry in that bucket, looking up the value from the map is O(1).

Even if there are multiple values in the same bucket, the overall lookup complexity of a HashMap is still O(1) on average.
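As a rough sketch of that idea (a simplification, not the actual java.util.HashMap source, but the same principle: the key's hash code, spread and masked, selects the bucket index in constant time):

    public class IndexSketch {
        // Simplified sketch of how a key's hash code selects a bucket index.
        // The real HashMap does something very similar: spread the high bits,
        // then mask with (tableLength - 1), where the table length is a power of two.
        static int bucketIndex(Object key, int tableLength) {
            int h = key.hashCode();
            h = h ^ (h >>> 16);              // spread high bits into the low bits
            return (tableLength - 1) & h;    // constant-time index computation
        }

        public static void main(String[] args) {
            System.out.println(bucketIndex("hello", 16));   // some index in [0, 15]
        }
    }

With a single entry in that bucket, get(key) is just this index computation plus one equals() check.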

The data structure of the bucket is also important, in particular for worst-case scenarios. The current HashMap implementation uses two node types, a plain linked-list Node and a TreeNode, choosing between them depending on several things, such as the number of entries in the bucket at that point in time. The linked list is simply:

    node -> next -> next -> ...

The TreeNode forms a tree:

        node
       /    \
    left    right

This tree is a red-black tree. In such a data structure the search complexity is O(log n), which is much better than O(n).
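To see why the bucket structure matters, here is a hedged sketch of the worst case (the class name and numbers are made up for illustration): a key whose hashCode() is constant forces every entry into a single bucket. In current OpenJDK versions, a bucket that grows past a small threshold (8 entries, with a sufficiently large table) is converted into a red-black tree, so lookups in that bucket cost O(log n) instead of O(n).

    import java.util.HashMap;
    import java.util.Map;

    // Worst case on purpose: every key hashes to the same bucket.
    class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }        // constant -> every insert collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {        // lets the tree order equal-hash keys
            return Integer.compare(id, other.id);
        }
    }

    public class WorstCaseDemo {
        public static void main(String[] args) {
            Map<BadKey, Integer> map = new HashMap<>();
            for (int i = 0; i < 1_000; i++) {
                map.put(new BadKey(i), i);   // all 1000 entries land in one bucket
            }
            // Without treeification this lookup would walk a 1000-node list (O(n));
            // with the red-black TreeNode bucket it is O(log n).
            System.out.println(map.get(new BadKey(999)));
        }
    }

Making the key Comparable lets the tree order entries whose hash codes are identical; without it, HashMap falls back to a weaker tie-breaking order.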





A Java HashMap maps a key k to a value v. Every Java object has a hashCode() method that produces an integer, which is not necessarily unique.

I read in a book that a hash function is most efficient if it returns a unique hash value for each individual object.

Another way to put it is that the best hash function is the one that produces the fewest collisions.

If the hashCode() method on a class gives a unique hash value for each distinct object, and I want to store n different instances of that class in a HashMap, then there will be n buckets to hold the n instances. The time complexity will be O(n).



In our case, the HashMap holds a table of buckets of a certain size, say >= n for our purposes. It uses the object's hash code as the key, and the hash function returns an index into the table. If we have n objects and the hash function returns n different indices, we have zero collisions. This is the optimal case, and the complexity of finding and retrieving any object is O(1).

Now, if the hash function returns the same index for two different keys (objects), we have a collision, and the table bucket at that index already contains a value. In that case the new entry is chained onto the existing one, so a list builds up at the index where the collisions occurred. The worst-case complexity is therefore O(m), where m is the size of the largest such list.
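A minimal sketch of that collision case (class name made up): two distinct keys report the same hash code, land in the same bucket, and get() still returns the right value because keys inside the bucket are compared with equals().

    import java.util.HashMap;
    import java.util.Map;

    // Two distinct keys that deliberately collide.
    class Key {
        final String name;
        Key(String name) { this.name = name; }

        @Override public int hashCode() { return 7; }          // same index for every Key
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    public class CollisionDemo {
        public static void main(String[] args) {
            Map<Key, String> map = new HashMap<>();
            map.put(new Key("a"), "first");
            map.put(new Key("b"), "second");   // collides with "a", chained in the same bucket

            // get() finds the bucket by hash, then walks the chain using equals()
            System.out.println(map.get(new Key("a")));   // first
            System.out.println(map.get(new Key("b")));   // second
        }
    }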

In conclusion, the performance of a HashMap depends on the number of collisions: the fewer, the better.

I believe this video will help you.









