ConcurrentHashMap.keySet () problem. RemoveAll ()
I am reading about ConcurrentHashMap
and checking the removeAll () implementation of my keyset. In the current implementation, JAVA iterates over the entire keyset data structure, even if the collection contains only one or no elements.
Actual implementation
public final boolean removeAll(Collection<?> c) {
if (c == null) throw new NullPointerException();
boolean modified = false;
for (Iterator<E> it = iterator(); it.hasNext();) {
if (c.contains(it.next())) {
it.remove();
modified = true;
}
}
return modified;
}
Can someone tell me if this is intended by the JAVA developers or I am just thinking about it
source to share
Posting the expected implementation , but to be very frank, I am not the actual author of the code below, found this on a coding blog while reading about parallel hashmap so would like to share with everyone else.
public boolean removeAll(Collection<?> c) {
boolean modified = false;
if (size() > c.size()) {
for (Iterator<?> i = c.iterator(); i.hasNext(); )
modified |= remove(i.next());
} else {
for (Iterator<?> i = iterator(); i.hasNext(); ) {
if (c.contains(i.next())) {
i.remove();
modified = true;
}
}
}
return modified;
}
source to share
KeySet
in regular Map
(eg HashMap
) implements AbstractSet
. For OpenJDK 8, the source code shows this as a method removeAll
:
public boolean removeAll(Collection<?> c) {
Objects.requireNonNull(c);
boolean modified = false;
if (size() > c.size()) {
for (Iterator<?> i = c.iterator(); i.hasNext(); )
modified |= remove(i.next());
} else {
for (Iterator<?> i = iterator(); i.hasNext(); ) {
if (c.contains(i.next())) {
i.remove();
modified = true;
}
}
}
return modified;
}
As you can see, there is a check that has the most entries in the collection. If itself Set
is larger, then iterate over the given collection argument. Otherwise, it is done by elements itself Set
. Therefore, if there are no entries in the collection you are traversing, the actual loop is executed zero. The same if he himself Set
has no records.
However, for a ConcurrentHashMap
method it keySet()
returns an instance of an inner class KeySetView
that implements another inner class CollectionView
. The implementation removeAll
of what gets done is in line with the code you posted, which always iterates over the elements KeySetView
, not the given collections.
The reason for this is the likelihood that what Iterator
is returned by the views (keyset or recordset) allows concurrent access, reflecting the values that are present at the time of the iterator query. From the Javadoc ConcurrentHashMap
:
Likewise, iterators and enumerations return items that reflect the state of the hash table at some point in time or after the iterator / enumeration was created. They don't throw ConcurrentModificationException. However, iterators are only meant to be used one thread at a time.
Thus, the method forces the use of the key view Iterator
itself to take care of consistency between concurrent activities on the view or map.
Note, however, that the above implementation for is AbstractSet
also not optimal. If the collection c
you supply as an argument is larger than the set, it c.contains(element)
is called for each element
in the set, but depending on the type of the collection, the method contains
may not be as efficient, For example, for ArrayList
contains
is executed in linear time, whereas having an object in the set will be detected at a constant time.
source to share
Collection versus set
The implementation in question belongs to the class AbstractCollection
, and the implementation in the solution is from AbstractSet
, which has the form AbstractCollection
.
The performance gain is due to the fact that Collection
you cannot guarantee the uniqueness of the elements, so the call is size()
not enough to optimize the deletion. In Set
, however, uniqueness is guaranteed, so additional assumptions and performance improvements can be made, such as iterating with a smaller set when determining which elements to remove.
Highly specialized card and its sets
The story, however, is completely different in the sets and values ConcurrentHashMap
. This is because it ConcurrentHashMap
is highly specialized Map
with many assumptions and improvements for concurrency. This makes any general performance improvements such as in Set.removeAll()
, invalidated anymore (in light of the actual implementation).
ConcurrentHashMap
The iterator is loosely consistent and does a lot of magic behind the scenes. Possibly removeAll
on a keychain is the price paid for all other performance metrics (in terms of a concurrent access point).
Just follow iterator().remove()
on CocnurrentHashMap
down the rabbit hole ConcurrentHashMap.replaceNode()
to see how much logic lingers in the
replaceNode () source code to place items from the iterator.
source to share