Establish that only needs are equal
I'm curious if there is one Set
that only needs to .equals()
be defined for uniqueness?
When browsing the classes Set
from, java.util
I can find only the HashSet
one that needs .hashCode()
and TreeSet
(or generally SortedSet
) for which it is required Comparator
. I cannot find a class that only uses .equals()
.
Does it make sense that if I have a method .equals()
, it's enough to use it to determine the uniqueness of the object? So has an implementation Set
that should only use .equals()
? Or am I missing something here that is .equals()
not enough to determine the uniqueness of an object in an implementation Set
?
Note that I am aware of Java practice that if we override .equals()
, we must override .hashCode()
and also keep the contract defined in Object
.
source to share
The method itself equals
is sufficient for the correct implementation of the set, but not for its effective implementation.
The idea behind a hash code or comparator is that they provide ways to order objects in some ordered structure (hash table or tree) that allows you to quickly find objects. If you only have a method equals
for comparing pairs of objects, you cannot arrange the objects in any meaningful or clever order; you only have a jumbled collection of objects.
For example, when using only a method, equals
to ensure that the objects in a set are unique, each added object must be compared to any other object in disarray.
n * (n - 1) / 2
Comparisons are required to add n objects . For 5 objects, that's 10 comparisons, which is fine, but for 1000 objects, that's 499,500 comparisons. It scales horribly.
Since this will not provide scalable performance, there is no such set implementation in the standard library.
If you don't care about hashtable performance, this is the minimal implementation of a method hashCode
that works for any class:
@Override
public int hashCode() {
return 0; // or any other constant
}
While it is required that equal objects have the same hash codes, it is never required for unequal objects to have unequal hash codes for correctness, so returning a constant is legal. If you put these objects in HashSet
or use them as keys HashMap
, they end up in a mess in one bucket of hash tables. The performance will be poor, but it will work correctly.
Also, for whatever the cost, the minimal working implementation Set
that ever only uses the method equals
:
public class ArraySet<E> extends AbstractSet<E> {
private final ArrayList<E> list = new ArrayList<>();
@Override
public boolean add(E e) {
if (!list.contains(e)) {
list.add(e);
return true;
}
return false;
}
@Override
public Iterator<E> iterator() {
return list.iterator();
}
@Override
public int size() {
return list.size();
}
}
The collection stores objects in ArrayList
and uses them list.contains
to call equals
for objects. Inherited methods from AbstractSet
and AbstractCollection
provide the bulk of the interface's functionality Set
; for example, its method remove
is implemented using the list iterator method remove
. Every operation to add or remove an object, or check the membership of an object, performs a comparison against every object in the set, so it scales horribly, but works correctly.
This is useful? Maybe in some special cases. For sets that are known to be very small, performance can be good, and if you have millions of these sets it can save memory compared to HashSet
.
In general, though, it's better to write meaningful hash code methods and comparators so that you can have sets and maps that scale efficiently.
source to share
You should always override hashCode()
when overriding equals()
. The contract Object
clearly states that two identical objects have the same hash codes, and a surprising number of data structures and algorithms depend on this behavior. Not hard to add hashCode()
, and if you skip it now, you will end up with hard diagnostic errors when your objects start hitting hash structures.
source to share
It might mathematically make sense to have a set that requires nothing but .equals()
.
But such an implementation would be so slow (linear time for each operation) that it was decided that you could always give a hint.
Anyway, if you really can't write hashCode()
, just make it always return 0 and you have a structure that is as slow as you were hoping for!
source to share