Establish that only needs are equal

I'm curious if there is one Set

that only needs to .equals()

be defined for uniqueness?

When browsing the classes Set

from, java.util

I can find only the HashSet

one that needs .hashCode()

and TreeSet

(or generally SortedSet

) for which it is required Comparator

. I cannot find a class that only uses .equals()

.

Does it make sense that if I have a method .equals()

, it's enough to use it to determine the uniqueness of the object? So has an implementation Set

that should only use .equals()

? Or am I missing something here that is .equals()

not enough to determine the uniqueness of an object in an implementation Set

?

Note that I am aware of Java practice that if we override .equals()

, we must override .hashCode()

and also keep the contract defined in Object

.

+3


source to share


3 answers


The method itself equals

is sufficient for the correct implementation of the set, but not for its effective implementation.

The idea behind a hash code or comparator is that they provide ways to order objects in some ordered structure (hash table or tree) that allows you to quickly find objects. If you only have a method equals

for comparing pairs of objects, you cannot arrange the objects in any meaningful or clever order; you only have a jumbled collection of objects.

For example, when using only a method, equals

to ensure that the objects in a set are unique, each added object must be compared to any other object in disarray.
n * (n - 1) / 2

Comparisons are required to add n objects . For 5 objects, that's 10 comparisons, which is fine, but for 1000 objects, that's 499,500 comparisons. It scales horribly.

Since this will not provide scalable performance, there is no such set implementation in the standard library.


If you don't care about hashtable performance, this is the minimal implementation of a method hashCode

that works for any class:

@Override
public int hashCode() {
    return 0; // or any other constant
}

      



While it is required that equal objects have the same hash codes, it is never required for unequal objects to have unequal hash codes for correctness, so returning a constant is legal. If you put these objects in HashSet

or use them as keys HashMap

, they end up in a mess in one bucket of hash tables. The performance will be poor, but it will work correctly.


Also, for whatever the cost, the minimal working implementation Set

that ever only uses the method equals

:

public class ArraySet<E> extends AbstractSet<E> {
    private final ArrayList<E> list = new ArrayList<>();

    @Override
    public boolean add(E e) {
        if (!list.contains(e)) {
            list.add(e);
            return true;
        }
        return false;
    }

    @Override
    public Iterator<E> iterator() {
        return list.iterator();
    }

    @Override
    public int size() {
        return list.size();
    }
}

      

The collection stores objects in ArrayList

and uses them list.contains

to call equals

for objects. Inherited methods from AbstractSet

and AbstractCollection

provide the bulk of the interface's functionality Set

; for example, its method remove

is implemented using the list iterator method remove

. Every operation to add or remove an object, or check the membership of an object, performs a comparison against every object in the set, so it scales horribly, but works correctly.

This is useful? Maybe in some special cases. For sets that are known to be very small, performance can be good, and if you have millions of these sets it can save memory compared to HashSet

.

In general, though, it's better to write meaningful hash code methods and comparators so that you can have sets and maps that scale efficiently.

+5


source


You should always override hashCode()

when overriding equals()

. The contract Object

clearly states that two identical objects have the same hash codes, and a surprising number of data structures and algorithms depend on this behavior. Not hard to add hashCode()

, and if you skip it now, you will end up with hard diagnostic errors when your objects start hitting hash structures.



+3


source


It might mathematically make sense to have a set that requires nothing but .equals()

.

But such an implementation would be so slow (linear time for each operation) that it was decided that you could always give a hint.

Anyway, if you really can't write hashCode()

, just make it always return 0 and you have a structure that is as slow as you were hoping for!

+2


source







All Articles