Efficiently detecting the presence or absence of two collections in Java

I know that in Java I can manually determine if two collections have overlap by making one of them into a set and then iterating over the other contains checks:

<T> boolean anyInCommon(Iterable<T> collection1, Set<T> collection2) {
    for (T item : collection1)
        if (collection2.contains(item))
            return true;
    return false;
}

      

or alternatively:

<T> boolean anyInCommon(Iterable<T> collection1, Set<T> collection2) {
    return collection1.stream().anyMatch(collection2::contains);
}

      

But are there any existing utility methods that do this and intelligently choose which collection to iterate over to turn into a set, take advantage of what is already there, etc.? I know Guava has it Sets.intersection

, but it calculates all intersections, not just whether it is empty.

Note that I would prefer that the comparison be short-circuited as soon as any common element is found. Checking for overlap between two huge collections should take time proportional to the number of non-overlapping items (or better) instead of the total number of items.

+3


source to share


1 answer


Partial answer when collections are already installed.

Sets.intersection

really closer to what I wanted than I thought, because its result was not precomputed. Instead, it is a kind of intersection calculated on the fly.

Have a look at the anonymous class returnedintersection

:

final Predicate<Object> inSet2 = Predicates.in(set2);
return new SetView<E>() {
  @Override public Iterator<E> iterator() {
    return Iterators.filter(set1.iterator(), inSet2);
  }
  @Override public int size() {
    return Iterators.size(iterator());
  }
  @Override public boolean isEmpty() {
    return !iterator().hasNext();
  }
  @Override public boolean contains(Object object) {
    return set1.contains(object) && set2.contains(object);
  }
  @Override public boolean containsAll(Collection<?> collection) {
    return set1.containsAll(collection)
        && set2.containsAll(collection);
  }
};

      



The method isEmpty

does not process every element. Instead, it iterates over the first set, checking if the elements are in the second set. Once it finds one, it returns true. If you're out of luck, you'll be iterating over all the elements in set1 that aren't in set2 at first, but that's probably inevitable and better than always iterating over all the elements.

In other words, if you already have kits, the efficient short circuit solution is true:

boolean overlaps = !Sets.intersections(set1, set2).isEmpty();

      

This will not iterate over a smaller set instead of a larger set, or deal with unsettled collections, but it is often useful.

+1


source







All Articles