C ++: Differences Between Two Arrays
I have two unsorted random access arrays of one simple element type (int / string / etc, so there are all comparison operators, can be hashed, etc.). There should be no duplicate elements in any array.
We are looking for a common algorithm, which, given the given arrays A and B, will tell me:
- What items are in and B
- What elements are in A but not B
- What elements are in B but not A
I think I could do this with set operators like below, but is there a faster solution (such as one that doesn't require me to create two sorted sets)?
r1 = std::set_intersection(a,b);
r2 = std::set_difference(a,b);
r3 = std::set_difference(b,a);
source to share
First, it is not clear from your question whether you mean
std::set
when you talk about sorted sets. If so, then the first reaction should be to use std::vector
, if possible, on the original vectors. Just sort them and then:
std::vector<T> r1;
std::set_intersection( a.cbegin(), a.cend(), b.cbegin(), b.cend(), std::back_inserter( r1 ) );
And the same for r2
and r3
.
Other than that, I doubt there is much you can do. Just one loop can improve some things:
std::sort( a.begin(), a.end() );
std::sort( b.begin(), b.end() );
onlyA.reserve( a.size() );
onlyB.reserve( b.size() );
both.reserve( std::min( a.size(), b.size() ) );
auto ita = a.cbegin();
auto enda = a.cend();
auto itb = b.cbegin();
auto endb = b.cend();
while ( ita != enda && itb != endb ) {
if ( *ita < *itb ) {
onlyA.push_back( *ita );
++ ita;
} else if ( *itb < *ita ) {
onlyB.push_back( *itb );
++ itb;
} else {
both.push_back( *ita );
++ ita;
++ itb;
}
}
onlyA.insert( onlyA.end(), ita, enda );
onlyB.insert( onlyB.end(), itb, endb );
reserve
can make a difference, and if most of the elements end up in the same vector it probably won't cost the extra memory expensive.
source to share
Something like the following algorithm would run O (| A | + | B |) (assuming O (1) out behavior unordered_map
):
- Let the list
onlyA
initially contain all A, and the listsonlyB
andbothAB
start out as empty. - Let the hash table
Amap
concatenate the elements inonlyA
with their corresponding iterator inonlyA
. - For each element b in
B
- If b finds a matching iterator ai in Amap
- Add b to
bothAB
- Remove b from
onlyA
with ai
- Add b to
- Otherwise add b to
onlyB
- If b finds a matching iterator ai in Amap
At the end of the above algorithm
- onlyA contains elements in A but not in B,
- onlyB contains elements in B but not in A,
- both AB contain elements in both A and B.
Below is the implementation of the above. The result is returned as a tuple < onlyA
, onlyB
, bothAB
>.
template <typename C>
auto venn_ify (const C &A, const C &B) ->
std::tuple<
std::list<typename C::value_type>,
std::list<typename C::value_type>,
std::list<typename C::value_type>
>
{
typedef typename C::value_type T;
typedef std::list<T> LIST;
LIST onlyA(A.begin(), A.end()), onlyB, bothAB;
std::unordered_map<T, typename LIST::iterator> Amap(2*A.size());
for (auto a = onlyA.begin(); a != onlyA.end(); ++a) Amap[*a] = a;
for (auto b : B) {
auto ai = Amap.find(b);
if (ai == Amap.end()) onlyB.push_back(b);
else {
bothAB.push_back(b);
onlyA.erase(ai->second);
}
}
return std::make_tuple(onlyA, onlyB, bothAB);
}
source to share