C ++: Differences Between Two Arrays

Question

C ++: Differences Between Two Arrays

I have two unsorted random access arrays of one simple element type (int / string / etc, so there are all comparison operators, can be hashed, etc.). There should be no duplicate elements in any array.

We are looking for a common algorithm, which, given the given arrays A and B, will tell me:

What items are in and B
What elements are in A but not B
What elements are in B but not A

I think I could do this with set operators like below, but is there a faster solution (such as one that doesn't require me to create two sorted sets)?

r1 = std::set_intersection(a,b);
r2 = std::set_difference(a,b);
r3 = std::set_difference(b,a);

+3

c ++ set c ++ 11

Fire lancer 21 Aug '14 at 15:20

source to share

3 answers

Something like the following algorithm would run O (| A | + | B |) (assuming O (1) out behavior unordered_map

):

Let the list onlyA

initially contain all A, and the lists onlyB

and bothAB

start out as empty.
Let the hash table Amap

concatenate the elements in onlyA

with their corresponding iterator in onlyA

.
For each element b in B
- If b finds a matching iterator ai in Amap
  - Add b to bothAB
  - Remove b from onlyA
    
    with ai
- Otherwise add b to onlyB

At the end of the above algorithm

onlyA contains elements in A but not in B,
onlyB contains elements in B but not in A,
both AB contain elements in both A and B.

Below is the implementation of the above. The result is returned as a tuple < onlyA

, onlyB

, bothAB

>.

template <typename C>
auto venn_ify (const C &A, const C &B) ->
    std::tuple<
        std::list<typename C::value_type>,
        std::list<typename C::value_type>,
        std::list<typename C::value_type>
    >
{
    typedef typename C::value_type T;
    typedef std::list<T> LIST;
    LIST onlyA(A.begin(), A.end()), onlyB, bothAB;
    std::unordered_map<T, typename LIST::iterator> Amap(2*A.size());
    for (auto a = onlyA.begin(); a != onlyA.end(); ++a) Amap[*a] = a;
    for (auto b : B) {
        auto ai = Amap.find(b);
        if (ai == Amap.end()) onlyB.push_back(b);
        else {
            bothAB.push_back(b);
            onlyA.erase(ai->second);
        }
    }
    return std::make_tuple(onlyA, onlyB, bothAB);
}

+3

jxh 21 Aug 14 at 16:10

source to share

You can do this in linear time by placing the elements of A in an unordered_map, where the elements from A are keys. Check if B elements are in keys on the map.

-1

dohashi 21 Aug 14 at 15:32

source to share

James kanze · Accepted Answer · 2014-08-21T17:16:15+0000

First, it is not clear from your question whether you mean std::set

when you talk about sorted sets. If so, then the first reaction should be to use std::vector

, if possible, on the original vectors. Just sort them and then:

std::vector<T> r1;
std::set_intersection( a.cbegin(), a.cend(), b.cbegin(), b.cend(), std::back_inserter( r1 ) );

And the same for r2

and r3

.

Other than that, I doubt there is much you can do. Just one loop can improve some things:

std::sort( a.begin(), a.end() );
std::sort( b.begin(), b.end() );
onlyA.reserve( a.size() );
onlyB.reserve( b.size() );
both.reserve( std::min( a.size(), b.size() ) );
auto ita = a.cbegin();
auto enda = a.cend();
auto itb = b.cbegin();
auto endb = b.cend();
while ( ita != enda && itb != endb ) {
    if ( *ita < *itb ) {
        onlyA.push_back( *ita );
        ++ ita;
    } else if ( *itb < *ita ) {
        onlyB.push_back( *itb );
        ++ itb;
    } else {
        both.push_back( *ita );
        ++ ita;
        ++ itb;
    }
}
onlyA.insert( onlyA.end(), ita, enda );
onlyB.insert( onlyB.end(), itb, endb );

reserve

can make a difference, and if most of the elements end up in the same vector it probably won't cost the extra memory expensive.

C ++: Differences Between Two Arrays

More articles: