C ++: Differences Between Two Arrays

I have two unsorted random access arrays of one simple element type (int / string / etc, so there are all comparison operators, can be hashed, etc.). There should be no duplicate elements in any array.

We are looking for a common algorithm, which, given the given arrays A and B, will tell me:

  • What items are in and B
  • What elements are in A but not B
  • What elements are in B but not A

I think I could do this with set operators like below, but is there a faster solution (such as one that doesn't require me to create two sorted sets)?

r1 = std::set_intersection(a,b);
r2 = std::set_difference(a,b);
r3 = std::set_difference(b,a);

      

+3


source to share


3 answers


First, it is not clear from your question whether you mean std::set

when you talk about sorted sets. If so, then the first reaction should be to use std::vector

, if possible, on the original vectors. Just sort them and then:

std::vector<T> r1;
std::set_intersection( a.cbegin(), a.cend(), b.cbegin(), b.cend(), std::back_inserter( r1 ) );

      

And the same for r2

and r3

.



Other than that, I doubt there is much you can do. Just one loop can improve some things:

std::sort( a.begin(), a.end() );
std::sort( b.begin(), b.end() );
onlyA.reserve( a.size() );
onlyB.reserve( b.size() );
both.reserve( std::min( a.size(), b.size() ) );
auto ita = a.cbegin();
auto enda = a.cend();
auto itb = b.cbegin();
auto endb = b.cend();
while ( ita != enda && itb != endb ) {
    if ( *ita < *itb ) {
        onlyA.push_back( *ita );
        ++ ita;
    } else if ( *itb < *ita ) {
        onlyB.push_back( *itb );
        ++ itb;
    } else {
        both.push_back( *ita );
        ++ ita;
        ++ itb;
    }
}
onlyA.insert( onlyA.end(), ita, enda );
onlyB.insert( onlyB.end(), itb, endb );

      

reserve

can make a difference, and if most of the elements end up in the same vector it probably won't cost the extra memory expensive.

+3


source


Something like the following algorithm would run O (| A | + | B |) (assuming O (1) out behavior unordered_map

):

  • Let the list onlyA

    initially contain all A, and the lists onlyB

    and bothAB

    start out as empty.
  • Let the hash table Amap

    concatenate the elements in onlyA

    with their corresponding iterator in onlyA

    .
  • For each element b in B

    • If b finds a matching iterator ai in Amap
      • Add b to bothAB

      • Remove b from onlyA

        with ai
    • Otherwise add b to onlyB

At the end of the above algorithm



  • onlyA contains elements in A but not in B,
  • onlyB contains elements in B but not in A,
  • both AB contain elements in both A and B.

Below is the implementation of the above. The result is returned as a tuple < onlyA

, onlyB

, bothAB

>.

template <typename C>
auto venn_ify (const C &A, const C &B) ->
    std::tuple<
        std::list<typename C::value_type>,
        std::list<typename C::value_type>,
        std::list<typename C::value_type>
    >
{
    typedef typename C::value_type T;
    typedef std::list<T> LIST;
    LIST onlyA(A.begin(), A.end()), onlyB, bothAB;
    std::unordered_map<T, typename LIST::iterator> Amap(2*A.size());
    for (auto a = onlyA.begin(); a != onlyA.end(); ++a) Amap[*a] = a;
    for (auto b : B) {
        auto ai = Amap.find(b);
        if (ai == Amap.end()) onlyB.push_back(b);
        else {
            bothAB.push_back(b);
            onlyA.erase(ai->second);
        }
    }
    return std::make_tuple(onlyA, onlyB, bothAB);
}

      

+3


source


You can do this in linear time by placing the elements of A in an unordered_map, where the elements from A are keys. Check if B elements are in keys on the map.

-1


source







All Articles