How to efficiently remove elements from vector C ++

I have a vector of a pair of vectors (V1, V2) called V1V2 pairs of the following form:

(1,2,3),(938,462,4837) -> (V1,V2)
(3,9,13),(938,0472,944)
(81,84,93),(938,84,845)

      

Then I need to keep the following:

(1,2,3),(938,462,4837) -> (V1,V2)
(3,9,13),(938,0472,944)
(81,84,93),(84,845)

      

I need to start scanning a pair V1V2 from the beginning and where ever two V1s are not equal, I need to remove intersecting elements from V2. For this I wrote the following code. However, my code turns out to be very inefficient as my vector pair V1V2 is large and has many elements in V2 (about a billion).

int main(int argc, char** argv) {
    std::vector<std::pair<std::vector<unsigned>, std::vector<unsigned> > > pairV1V2;
    std::vector<std::pair <std::vector<unsigned>,std::vector<unsigned> > >::iterator itm2,lm2=pairV1V2.end();
    for(std::vector<std::pair <std::vector<unsigned>,std::vector<unsigned> > >::iterator itm=pairV1V2.begin(), lm=pairV1V2.end(); itm!=lm; ++itm)
    {
        //Outer values
        vector<unsigned> outerV1=(*itm).first;
        vector<unsigned> outerV2=(*itm).second;
        sort(outerV2.begin(), outerV2.end());
        itm2=itm;
        itm2++;
        for(itm2;itm2!=lm2;++itm2)
        {
            vector<unsigned> innerV1=(*itm2).first;
            vector<unsigned> innerV2=(*itm2).second;
            vector<unsigned> setDiffV1;
            std::set_difference(innerV1.begin(), innerV1.end(), outerV1.begin(), outerV1.end(),
                                                      std::inserter(setDiffV1, setDiffV1.end()));            
            if(setDiffV1.size()==0) //check whether any two V1 are different
            {                 
                sort(innerV2.begin(), innerV2.end());
                if((itm->second.size()!=0)&&(itm2->second.size()!=0)){                                
                    std::vector<unsigned> delIntersectingElem;
                    std::set_intersection(outerV2.begin(),outerV2.end(),innerV2.begin(), innerV2.end(),
                              std::back_inserter(delIntersectingElem));

                   if(delIntersectingElem.size()!=0) //if there are intersecting V2's
                   {                    
                        for(std::vector<unsigned>::iterator its=(itm2->second).begin(),ls=(itm2->second).end();its!=ls;)
                        { 
                            //if *its is present in delIntersectingElem then delete it.
                            if(!(std::find(delIntersectingElem.begin(), delIntersectingElem.end(), (*its)) == delIntersectingElem.end()))
                            {
                                (itm2->second).erase(its); //delete intersecting elements from inner v2
                                ls--;
                            }else{
                                ++its;
                            }
                        }                    
                    }
                }
            } 
        }
    }    
    return 0;
}

      

Can someone please help me improve my current code - it gives the correct answer (in the example I could skip a few cases for brevity - but the code handles them all) but very slow (like diagonalized by perf). I would be grateful if there are any suggestions for improvement in my current code snippet. However, if the logic of the two codes is the same, then the new algorithm is also acceptable

+3


source to share


2 answers


It uses an unfinished STL algorithm called it remove_if

, which allows you to efficiently (O (n)) remove all elements matching a predicate from a container. This is most useful if you have vector

or deque

because they have an expensive delete operation (O (n)) on the middle element. However, you need to know that it remove_if

doesn't actually erase any element, it moves all elements that do not match the predicate before the front of the range you specify. So the canonical way to do " erase_if

" (in this example, all odd integers will be removed):


std::vector ints = …;
ints.erase(std::remove_if(begin(ints), end(ints), [](int i) { return i%2 != 0; }), end(ints));

      

Explanation: remove_if

Moves around all ints that do not match the predicate (i.e. even int in this example) and returns an iterator for the last of those elements. Then we actually erase all elements from this to the end of the vector using range overloading vector<int>::erase

.



For example, suppose that ints == {5,7,4,10,9,16,20,6}

. remove_if

will turn ints

into {4,10,16,20,6,UNSPEC,UNSPEC,UNSPEC}

where I used UNSPEC

to denote any unspecified value and it will also return an iterator pointing to the first element UNSPEC

. Then we will remove all elements with an unspecified value and get the {4,10,16,20,6}

desired result.

UPDATE: Regarding the previous answer, I want to point out which remove_if

is stable, i.e. does not change the order of the rest of the elements.

+4


source


The most efficient way to remove an element from a vector is the writeback trick, but this only applies if you don't need ordering.

#include <vector>
#include <iostream>

int main()
{
    std::vector<int> v { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    auto it = v.begin() + 5;
    // replace the current element with the back of the vector,
    // then shrink the size of the vector by 1.
    *it = std::move(v.back());
    v.pop_back();

    for (auto n : v) {
        std::cout << n << " ";
    }
    std::cout << "\n";
}

      

http://ideone.com/0jbWHZ

If you know there will be many deletions or a very large vector, you can save efficiency by using this trick, remembering to keep ++ your current iterator after deletion, and std::sort()

vector when you are through.

--- Edit ---

#include <algorithm>
#include <iostream>
#include <vector>

//! Efficiently remove an element from a vector without
//! preserving order. If the element is not the last element
//! in the vector, transfer the last element into its position
//! using a move if possible.
//! Regardless, we then shrink the size of the vector deleting
//! the element at the end, which will either be destructed or
//! the element we were deleting.
//! @note: Effectively invalidates the current iterator.
template<class ValueType>
bool unstable_remove(
    typename std::vector<ValueType>& container,
    typename std::vector<ValueType>::iterator it
    )
{
    // Leave in-situ if we are already the tail element.
    auto lastEl = container.end() - 1;
    if (it != lastEl) {
        // overwrite this element with what is in the last,
        // which should have the same effect as deleting this.
        *it = std::move(*lastEl);
    }
    // release the last cell of the vector, because it should
    // now either be destructed or contain the value we were
    // deleting.
    container.pop_back();
}

int main()
{
    std::vector<int> ints { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    auto it = ints.begin();
    while (it != ints.end()) {
        if ((*it % 3) == 0) {
            unstable_remove(ints, it);
            // do not pass go / ++it
            continue;
        }
        ++it;
    }
    std::cout << "after removes:\n";
    for (auto val : ints)
        std::cout << val << " ";
    std::cout << "\n";
    std::sort(ints.begin(), ints.end());
    std::cout << "after sort:\n";
    for (auto val : ints)
        std::cout << val << " ";
    std::cout << "\n";
}

      

Produces ( http://ideone.com/hGZPOC )



after removes:
1 2 10 4 5 8 
after sort:
1 2 4 5 8 10 

      

--- Edit 2 ---

Here's a cleanup of your code for readability, I've also dropped your trailing captures because ... you are removing elements.

#include <vector>
#include <cstdint>

using vec_t = std::vector<uint32_t>;
using vecpair_t = std::pair<vec_t, vec_t>;
using pairvec_t = std::vector<vecpair_t>;

int main(int argc, char** argv) {
    pairvec_t pairV1V2;
    for(auto itm = pairV1V2.begin(); itm != pairV1V2.end(); ++itm)
    {
        //Outer values
        auto& outerV1 = itm->first; // NOTE '&' - reference not copy!
        auto& outerV2 = itm->second;
        sort(outerV2.begin(), outerV2.end());
        for(auto itm2 = itm + 1; itm2 != pairV1V2.end(); ++itm2)
        {
            auto& innerV1 = itm2->first;
            auto& innerV2 = itm2->second;
            vec_t setDiffV1;

      

As for another way to optimize this - since your lists are sorted - go through both lists at the same time, comparing the values.

template<typename ValueType>
void dedupe_vectors(
    typename std::vector<ValueType>& lhs,
    typename std::vector<ValueType>& rhs
    )
{
    auto lit = lhs.begin();
    auto rit = rhs.begin();
    while (rit != rhs.end) {
        while (lit != lhs.end() && *lit < *rit)
            ++lit;
        if (lit == lhs.end())
            break;
        if (*lit == *rit) {
            v2.erase(rit);
            continue;
        }  
        ++rit;
    }
}

      

I know we are testing lit

vs twice lhs.end

. Take a look at the code your compiler generates with -O3 and see if it found it itself. If so, then you might be concerned about optimization.

+3


source







All Articles