Duplicate elements in std :: vector

I have std::vector

, and I want to check all the elements in it. If a certain element appears multiple times, I signal an error.

This is how I did it:

std::vector<std::string> test;
test.push_back("YES");
test.push_back("YES");

for(int i = 0; i < test.size(); i++)
{
    if(test[i] > 1)
    {
        DCS_LOG_DEBUG("ERROR WITH COUNT")
    }
}

      

It didn't work, although I know how to count using the method std::vector::count()

. But I want to get a score for each item, not count all ... any ideas?

+3


source to share


7 replies


The easiest way is a std::sort

vector and then use std::adjacent_find

.


However, if you don't want to sort the vector, you can do something like this in C ++ 11:

#include <unordered_map>
#include <functional> // For std::hash<std::string>.
#include <string>
#include <iostream>

int main() {

    // Test data.
    std::vector<std::string> v;
    v.push_back("a");
    v.push_back("b");
    v.push_back("c");
    v.push_back("a");
    v.push_back("c");
    v.push_back("d");
    v.push_back("a");

    // Hash function for the hashtable.
    auto h = [](const std::string* s) {
        return std::hash<std::string>()(*s);
    };

    // Equality comparer for the hashtable.
    auto eq = [](const std::string* s1, const std::string* s2) {
        return s1->compare(*s2) == 0;
    };

    // The hashtable:
    //      Key: Pointer to element of 'v'.
    //      Value: Occurrence count.
    std::unordered_map<const std::string*, size_t, decltype(h), decltype(eq)> m(v.size(), h, eq);

    // Count occurances.
    for (auto v_i = v.cbegin(); v_i != v.cend(); ++v_i)
        ++m[&(*v_i)];

    // Print strings that occur more than once:
    for (auto m_i = m.begin(); m_i != m.end(); ++m_i)
        if (m_i->second > 1)
            std::cout << *m_i->first << ": " << m_i->second << std::endl;

    return 0;

}

      



Prints:

a: 3
c: 2

      

I have not actually compared this, but this has a chance of being quite effective for the following reasons:

  • Assuming the actual vector elements do not produce pathologically one-sided hashes, this is actually an O (n) algorithm rather than O (n * log (n)) for sorting.
  • We use a hash table of pointers for the strings, not the strings themselves, so there is no unnecessary copying.
  • We can "preallocate" the hash tables (we pass v.size()

    on build m

    ), so the size of the hash table is reduced.
+7


source


Specific element

Count is the standard way:

#include <algorithm>
...

    if (count (test.begin(), test.end(), "YES") > 1)
        std::cerr << "positive\n";

      

If you need more performance, you can do it in the classic way:

bool exists = false;
for (auto const& v : test) {
    if (v == "YES") {
        if (exists) {
            std::cerr << "positive\n";
            break;
        }
        else exists = true;
    }
}

      

Any element multiple times



For large vectors try std::set

:

std::set<std::string> exists;
for (auto const &v : test) {
    if (!exists.insert(v).second)
        std::cerr << "positive\n";
}

      

In this approach, if you also want to know if you've already mentioned your non-uniqueness, you can use std::multiset

:

const std::multiset<std::string> counts (test.begin(), test.end());
for (auto const &v: test)
    if (counts.count (v) == 2) std::cerr << "meh\n";

      

If the container is small and you just want to see if there is any item more than once:

auto multitimes = [&test] (std::string const &str) {
    return count(test.begin(),test.end(),str)>1;
};
if (any_of (test.begin(), test.begin(), multitimes))
    std::cerr << "something was there more than once\n";

      

+5


source


You can use std :: map and define a mapping from key (string) to count (int):

#include <map>
#include <string>
/* ... */
std::map<std::string, int> count_map;

/* ... */

count_map[key]++;

      

+4


source


+2


source


The easiest way to do what you want is to sort the array and then see which elements occur more than once. If you don't want to modify the array itself, you will have to create a copy. It's an O (n * lg n) solution with no extra space if you don't care about ordering, and with O (n) extra space if you do.

sort(test.begin(), test.end());

// If you only care if there is a repeated element, do this:
int size = test.size();
unique(test.begin(), test.end());
if (test.size() != size) {
  cout << "An element is repeated.";
}

// If you do care which elements are repeated, do this:
for (unsigned index = 1; index < test.size(); ++index) {
  if (test[index] == test[index - 1] && (index == 1 || test[index - 2] != test[index])) {
     cout << test[index] << " is repeated.";
  }
}

      

I have provided two solutions: first, you only have to if the line repeats, and second when you care about which lines are repeated.

+2


source


If you don't mind the extra space, try pushing items in map

. Whenever you find your element already on the map, you can signal an error directly.

map<string, int> occurrences;

for (vector<string>::const_iterator cit = test.begin(); cit != test.end(); ++cit)
    if ((++occurrences[*cit]) == 2)
        cout << "ERROR"; // You can even signal which element is repeated here easily, using *cit.

      

Note that this code correctly emits the message only once per item (even if the item is repeated many times), as cleverly tweaked by Tony Delroy . While this method correctly accounts for the occurrence of every row in the entire collection (which might be something necessary), this method is prone to overflow int

if there are 2 31 copies of the same item (or more). You can use instead long long int

, if so and you really want to count each row.

If you are not interested in the number of lines, a more efficient way is to use set

, as smerlin suggests (because this only supports a line and not a couple of lines and int

how it does map

), thus reducing the space requirements ... and throwing an error every time when you find an item in the set:

set<string> occurrences;

for (vector<string>::const_iterator cit = test.begin(); cit != test.end(); ++cit)
    if (false == occurrences.insert(*cit).second)
        cout << "ERROR"; // You can even signal which element is repeated here easily, using *cit.

      

If you want to fix the problem before this happens, paste the items into set

. It removes duplicates automatically. But note that the items are set

sorted, so you won't preserve the insertion order. set

Much better if you don't mind, since searching in it and reading elements in sorted order is much more efficient.

+2


source


In one solution, you can use two for loops .... I think it will be simple ..

For example:

std::vector<std::string> test;
test.push_back("YES");
test.push_back("YES");

for(int i = 0; i < test.size(); i++)
{
    for(int j = 0; j < test.size(); j++)
    {
         if(i != j)
         {
              if(test[i] == test[j])
              {
                   DCS_LOG_DEBUG("ERROR WITH COUNT")
              }
         }
    }
}

      

+1


source