Removing duplicate characters from a string

I have a string such as acaddef or bbaaddgg. I need to remove all duplicate characters as quickly as possible. For example, pooaatat should become poat, and ggaatpop should become gatpo. Is there a built-in function or algorithm that does this quickly? I searched the STL but found nothing satisfactory.



3 answers


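Here are four different solutions. Each keeps the first occurrence of every character and prints the result. The snippets need <algorithm>, <iostream>, <iterator> and <string>, plus <unordered_set> or <unordered_map> where those containers are used.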

Fixed array

std::string str = "pooaatat";

// Prints "poat"
bool seen[256] = {false};  // seen[c] becomes true once byte value c has been printed
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](unsigned char c) { bool first = !seen[c]; seen[c] = true; return first; });

      

Counting Algorithm + Iterator

std::string str = "pooaatat";

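// Prints "poat". Quadratic: each character is searched for in the prefix before it.
// Relies on copy_if calling the predicate once per element, in order.
std::string::iterator iter = str.begin();
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](char c) { return !std::count(str.begin(), iter++, c); });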

      



Unordered set

std::string str = "pooaatat";

// Prints "poat"
std::unordered_set<char> container;
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](char c) { return container.insert(c).second; });

      

Unordered map

std::string str = "pooaatat";

// Prints "poat"
std::unordered_map<char, int> container;
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](char c) { return container[c]++ == 0; });
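
The snippets above print the result and leave str untouched. If the string itself should be modified, as the question asks, one possible sketch is a read/write index loop built on the unordered set variant. The function name remove_duplicates_in_place is hypothetical and not part of the original answer:

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: keeps the first occurrence of each character,
// overwriting the string in place and shrinking it at the end.
void remove_duplicates_in_place(std::string& s) {
    std::unordered_set<char> seen;
    std::size_t write = 0;  // next position to fill with a kept character
    for (std::size_t read = 0; read < s.size(); ++read) {
        // insert() reports in .second whether the character was new
        if (seen.insert(s[read]).second) {
            s[write++] = s[read];
        }
    }
    s.resize(write);  // drop the leftover tail
}
```

Called on a std::string holding "pooaatat", this should leave "poat" in the same object without allocating a second string.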

      



AFAIK, there is no built-in algorithm for this. std::unique only applies if you want to remove consecutive duplicate characters.
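
To illustrate why std::unique alone is not enough here, the following sketch applies the usual erase/unique idiom to one of the question's inputs. The function name collapse_runs is made up for this example:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes only *adjacent* repeats, which is all
// std::unique does. Non-adjacent duplicates survive.
std::string collapse_runs(std::string s) {
    s.erase(std::unique(s.begin(), s.end()), s.end());
    return s;
}
```

For "pooaatat" this gives "poatat" rather than the desired "poat", because the second a and t are not next to their earlier copies.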

However, you can use the following simple approach:

If the string contains only single-byte characters (for example ASCII), you can keep a boolean array A[256] that records whether each character value has already been encountered. Index it with the character converted to unsigned char, since plain char may be signed.

Then just traverse the input string and copy each character to the output if A[character] is still 0, setting A[character] = 1 as you go.
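
A minimal sketch of the boolean array approach just described, assuming the result should be returned as a new std::string. The function name unique_chars is hypothetical, and the array is called seen here instead of A:

```cpp
#include <string>

// Hypothetical helper: copies each byte to the output the first time it
// is encountered. Runs in O(n) with a fixed 256-entry table.
std::string unique_chars(const std::string& input) {
    bool seen[256] = {false};  // plays the role of the array A in the text
    std::string output;
    for (unsigned char c : input) {  // unsigned so it is a valid index
        if (!seen[c]) {
            seen[c] = true;
            output.push_back(static_cast<char>(c));
        }
    }
    return output;
}
```

With the question's inputs, unique_chars("acaddef") should yield "acdef" and unique_chars("bbaaddgg") should yield "badg".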

If the string contains arbitrary characters, you can use a std::unordered_map or std::map from the character type to int instead of the array.
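
For wider character types the same idea can be sketched with a hash map in place of the array. This is only an illustration: the template name unique_chars_any is hypothetical, and it treats every code unit of the string as one character:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: works for any std::basic_string whose character
// type has a std::hash specialization (char, wchar_t, char16_t, char32_t).
template <typename CharT>
std::basic_string<CharT> unique_chars_any(const std::basic_string<CharT>& input) {
    std::unordered_map<CharT, int> seen;  // 0 = not seen yet, 1 = seen
    std::basic_string<CharT> output;
    for (CharT c : input) {
        int& flag = seen[c];  // value-initialized to 0 on first access
        if (flag == 0) {
            flag = 1;
            output.push_back(c);
        }
    }
    return output;
}
```

Swapping std::unordered_map for std::map would also work and only changes the lookup cost from expected constant time to logarithmic.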



Regular expressions should also be reasonably efficient, e.g.:

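#include <iostream>
#include <regex>
#include <string>
[...]

// Group 1 matches a word character or space that is not followed by itself
const std::regex pattern("([\\w ])(?!\\1)");
std::string s = "ssha3akjssss42jj 234444 203488842882387 heeelloooo";
std::string result;

// Append the captured character of every match
for (std::sregex_iterator i(s.begin(), s.end(), pattern), end; i != end; ++i)
    result.append((*i)[1].str());

std::cout << result << std::endl;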

      

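Of course, you can change the capture group to suit your needs. Note that the negative lookahead only skips a character when the next character is identical, so this collapses runs of consecutive repeats. It's good that this is already supported by the TR1 regex implementation in Visual Studio 2010. However, gcc 4.8 seems to have a problem with regex iterators (libstdc++ did not ship a working <regex> until GCC 4.9).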
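
To make the behaviour of this pattern concrete, here is a sketch that wraps the same loop in a function so it can be tried on the question's inputs. The function name collapse_runs_regex is hypothetical, and a standard library with a complete <regex> implementation is assumed:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: keeps every character from [\w ] that is not
// immediately followed by an identical character.
std::string collapse_runs_regex(const std::string& s) {
    static const std::regex pattern("([\\w ])(?!\\1)");
    std::string result;
    for (std::sregex_iterator i(s.begin(), s.end(), pattern), end; i != end; ++i) {
        result += (*i)[1].str();  // the captured character of this match
    }
    return result;
}
```

On "heeelloooo" this yields "helo", but on "pooaatat" it yields "poatat", since the later a and t are not adjacent to their first occurrences. Characters outside the [\w ] class, such as punctuation, never match and are dropped from the result.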
