Removing duplicate characters from a string
I have a string such as acaddef or bbaaddgg. I need to remove all duplicate characters as quickly as possible. For example, pooaatat should become poat, and ggaatpop should become gatpo. Is there a built-in function or algorithm that does this quickly? I searched the STL but found nothing satisfactory.
So here are 4 different solutions.
Fixed array
std::string str = "pooaatat";
// Prints "poat"
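// One flag per possible unsigned char value; a bool cannot overflow the way a short counter could on very long input
bool seen[256] = {false};
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
[&](unsigned char c) { bool first = !seen[c]; seen[c] = true; return first; });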
Counting Algorithm + Iterator
std::string str = "pooaatat";
// Prints "poat"
std::string::iterator iter = str.begin();
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
[&](char c) { return !std::count(str.begin(), iter++, c); });
Unordered set
std::string str = "pooaatat";
// Prints "poat"
std::unordered_set<char> container;
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
[&](char c) { return container.insert(c).second; });
Unordered map
std::string str = "pooaatat";
// Prints "poat"
std::unordered_map<char, int> container;
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
[&](char c) { return container[c]++ == 0; });
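All four snippets write straight to std::cout. If the result is needed as a string instead, the output iterator can be swapped for std::back_inserter. A minimal sketch based on the unordered set variant; the helper name `unique_chars` is made up for illustration:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns a copy of `str` that keeps only the first
// occurrence of each character, in the original order.
std::string unique_chars(const std::string& str) {
    std::unordered_set<char> seen;
    std::string result;
    // insert() reports via .second whether the character was newly added,
    // which is true exactly once per distinct character.
    std::copy_if(str.begin(), str.end(), std::back_inserter(result),
                 [&](char c) { return seen.insert(c).second; });
    return result;
}
```

Calling `unique_chars("pooaatat")` should give `"poat"`.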
AFAIK, there is no built-in algorithm for this. std::unique is only suitable if you want to remove consecutive duplicate characters.
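To see why std::unique is not enough here, this is a small sketch of the usual erase/unique idiom applied to one of the strings from the question. The function name `collapse_runs` is only illustrative:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: collapses each run of equal adjacent characters
// into a single character. Non-adjacent repeats are left untouched.
std::string collapse_runs(std::string str) {
    // std::unique shifts the kept characters to the front and returns the
    // new logical end; erase() then drops the leftover tail.
    str.erase(std::unique(str.begin(), str.end()), str.end());
    return str;
}
```

For "pooaatat" this yields "poatat" rather than the desired "poat", because the second "a" and "t" are not adjacent to the first ones.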
However, you can use the following simple approach:
If the string contains only single-byte characters (such as ASCII), you can keep a boolean array A[256] that records whether each character has already been encountered (index it with the character converted to unsigned char, since plain char may be signed).
Then just traverse the input string and copy each character to the output only if A[character] is still false, setting A[character] to true as you go.
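The array approach could look roughly like this. It is a sketch, not a tuned implementation: it compacts the string in place instead of writing to a separate output, and the name `remove_duplicates` is an assumption made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes repeated characters from `str` in place,
// keeping the first occurrence of each one. Runs in O(n) time.
void remove_duplicates(std::string& str) {
    bool seen[256] = {false};  // one flag per possible unsigned char value
    std::size_t out = 0;       // next write position, never ahead of i
    for (std::size_t i = 0; i < str.size(); ++i) {
        // Convert to unsigned char so that negative char values cannot
        // produce an out-of-range index.
        unsigned char c = static_cast<unsigned char>(str[i]);
        if (!seen[c]) {
            seen[c] = true;
            str[out++] = str[i];
        }
    }
    str.resize(out);  // drop the tail that was not overwritten
}
```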
If the string contains arbitrary characters, you can use std::unordered_map or std::map from char to int.
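For wider character types a lookup table is no longer practical, so a hash container takes its place. The sketch below uses std::unordered_set rather than the map from char to int suggested above, on the assumption that only membership matters and no count is needed. The template name `remove_duplicates_generic` is hypothetical.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: works for std::string, std::wstring, std::u32string
// and other std::basic_string instantiations with the default traits and
// allocator. Keeps the first occurrence of each character.
template <typename CharT>
std::basic_string<CharT> remove_duplicates_generic(
        const std::basic_string<CharT>& str) {
    std::unordered_set<CharT> seen;
    std::basic_string<CharT> result;
    for (CharT c : str) {
        // insert().second is true only the first time c is seen
        if (seen.insert(c).second) {
            result.push_back(c);
        }
    }
    return result;
}
```

Swapping in std::set would also work if a hash for the character type is not available, at the cost of O(log n) lookups.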
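The built-in regular expressions can also be used. Note that the pattern below keeps a character only when the next one is different, so it collapses runs of repeated characters rather than removing every later duplicate: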
#include <regex>
[...]
const std::regex pattern("([\\w ])(?!\\1)");
std::string s = "ssha3akjssss42jj 234444 203488842882387 heeelloooo";
std::string result;
for (std::sregex_iterator i(s.begin(), s.end(), pattern), end; i != end; ++i)
result.append((*i)[1]);
std::cout << result << std::endl;
Of course, you can change the character class in the capture group to suit your needs. Conveniently, this is already supported by the TR1 implementation in Visual Studio 2010. However, gcc 4.8 seems to have problems with regex iterators.
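If collapsing runs is all that is needed, std::regex_replace with a backreference might be a shorter route than iterating over matches. This is a different technique from the iterator loop above, shown only as a sketch; `collapse_runs_regex` is a made-up name, and a standard library with a complete <regex> implementation is assumed.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every run of two or more identical
// characters with a single copy of that character.
std::string collapse_runs_regex(const std::string& s) {
    // (.) captures one character; \1+ matches one or more repeats of it.
    static const std::regex runs("(.)\\1+");
    return std::regex_replace(s, runs, "$1");
}
```

Like the pattern above, this leaves non-adjacent repeats in place, so "pooaatat" becomes "poatat".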