Removing all non-letter characters from a string in C#
I want to remove all non-letter characters from a string. By that I mean everything that is not a letter of the alphabet or an apostrophe. This is the code I have:
public static string RemoveBadChars(string word)
{
    char[] chars = new char[word.Length];
    for (int i = 0; i < word.Length; i++)
    {
        char c = word[i];
        if ((int)c >= 65 && (int)c <= 90)
        {
            chars[i] = c;
        }
        else if ((int)c >= 97 && (int)c <= 122)
        {
            chars[i] = c;
        }
        else if ((int)c == 44)
        {
            chars[i] = c;
        }
    }
    word = new string(chars);
    return word;
}
It's close, but doesn't quite work. The problem is this:
[in]: "(the"
[out]: " the"
This gives me a space instead of "(". I want to completely remove the character.
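A regex would be more concise, but to answer your question: the problem with your code is that you use i to index both the input and the output array. When a character is skipped, its slot in chars keeps the default '\0' value, which is what shows up as a blank. Use a separate index for the output, and build the string only from the part of the array you actually filled. Also note that character code 44 is a comma; the apostrophe is 39. So, something like this:
public static string RemoveBadChars(string word)
{
    char[] chars = new char[word.Length];
    int myindex = 0; // next free slot in the output array
    for (int i = 0; i < word.Length; i++)
    {
        char c = word[i];
        if ((int)c >= 65 && (int)c <= 90) // 'A'..'Z'
        {
            chars[myindex] = c;
            myindex++;
        }
        else if ((int)c >= 97 && (int)c <= 122) // 'a'..'z'
        {
            chars[myindex] = c;
            myindex++;
        }
        else if ((int)c == 39) // apostrophe (44 is a comma)
        {
            chars[myindex] = c;
            myindex++;
        }
    }
    // Copy only the filled part; the unused tail still holds '\0' characters.
    word = new string(chars, 0, myindex);
    return word;
}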
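As a sketch of the same loop with less bookkeeping: a StringBuilder removes the need for a separate output index, and char literals replace the numeric codes. The class name TextCleaner is made up for illustration and does not come from the question.

```csharp
using System;
using System.Text;

public static class TextCleaner // hypothetical name, for illustration only
{
    public static string RemoveBadChars(string word)
    {
        // StringBuilder grows as characters are appended, so no output index is needed.
        var sb = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            bool isUpper = c >= 'A' && c <= 'Z';
            bool isLower = c >= 'a' && c <= 'z';
            if (isUpper || isLower || c == '\'')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveBadChars("(the"));   // the
        Console.WriteLine(RemoveBadChars("don't!")); // don't
    }
}
```

Skipped characters are simply never appended, so nothing is left behind in their place.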
There is a method in the Char class that can help. Use Char.IsLetter() to find the letters (plus an extra check for the apostrophe), then pass the result to the string constructor:
// Where() and ToArray() need: using System.Linq;
var input = "(the;':";
var result = new string(input.Where(c => Char.IsLetter(c) || c == '\'').ToArray());
Output: the'
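One thing to be aware of: Char.IsLetter() accepts any Unicode letter, not only A-Z and a-z. If only the ASCII alphabet should survive, an explicit range check can be used instead. The sketch below shows both side by side; the class and method names (LetterFilter, KeepLetters, KeepAsciiLetters) are invented for this example.

```csharp
using System;
using System.Linq;

public static class LetterFilter // hypothetical name, for illustration only
{
    // Keeps every Unicode letter (per Char.IsLetter) plus the apostrophe.
    public static string KeepLetters(string input) =>
        new string(input.Where(c => char.IsLetter(c) || c == '\'').ToArray());

    // Keeps only A-Z, a-z and the apostrophe.
    public static string KeepAsciiLetters(string input) =>
        new string(input.Where(c =>
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '\'').ToArray());

    public static void Main()
    {
        Console.WriteLine(KeepLetters("(the;':"));      // the'
        Console.WriteLine(KeepAsciiLetters("(the;':")); // the'

        // The two differ on non-ASCII letters: \u00E9 (e with acute accent)
        // passes Char.IsLetter but falls outside the A-Z / a-z ranges.
        Console.WriteLine(KeepLetters("caf\u00E9!") == KeepAsciiLetters("caf\u00E9!")); // False
    }
}
```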
Use a regular expression (Regex) instead.
using System.Text.RegularExpressions;

public static string RemoveBadChars(string word)
{
    Regex reg = new Regex("[^a-zA-Z']");
    return reg.Replace(word, string.Empty);
}
If you want to keep spaces as well, add a space to the character class:
Regex reg = new Regex("[^a-zA-Z' ]");
private static Regex badChars = new Regex("[^A-Za-z']");
public static string RemoveBadChars(string word)
{
    return badChars.Replace(word, "");
}
This creates a regular expression consisting of a character class (enclosed in square brackets) that matches anything that is not (the leading ^ within the character class) A-Z, a-z, or an apostrophe. It then defines a function that replaces whatever matches the expression with an empty string. Because the Regex is stored in a static field, it is constructed once rather than on every call.
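To make the effect of the negated character class concrete, here is a small runnable sketch that applies it, plus a variant that also keeps spaces, to one sample string. The class, field, and method names are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo // hypothetical name, for illustration only
{
    // Matches any single character that is not A-Z, a-z, or an apostrophe.
    private static readonly Regex NotLetterOrApostrophe = new Regex("[^A-Za-z']");

    // Same, but a space is also allowed to stay.
    private static readonly Regex NotLetterApostropheOrSpace = new Regex("[^A-Za-z' ]");

    public static string Strip(string s) => NotLetterOrApostrophe.Replace(s, "");

    public static string StripKeepSpaces(string s) => NotLetterApostropheOrSpace.Replace(s, "");

    public static void Main()
    {
        string input = "(it's) 2 late!";
        Console.WriteLine(Strip(input));           // it'slate
        Console.WriteLine(StripKeepSpaces(input)); // it's  late   (two spaces: the digit between them is removed)
    }
}
```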