Regular expression to exclude non-word characters but keep spaces
I am trying to write a regex to strip invalid characters from the input of a zip code field.
From this link I was able to exclude all the "non-word" characters.
Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();
But this also removes the space characters, which I want to keep.
I'm sure this is possible, but I find the regex very confusing!
Can anyone help with an explanation of the regex pattern used?
You can use character class subtraction:
[\W_-[\s]]+
It matches one or more non-word and underscore characters, excluding whitespace.
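As a minimal sketch of how the subtraction pattern could be dropped into the code from the question (the class name `ZipCleaner` and method name `Clean` are made up for illustration and are not from the original post):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZipCleaner
{
    // [\W_-[\s]]+ : one or more characters that are non-word or underscore,
    // with whitespace subtracted from the set (.NET character class subtraction).
    private static readonly Regex Invalid = new Regex(@"[\W_-[\s]]+");

    // Hypothetical helper: removes invalid characters and upper-cases the rest.
    public static string Clean(string messyText)
    {
        return Invalid.Replace(messyText, "").ToUpper();
    }
}
```

With this sketch, `ZipCleaner.Clean("sw1a 2a-a!")` should return `"SW1A 2AA"`: the hyphen and exclamation mark are removed while the space survives. Character class subtraction is a .NET regex feature, so the same pattern may not work in other engines.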
Assuming valid postcodes contain only alphanumeric characters, you can replace anything other than alphanumeric characters and whitespace with an empty string:
Regex regex = new Regex(@"[^a-zA-Z0-9\s]");
string cleanText = regex.Replace(messyText, "").ToUpper();
Note that \s also matches tabs, newlines, and other whitespace characters, which you may not want to treat as valid. In that case, just list the space character literally:
[^a-zA-Z0-9 ]
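To make the difference between the two patterns concrete, here is a small sketch that runs both on the same input (the class and method names are illustrative assumptions, not part of the original answer):

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Keeps letters, digits, and any whitespace (spaces, tabs, newlines, ...).
    public static string KeepAnyWhitespace(string s)
    {
        return Regex.Replace(s, @"[^a-zA-Z0-9\s]", "");
    }

    // Keeps letters, digits, and the literal space character only.
    public static string KeepSpacesOnly(string s)
    {
        return Regex.Replace(s, @"[^a-zA-Z0-9 ]", "");
    }
}
```

For the input `"90210\t-1234\n"`, `KeepAnyWhitespace` should return `"90210\t1234\n"` (the tab and newline survive), while `KeepSpacesOnly` should return `"902101234"`.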
You can invert your character class to make it a negative character class like this:
[^\sa-zA-Z0-9]+
This will match any character other than a whitespace or alphanumeric character.
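One practical difference from the `\W`-based pattern is worth noting: in .NET, `\w` matches Unicode letters by default, so `\W` does not remove accented letters, whereas an explicit `a-zA-Z0-9` class does. The sketch below compares the two on the same string (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassComparison
{
    // Subtraction form: removes non-word characters and underscores, keeps whitespace.
    // Unicode letters such as "\u00e9" count as word characters in .NET by default.
    public static string CleanWithSubtraction(string s)
    {
        return Regex.Replace(s, @"[\W_-[\s]]+", "");
    }

    // Negated ASCII form: removes everything except a-z, A-Z, 0-9, and whitespace.
    public static string CleanWithAsciiClass(string s)
    {
        return Regex.Replace(s, @"[^\sa-zA-Z0-9]+", "");
    }
}
```

For the input `"H3\u00e9 2K_"`, `CleanWithSubtraction` should return `"H3\u00e9 2K"` and `CleanWithAsciiClass` should return `"H3 2K"`. If the field should accept only ASCII letters and digits, the explicit class is the stricter choice.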
RegEx Demo (note that the demo does not run on the .NET regex engine)