Regular expression that excludes non-word characters but keeps spaces

I am trying to write a regex to strip invalid characters from the input of a zip code field.

From this link I was able to exclude all the "non-word" characters:

Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

      

But this also removes the space characters.

I'm sure this is possible, but I find the regex very confusing!

Can anyone help with an explanation of the regex pattern used?

+3




3 answers


In .NET, you can use character class subtraction:

[\W_-[\s]]+

      



It matches one or more non-word and underscore characters, excluding whitespace.
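As a rough sketch of how the subtraction pattern could be dropped into the code from the question: the class name `ZipCleaner` and the method name `Clean` are made up for illustration, and the sample input is an assumption, not something from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class ZipCleaner
{
    // Base class: non-word characters (\W) plus underscore.
    // Subtracted class: whitespace (\s), so spaces, tabs, etc. are left alone.
    private static readonly Regex Invalid = new Regex(@"[\W_-[\s]]+");

    public static string Clean(string messyText)
    {
        // Remove every invalid run, then upper-case as in the question.
        return Invalid.Replace(messyText, "").ToUpper();
    }
}
```

With this sketch, `ZipCleaner.Clean("ab_12 3-c!")` should give `"AB12 3C"`: the underscore, hyphen, and exclamation mark are removed while the space survives. Note that the `[base-[excluded]]` subtraction syntax is specific to the .NET regex flavor, so the same pattern may not work in other engines.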

+6




Assuming valid postcodes contain only alphanumeric characters and spaces, you can replace anything other than alphanumeric characters and whitespace with an empty string:

Regex regex = new Regex(@"[^a-zA-Z0-9\s]");
string cleanText = regex.Replace(messyText, "").ToUpper();

      



Note that \s also matches tabs, newlines, and other whitespace characters that you may not want to consider valid. In that case, just list the space character literally:

[^a-zA-Z0-9 ]
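
To make the difference between the two patterns concrete, here is a small sketch that runs both over the same input. The class name `PostcodeFilter`, both method names, and the sample string are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class PostcodeFilter
{
    // Keeps letters, digits, and any whitespace (spaces, tabs, newlines, ...).
    public static string KeepWhitespace(string messyText)
    {
        return Regex.Replace(messyText, @"[^a-zA-Z0-9\s]", "").ToUpper();
    }

    // Keeps letters, digits, and the plain space character only.
    public static string KeepSpacesOnly(string messyText)
    {
        return Regex.Replace(messyText, @"[^a-zA-Z0-9 ]", "").ToUpper();
    }
}
```

For an input such as `"ab\t1 2#\n"`, `KeepWhitespace` should leave the tab and newline in place and drop only the `#`, while `KeepSpacesOnly` should return `"AB1 2"`.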

      

+3




You can invert your character class to make it a negated character class like this:

[^\sa-zA-Z0-9]+

      

This will match any character other than a whitespace or alphanumeric character.

RegEx Demo (since it is not a .NET regex)

0

