Extract all valid characters from regex
I need to extract a list of all allowed characters from a given regex.
So, for example, if the regex looks like this (some random example):
[A-Z]*\s+(4|5)+
the output should be
ABCDEFGHIJKLMNOPQRSTUVWXYZ45
(omitting spaces)
One obvious solution would be to define the complete set of candidate characters and test each one against the regex with a find method, keeping those that match.
That seems like a rather boring solution, though.
Can anyone think of a (possibly simple) algorithm to implement this?
1 answer
One thing you can do is:
- split the regex into subpatterns
- check every character against each subpattern
See the following example (not perfect yet) in C#:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main(String[] args)
    {
        var pattern = @"[A-Z]*\s+(4|5)+";
        Console.WriteLine($"-->{TestRegex(pattern)}<--");
    }

    public static string TestRegex(string pattern)
    {
        string result = "";
        // Crude split on the quantifiers * and +; it does not handle
        // escaped quantifiers or quantifiers inside a character class.
        foreach (var subPattern in Regex.Split(pattern, @"[*+]"))
        {
            if (string.IsNullOrWhiteSpace(subPattern))
                continue;
            result += GetAllCharCoveredByRegex(subPattern);
        }
        return result;
    }

    public static string GetAllCharCoveredByRegex(string pattern)
    {
        Console.WriteLine($"Testing {pattern}");
        var regex = new Regex(pattern);
        var matches = new List<char>();
        // Try every UTF-16 code unit below char.MaxValue against the subpattern.
        for (var c = char.MinValue; c < char.MaxValue; c++)
        {
            if (regex.IsMatch(c.ToString()))
            {
                matches.Add(c);
            }
        }
        return string.Join("", matches);
    }
}
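This outputs the following (the characters between Z and 45 are the whitespace characters matched by \s, which the console shows as line breaks and ?):
Testing [A-Z]
Testing \s
Testing (4|5)
-->ABCDEFGHIJKLMNOPQRSTUVWXYZ
?????????? 45<--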
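Since the question asks for the result without spaces, here is a minimal sketch of a variant that skips whitespace and drops duplicates. The names RegexAlphabet and AllowedCharacters are made up for this illustration and are not part of the code above. It keeps the same crude split on * and +, so it is still only an approximation for patterns where those characters are escaped or appear inside a character class.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class RegexAlphabet
{
    public static string AllowedCharacters(string pattern)
    {
        var seen = new HashSet<char>();
        var result = new StringBuilder();

        // Same crude split on quantifiers as in the answer above.
        foreach (var subPattern in Regex.Split(pattern, @"[*+]"))
        {
            if (string.IsNullOrWhiteSpace(subPattern))
                continue;

            // Anchor the subpattern so the whole one-character input must match.
            var regex = new Regex(@"\A(?:" + subPattern + @")\z");

            // An int counter lets the loop include char.MaxValue without overflowing.
            for (int code = char.MinValue; code <= char.MaxValue; code++)
            {
                char c = (char)code;
                if (char.IsWhiteSpace(c))
                    continue; // the question asks to omit spaces

                // HashSet.Add returns false for characters already collected.
                if (regex.IsMatch(c.ToString()) && seen.Add(c))
                    result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

Calling RegexAlphabet.AllowedCharacters(@"[A-Z]*\s+(4|5)+") should return ABCDEFGHIJKLMNOPQRSTUVWXYZ45, which is the output the question asks for.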