Extract all valid characters from regex

I need to extract a list of all allowed characters from a given regex.

So, for example, if the regex looks like this (some random example):

[A-Z]*\s+(4|5)+

      

the output should be

ABCDEFGHIJKLMNOPQRSTUVWXYZ45

      

(omitting spaces)

One obvious solution would be to define the complete set of valid characters and use the find method to test each character against the regex. That seems like a rather boring solution, though.

Can anyone think of a (preferably simple) algorithm to implement this?



1 answer


One thing you can do is:

  • split the regex into sub-patterns
  • test every character against each sub-pattern

See the following example (not perfect yet):



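using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Program
{
    static void Main(String[] args)
    {
        var chars = TestRegex(@"[A-Z]*\s+(4|5)+");
        Console.WriteLine($"-->{chars}<--");
    }

    // Splits the pattern at the quantifiers * and + and collects the
    // characters matched by each remaining sub-pattern.
    public static string TestRegex(string pattern)
    {
        string result = "";
        foreach (var subPattern in Regex.Split(pattern, @"[*+]"))
        {
            if (string.IsNullOrWhiteSpace(subPattern))
                continue;
            result += GetAllCharCoveredByRegex(subPattern);
        }

        return result;
    }

    // Brute force: tests every UTF-16 code unit against the sub-pattern.
    public static string GetAllCharCoveredByRegex(string pattern)
    {
        Console.WriteLine($"Testing {pattern}");
        var regex = new Regex(pattern);
        var matches = new List<char>();
        // An int counter lets the loop include char.MaxValue without overflowing.
        for (int code = char.MinValue; code <= char.MaxValue; code++)
        {
            var c = (char)code;
            if (regex.IsMatch(c.ToString()))
            {
                matches.Add(c);
            }
        }
        return string.Join("", matches);
    }
}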

      

This outputs:

Testing [A-Z]

Testing \s

Testing (4|5)

-->ABCDEFGHIJKLMNOPQRSTUVWXYZ

??????????45<--
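
The question asks for whitespace to be omitted, while the output above still contains the characters matched by \s. A minimal sketch of one way to handle that, assuming the same split-and-scan approach: skip whitespace with char.IsWhiteSpace, anchor each sub-pattern so it has to match the whole one-character string, and drop duplicates. The names AllowedChars and Extract are hypothetical and not part of the original answer. Like the original, it splits naively on * and +, so it will likely mis-handle patterns where those characters are escaped or appear inside a character class.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class AllowedChars
{
    public static string Extract(string pattern)
    {
        var seen = new HashSet<char>();   // characters already emitted
        var result = new StringBuilder(); // keeps first-seen order

        // Same naive split on the quantifiers * and + as in the answer.
        foreach (var subPattern in Regex.Split(pattern, @"[*+]"))
        {
            if (string.IsNullOrWhiteSpace(subPattern))
                continue;

            // \A ... \z forces the sub-pattern to match the entire input.
            var regex = new Regex(@"\A(?:" + subPattern + @")\z");

            for (int code = char.MinValue; code <= char.MaxValue; code++)
            {
                var c = (char)code;
                if (char.IsWhiteSpace(c))
                    continue; // the question asks to omit whitespace
                if (regex.IsMatch(c.ToString()) && seen.Add(c))
                    result.Append(c);
            }
        }

        return result.ToString();
    }
}
```

For the pattern from the question, AllowedChars.Extract(@"[A-Z]*\s+(4|5)+") should return ABCDEFGHIJKLMNOPQRSTUVWXYZ45.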
