Regex.Split White Space

string pattern = @"(if)|(\()|(\))|(\,)";
string str = "IF(SUM(IRS5555.IRs001)==IRS5555.IRS001,10,20)";
string[] substrings = Regex.Split(str, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
foreach (string match in substrings)
{
    Console.WriteLine("Token is:{0}", match);
}

The output is:

Token is:
Token is:IF
Token is:
Token is:(
Token is:SUM
Token is:(
Token is:IRS5555.IRs001
Token is:)
Token is:==IRS5555.IRS001
Token is:,
Token is:10
Token is:,
Token is:20
Token is:)
Token is:

As you can see, the first, third and last tokens are empty strings. I cannot figure out why I get this result, since there is no empty string in my input.

I do not want these empty entries.

2 answers


Try this:

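        // requires: using System; using System.Linq; using System.Text.RegularExpressions;
        string pattern = @"(if)|(\()|(\))|(\,)";
        string str = "IF(SUM(IRS5555.IRs001)==IRS5555.IRS001,10,20)";
        // Regex.Split still produces the empty strings; Where() filters them out
        var substrings = Regex.Split(str, pattern, RegexOptions.IgnoreCase).Where(n => !string.IsNullOrEmpty(n));
        foreach (string match in substrings)
        {
            Console.WriteLine("Token is:{0}", match);
        }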
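A self-contained version of this answer, written as a C# 9+ top-level program, is sketched below; the helper name `SplitTokens` is illustrative only and not part of the answer. It also shows the limit of filtering: the empty strings go away, but `==IRS5555.IRS001` is still returned as a single token, because `==` is not one of the delimiters in the pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Split on the captured delimiters, then drop the empty strings that
// Regex.Split produces at the edges and between adjacent delimiters.
// SplitTokens is a hypothetical helper name used for illustration.
static string[] SplitTokens(string input)
{
    const string pattern = @"(if)|(\()|(\))|(\,)";
    return Regex.Split(input, pattern, RegexOptions.IgnoreCase)
                .Where(s => !string.IsNullOrEmpty(s))
                .ToArray();
}

string str = "IF(SUM(IRS5555.IRs001)==IRS5555.IRS001,10,20)";
foreach (string token in SplitTokens(str))
{
    Console.WriteLine("Token is:{0}", token);
}
```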


This is because "IF" and "(" are delimiters: there is nothing to the left of "IF" and nothing between "IF" and "(", so you get these two empty entries. (The last empty entry has the same cause: nothing follows the final ")".) Remove "IF" from the pattern:

string pattern = @"(\()|(\))|(\,)"; 
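To see what this change does, the sketch below (a C# 9+ top-level program; `SplitWithoutIf` is a made-up helper name) prints each element of the split result with its index. With `(if)` gone, `IF` is simply the text before the first `(`, so the two leading empty strings disappear. The result still ends with an empty string, though, because `Regex.Split` always appends the text after the final match, and nothing follows the closing `)`.

```csharp
using System;
using System.Text.RegularExpressions;

// Split with "(if)" removed from the pattern, as suggested above.
// Captured delimiters are still included in the result array.
// SplitWithoutIf is a hypothetical helper name used for illustration.
static string[] SplitWithoutIf(string input) =>
    Regex.Split(input, @"(\()|(\))|(\,)");

string str = "IF(SUM(IRS5555.IRs001)==IRS5555.IRS001,10,20)";
string[] parts = SplitWithoutIf(str);
for (int i = 0; i < parts.Length; i++)
{
    // Quote each element so that empty strings are visible in the output.
    Console.WriteLine("[{0}]: \"{1}\"", i, parts[i]);
}
```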

UPDATE

You can search for tokens instead of splitting the string:

var matches = Regex.Matches(str, @"\w+|[().,]|==");

This returns the tokens found in your text. To get them as a string array:

string[] array = matches.Cast<Match>().Select(m => m.Value).ToArray();

    [0]: "IF"
    [1]: "("
    [2]: "SUM"
    [3]: "("
    [4]: "IRS5555"
    [5]: "."
    [6]: "IRs001"
    [7]: ")"
    [8]: "=="
    [9]: "IRS5555"
    [10]: "."
    [11]: "IRS001"
    [12]: ","
    [13]: "10"
    [14]: ","
    [15]: "20"
    [16]: ")"
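Putting the two lines together, a minimal runnable sketch looks like this (C# 9+ top-level statements; the helper name `Tokenize` is hypothetical):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Tokenize by matching tokens instead of splitting on delimiters.
//   \w+    : identifiers and numbers (letters, digits, underscore)
//   [().,] : single-character punctuation
//   ==     : the equality operator as one token
// Tokenize is a hypothetical helper name used for illustration.
static string[] Tokenize(string input) =>
    Regex.Matches(input, @"\w+|[().,]|==")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string str = "IF(SUM(IRS5555.IRs001)==IRS5555.IRS001,10,20)";
string[] tokens = Tokenize(str);
for (int i = 0; i < tokens.Length; i++)
{
    Console.WriteLine("[{0}]: \"{1}\"", i, tokens[i]);
}
```

One consequence of matching instead of splitting: any character the pattern does not cover (a space, `+`, `>`, and so on) is silently skipped rather than reported, so the pattern has to be extended if your expressions can contain other operators.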

UPDATE

Another pattern you can try with Regex.Split is

@"\b"

It splits the text at word boundaries:

    [0]: ""
    [1]: "IF"
    [2]: "("
    [3]: "SUM"
    [4]: "("
    [5]: "IRS5555"
    [6]: "."
    [7]: "IRs001"
    [8]: ")=="
    [9]: "IRS5555"
    [10]: "."
    [11]: "IRS001"
    [12]: ","
    [13]: "10"
    [14]: ","
    [15]: "20"
    [16]: ")"
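A runnable sketch of the word-boundary split follows (C# 9+ top-level statements; `SplitAtWordBoundaries` is an illustrative name). Two things differ from the Regex.Matches approach: the array still starts with an empty string, because position 0 is itself a word boundary when the input begins with a letter, and `)==` stays together as one element, because there is no word boundary between two non-word characters. The sketch drops the empty entry with the same kind of filter as the first answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Split at word boundaries (\b). Zero-width matches are allowed, so
// Regex.Split cuts the string at every word/non-word transition.
// SplitAtWordBoundaries is a hypothetical helper name.
static string[] SplitAtWordBoundaries(string input) =>
    Regex.Split(input, @"\b");

string str = "IF(SUM(IRS5555.IRs001)==IRS5555.IRS001,10,20)";
string[] parts = SplitAtWordBoundaries(str);

// Drop the leading empty entry produced by the boundary at position 0.
string[] nonEmpty = parts.Where(p => p.Length > 0).ToArray();
Console.WriteLine(string.Join(" | ", nonEmpty));
```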