C# regex: extracting a string enclosed in single quotes

I have the following line that I need to parse with RegEx.

abc = 'def' and size = '1 x(3\" x 5\")' and (name='Sam O\'neal')

      

This is a SQL filter that I would like to split into tokens using the following delimiters:

(, ), >, <, =, whitespace, <=, >=, !=

      

After the string has been parsed, I would like the output to be:

abc,
=,
def,
and,
size,
=,
'1 x(3\" x 5\")',
and,
(,
name,
=,
Sam O\'neal,
),

      

I've tried the following code:

string pattern = @"(<=|>=|!=|=|>|<|\)|\(|\s+)";
var tokens = new List<string>(Regex.Split(filter, pattern));
tokens.RemoveAll(x => String.IsNullOrWhiteSpace(x));

      

I'm not sure how to keep each single-quoted string as a single token. I am new to regex and would appreciate any help.

1 answer


Your pattern needs one more alternation branch:

'[^'\\]*(?:\\.[^'\\]*)*'

It will match:

  • ' - a single quote
  • [^'\\]* - zero or more characters other than ' and \
  • (?: - start of a non-capturing group that matches:
    • \\. - any escape sequence (a backslash followed by any character)
    • [^'\\]* - zero or more characters other than ' and \
  • )* - end of the group, repeated zero or more times
  • ' - a single quote
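
To see what this branch does on its own, here is a minimal sketch that runs it against the sample filter with Regex.Matches. The helper name QuotedLiterals and the variable names are illustrative only and do not come from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every single-quoted literal found in the input.
static string[] QuotedLiterals(string input)
{
    // ' opens the literal, [^'\\]* consumes plain characters,
    // (?:\\.[^'\\]*)* consumes each escaped character plus the plain run
    // that follows it, and the final ' closes the literal.
    const string quoted = @"'[^'\\]*(?:\\.[^'\\]*)*'";
    return Regex.Matches(input, quoted)
                .Cast<Match>()
                .Select(m => m.Value)
                .ToArray();
}

var sample = @"abc = 'def' and size = '1 x(3"" x 5"")' and (name='Sam O\'neal')";
foreach (var literal in QuotedLiterals(sample))
    Console.WriteLine(literal);
```

On the sample filter this prints the three literals 'def', '1 x(3" x 5")' and 'Sam O\'neal', each as a single match. The escaped quote in the last one does not end the literal, because \\. consumes the backslash together with the quote that follows it.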

In C#:

string pattern = @"('[^'\\]*(?:\\.[^'\\]*)*'|<=|>=|!=|=|>|<|\)|\(|\s+)";

      




C# demo:

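using System;
using System.Linq;
using System.Text.RegularExpressions;

var filter = @"abc = 'def' and size = '1 x(3"" x 5"")' and (name='Sam O\'neal')";
// The capturing group makes Regex.Split return the matched delimiters as tokens too.
var pattern = @"('[^'\\]*(?:\\.[^'\\]*)*'|<=|>=|!=|=|>|<|\)|\(|\s+)";
var tokens = Regex.Split(filter, pattern).Where(x => !string.IsNullOrWhiteSpace(x));
foreach (var tok in tokens)
    Console.WriteLine(tok);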

      

Output:

abc
=
'def'
and
size
=
'1 x(3" x 5")'
and
(
name
=
'Sam O\'neal'
)
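
The tokens above keep their surrounding quotes and backslash escapes. If the raw value is wanted instead (the expected output in the question lists def without quotes), one possible post-processing step is sketched below. The helper name Unquote, and the rule that a backslash simply escapes the next character, are assumptions and are not stated in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strips the enclosing single quotes from a quoted token
// and drops the backslash in front of each escaped character.
// Tokens that are not quoted literals are returned unchanged.
static string Unquote(string token)
{
    if (token.Length < 2 || token[0] != '\'' || token[token.Length - 1] != '\'')
        return token;

    var inner = token.Substring(1, token.Length - 2);
    // \\(.) matches a backslash plus the character it escapes; keep only that character.
    return Regex.Replace(inner, @"\\(.)", "$1");
}

Console.WriteLine(Unquote("'def'"));           // def
Console.WriteLine(Unquote(@"'Sam O\'neal'"));  // Sam O'neal
Console.WriteLine(Unquote("and"));             // and
```

Keeping this step separate from the split means the tokenizer pattern stays unchanged, and the quotes remain available to tell string literals apart from identifiers until the values are actually needed.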

      
