How to use a regex to match anything from A to B, where B is not preceded by C

I'm having a hard time with this. First of all, here is the tricky part of the string that I am matching:

"a \"b\" c"

      

I want to extract the following from this:

a \"b\" c

      

Of course, this is just a substring of a larger string, but everything else works as expected. The problem is getting the regex to ignore quotes that are escaped with a backslash.

I have looked into various ways to do this, but nothing has given me the correct results. My last attempt looks like this:

"((\"|[^"])+?)"

      

In various online testers this works as it should, but in my ASP.NET page the match stops at the first escaped ", leaving me with only a letter, a space and a backslash.

The logic behind the above pattern is to capture every instance of \" or anything that is not ". I was hoping it would look for \" first, but I have the feeling that this is being overridden by the second part of the expression, which matches only a single character. A lone backslash does not match the two characters (\"), but it does match "not a quote". From there the next character is a single ", and the match is complete. (This is just my hypothesis as to why my pattern isn't working.)

Any pointers on this? I've tried various combinations of the "look" constructs (lookahead and lookbehind) in the regex, but I didn't really get anywhere. I still feel that this is what I need.



2 answers


The following expression worked for me:

"(?<Result>(\\"|.)*)"

      

The expression matches the following:

  • An opening quote (the literal ")
  • A named capture, (?<name>pattern), consisting of:
    • Zero or more occurrences (*) of either a literal \" or (|) any single character (.)
  • A final closing quote (the literal ")

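Note that the quantifier * (zero or more) is greedy: the group first consumes the rest of the line, then backtracks until the final " in the pattern can match. The closing quote is therefore matched by the literal " and not by the "any single character" part (.), but it is always the last " on the line, so an input with several quoted strings is captured as one.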

I used the regex validator built into ReSharper 9 to design the expression and validate the results:

[Screenshot: ReSharper "Validate Regular Expression" feature]

I used the Explicit Capture option (RegexOptions.ExplicitCapture) to reduce clutter in the output.

Note that I am matching the entire quoted string but capturing only the part between the quotes, using a named capture. Named captures are a really useful way to get exactly the results you want. In code, it might look something like this:

    // Requires: using System.Text.RegularExpressions;
    static string MatchQuotedString(string input)
    {
        // Verbatim string: "" is a literal quote, \\ is a regex-escaped backslash.
        const string pattern = @"""(?<Result>(\\""|.)*)""";
        const RegexOptions options = RegexOptions.ExplicitCapture;
        var regex = new Regex(pattern, options);
        var match = regex.Match(input);
        // Value is an empty string when nothing matched.
        return match.Groups["Result"].Value;
    }

      

Optimization: if you plan to reuse the regex, you can store it in a field and pass the RegexOptions.Compiled option, which precompiles the expression and gives you higher throughput at the cost of longer initialization.
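
A minimal sketch of that optimization, assuming the same pattern as in the method above. The class name QuotedStringMatcher and the field name QuotedString are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedStringMatcher
{
    // Built once and reused; RegexOptions.Compiled trades slower startup
    // for faster matching on repeated calls.
    private static readonly Regex QuotedString = new Regex(
        @"""(?<Result>(\\""|.)*)""",
        RegexOptions.ExplicitCapture | RegexOptions.Compiled);

    public static string MatchQuotedString(string input)
    {
        // Value is an empty string when the input contains no quoted string.
        return QuotedString.Match(input).Groups["Result"].Value;
    }

    public static void Main()
    {
        // The verbatim literal below is the text: "a \"b\" c"
        Console.WriteLine(MatchQuotedString(@"""a \""b\"" c"""));
    }
}
```

Whether the compiled form actually pays off depends on how often the regex runs, so it is worth measuring before committing to it.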



ORIGINAL ANSWER

To match a string like a \"b\" c you need to use the following regular expression:

(?:\\"|[^"])+
var rx = new Regex(@"(?:\\""|[^""])+");

      

See the demo on RegexStorm

Here is the IDEONE demo:

var str = "a \\\"b\\\" c";
Console.WriteLine(str);
var rx = new Regex(@"(?:\\""|[^""])+");
Console.WriteLine(rx.Match(str).Value);

      

Notice the @ before the string literal, which makes it a verbatim string literal: we double the quotes to get literal quotes, and we write each backslash once instead of doubling it. This makes regular expressions easier to read and maintain.
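
As a small illustration of what the @ prefix saves, the sketch below writes the same pattern both ways and checks that the two literals are identical. The class and constant names are hypothetical:

```csharp
using System;

public static class VerbatimLiteralDemo
{
    // Verbatim literal: "" is one quote, and a backslash is just a backslash.
    public const string Verbatim = @"(?:\\""|[^""])+";

    // Regular literal: every backslash and every quote needs a C# escape.
    public const string Regular = "(?:\\\\\"|[^\"])+";

    public static void Main()
    {
        // Both literals produce the same regex text: (?:\\"|[^"])+
        Console.WriteLine(Verbatim == Regular);   // True
        Console.WriteLine(Verbatim);
    }
}
```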

If you want to allow any escaped character in the input string, you can use:



var rx = new Regex(@"[^""\\]*(?:\\.[^""\\]*)*");

      

See demo at RegexStorm

UPDATE

To match the quoted strings, just add quotes around the pattern:

var rx = new Regex(@"""(?<res>[^""\\]*(?:\\.[^""\\]*)*)""");
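
A short usage sketch for this pattern, assuming an input that contains more than one quoted string. The class name QuotedStringExtractor and the method name Extract are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedStringExtractor
{
    private static readonly Regex Quoted =
        new Regex(@"""(?<res>[^""\\]*(?:\\.[^""\\]*)*)""");

    // Returns the contents of every double-quoted string, escapes left intact.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Quoted.Matches(input))
        {
            results.Add(m.Groups["res"].Value);
        }
        return results;
    }

    public static void Main()
    {
        // Input text: say "a \"b\" c" and "d"
        foreach (var s in Extract(@"say ""a \""b\"" c"" and ""d"""))
        {
            Console.WriteLine(s);
        }
    }
}
```

Because the pattern never lets an unescaped quote into the group, each quoted string is returned separately: the sample input yields a \"b\" c and then d.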

      

This pattern performs much better than the regex suggested by Tim Long; see the RegexHero test results:

[Screenshot: RegexHero test results]
