Match returns an empty string

I am using the following Regex to retrieve data from a string:

private static string ExtractRawString(string input, string fieldName)
{
    return Regex.Match(input, $@"{fieldName}:(.+?)\n").Groups[1].Value;
}


If the input string is:

NAME OF PRODUCT:         Product 30AMP \n \nCOMPANY PART NUMBER:   11111\nOEM COMPANY:   COMPANY2 \n \nADD IMAGE HERE:    \n \n \n \n\uF0A8 - CHECKED \n \n  \nOEM PART NUMBER:  22222 \nSERIAL NUMBER:  33333 \nCLASSIFICATION:   Product \nDIMENSIONS: UNKNOWN \nWEIGHT:  0.06Kg’s \nCOMPANY PRICE (INC VAT):  R 450.53 ZAR \nOEM PRICE:  \nCoO:  USA/MEXICO \n 


For example, I call the function like this:

var productName = ExtractRawString(inputString, "NAME OF PRODUCT");


This works for every field in the input string (e.g. NAME OF PRODUCT, COMPANY PART NUMBER, etc.) apart from COMPANY PRICE (INC VAT).

When I call the following, it just returns an empty string (""):

var companyPrice = ExtractRawString(inputString, "COMPANY PRICE (INC VAT)");


I tried replacing (.+?) in the regex with (.), but got the same result.

Can anyone please tell me why this returns an empty string when the format is the same as for all the other fields?


2 answers


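You need to escape the key name, otherwise ( and ) are treated as grouping construct operators. The pattern then looks for COMPANY PRICE INC VAT: (without the parentheses), which does not occur in the input, so the match fails and Groups[1].Value is an empty string. Escaping can be done with the Regex.Escape() method.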
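To see the failure concretely, here is a small sketch (assuming C# 9+ top-level statements) that runs both the unescaped and the escaped pattern against one line copied from the question's input. The variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// One line from the question's input (a real newline at the end).
string line = "COMPANY PRICE (INC VAT):  R 450.53 ZAR \n";
string key = "COMPANY PRICE (INC VAT)";

// Unescaped: "(INC VAT)" becomes capturing group 1 and matches "INC VAT"
// without parentheses, so the pattern never finds the key in the text.
Match raw = Regex.Match(line, $@"{key}:(.+?)\n");
Console.WriteLine(raw.Success);                 // False
Console.WriteLine($"[{raw.Groups[1].Value}]");  // [] (a failed match has empty groups)

// Escaped: Regex.Escape turns "(" and ")" (and spaces) into literal escapes.
string escapedKey = Regex.Escape(key);
Match fixedMatch = Regex.Match(line, $@"{escapedKey}:(.+?)\n");
Console.WriteLine(escapedKey);
Console.WriteLine(fixedMatch.Success);                // True
Console.WriteLine($"[{fixedMatch.Groups[1].Value}]"); // [  R 450.53 ZAR ]
```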

In addition, the \n at the end of the :(.+?)\n pattern requires a newline. You can simply use the greedy version of the quantifier and remove \n, since . matches any character except a newline in .NET regex:

$@"{Regex.Escape(fieldName)}:(.+)"


Regex.Escape() adds a backslash in front of special regex characters (and white space), so the escaped \( matches a literal (, and so on. The greedy quantifier captures one or more non-newline characters in one go, while the lazy +? made the regex engine skip the quantified pattern and try to match the newline first, expanding the match one character at a time. That made the \n part of the pattern required and the pattern rather inefficient.
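Putting both fixes together, the question's helper might look like the following sketch. The shortened inputString and the loop are hypothetical test scaffolding, not part of the original code; C# 9+ top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened copy of the question's input (real newlines, not "\n" text).
string inputString =
    "NAME OF PRODUCT:         Product 30AMP \n \n" +
    "COMPANY PART NUMBER:   11111\n" +
    "COMPANY PRICE (INC VAT):  R 450.53 ZAR \n" +
    "OEM PRICE:  \n" +
    "CoO:  USA/MEXICO \n ";

foreach (string field in new[] { "NAME OF PRODUCT", "COMPANY PART NUMBER", "COMPANY PRICE (INC VAT)", "CoO" })
{
    // Brackets make leading/trailing spaces in the raw capture visible.
    Console.WriteLine($"{field} => [{ExtractRawString(inputString, field)}]");
}

// The answer's pattern: escaped key, then a greedy run of non-newline characters.
static string ExtractRawString(string input, string fieldName)
{
    return Regex.Match(input, $@"{Regex.Escape(fieldName)}:(.+)").Groups[1].Value;
}
```

The raw captures keep the spaces around each value (e.g. the spaces before R 450.53 ZAR); calling .Trim() on the result, which the answer does not do, would remove them.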



Note that for . to match any character except a newline, you must not pass RegexOptions.Singleline to the Regex constructor. If you have no control over this, use an inline modifier group so that . does not match newlines:

$@"{Regex.Escape(fieldName)}:((?-s:.+))"
                              ^^^^^  ^


See the COMPANY PRICE \(INC VAT\):((?-s:.+)) regex demo on the online .NET regular expression tester.
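The effect of the inline (?-s:...) group can be checked with a sketch like this one. It uses the static Regex.Match overload with RegexOptions.Singleline rather than the Regex constructor (both apply the option the same way), and the sample input is a two-line excerpt from the question; C# 9+ top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "COMPANY PRICE (INC VAT):  R 450.53 ZAR \nOEM PRICE:  \n";
string key = Regex.Escape("COMPANY PRICE (INC VAT)");

// With Singleline, a plain "." also matches "\n", so ".+" runs to the end of the input.
Match singleline = Regex.Match(input, $@"{key}:(.+)", RegexOptions.Singleline);

// The inline group (?-s:...) switches Singleline off for ".+" only,
// so the capture stops at the first newline again.
Match scoped = Regex.Match(input, $@"{key}:((?-s:.+))", RegexOptions.Singleline);

Console.WriteLine($"[{singleline.Groups[1].Value}]");
Console.WriteLine($"[{scoped.Groups[1].Value}]"); // [  R 450.53 ZAR ]
```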


Are the parentheses in the triple parenthesis copied in the string in C#? I would make sure of that first.


