Why does this regex with a word boundary not match?

This is the example text:

Point 1: 1. Way of work.

This pattern matches "1.":

\b1\.

But this pattern does not match "1.":

\b1\.\b

I need an exact match for "1.".

I am testing it here.



1 answer


The dot (.) is not a word character. \b checks for word boundaries, i.e. positions between a word character and a character that is not considered part of a word. You therefore cannot expect the dot to be inside the "word" 1. because these two characters do not form a word.
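To make the behavior concrete, here is a minimal sketch that runs both patterns against the example text. It uses Python's re module as a stand-in, on the assumption that \b behaves the same way as in .NET for this plain ASCII input; the variable names are only illustrative.

```python
import re

text = "Point 1: 1. Way of work."

# \b1\. : a boundary before "1" (space -> digit), then a literal dot.
with_leading = [m.span() for m in re.finditer(r"\b1\.", text)]

# \b1\.\b : additionally requires a boundary right after the dot.
# In the example the dot is followed by a space. Both are non-word
# characters, so there is no boundary at that position.
with_both = [m.span() for m in re.finditer(r"\b1\.\b", text)]

# The trailing \b only succeeds when a word character follows the dot.
followed_by_digit = re.search(r"\b1\.\b", "version 1.5")

print(with_leading)              # [(9, 11)]
print(with_both)                 # []
print(followed_by_digit.span())  # (8, 10)
```

The last case shows that the trailing \b is not broken; it simply asserts something that is not true in the example text.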


The quick reference describes \b as:

The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character.

And \w is described as:

Matches any word character.

If you check what a word character means, you can see that it includes the Unicode categories Ll [Letter, Lowercase]; Lu [Letter, Uppercase]; Lt [Letter, Titlecase]; Lo [Letter, Other]; Lm [Letter, Modifier]; Mn [Mark, Nonspacing]; Nd [Number, Decimal Digit] and Pc [Punctuation, Connector].

But the dot (.) belongs to the Unicode category Po [Punctuation, Other], which is not listed above.
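The category lookup can be checked directly. The sketch below uses Python's unicodedata module and a hypothetical helper, in_listed_word_class, that tests only against the categories listed above; it does not reproduce the exact \w definition of .NET or of Python.

```python
import unicodedata

# The categories listed above as making up a word character.
WORD_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Mn", "Nd", "Pc"}

def in_listed_word_class(ch: str) -> bool:
    """Return True if ch falls in one of the listed Unicode categories."""
    return unicodedata.category(ch) in WORD_CATEGORIES

for ch in "1._ ":
    print(repr(ch), unicodedata.category(ch), in_listed_word_class(ch))
# '1' Nd True
# '.' Po False
# '_' Pc True
# ' ' Zs False
```

The digit and the underscore count as word characters, while the dot and the space do not, which is why no boundary exists between the dot and the space that follows it.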

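So, if you expect \b to match a word boundary in "1.", that boundary lies between 1 and the dot, not after the dot. In your example text the dot is followed by a space, and both are non-word characters, so the trailing \b in \b1\.\b has nothing to match. That answers your question of why.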
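If the goal is an exact match for "1." as a standalone token, one possible approach (a suggestion, not part of the original answer) is to replace \b with lookarounds that state the intent explicitly. The sketch below again uses Python's re module; .NET also supports lookahead and lookbehind, but the pattern has only been reasoned through here for Python.

```python
import re

# (?<!\w) : no word character directly before the "1"
# (?!\S)  : the dot is followed by whitespace or the end of the string
exact = re.compile(r"(?<!\w)1\.(?!\S)")

spans = [m.span() for m in exact.finditer("Point 1: 1. Way of work.")]
in_longer_number = exact.search("11. Way of work.")
in_decimal = exact.search("version 1.5")

print(spans)             # [(9, 11)]
print(in_longer_number)  # None
print(in_decimal)        # None
```

Whether "1.5" or "11." should count as a match depends on what "exact" means for your data, so adjust the lookarounds accordingly.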

Note: .NET regular expressions should preferably be tested on sites dedicated to them, such as Regex Storm. If you validate your regex with a PCRE tester (such as the site you linked), you might get different results from .NET.
