Why does this regex with a word boundary not match?
This is the example text:
Point 1: 1. Way of work.
This pattern matches "1." in the text:
\b1\.
But this pattern does not match "1.":
\b1\.\b
I need an exact match for "1.".
I am testing it here.
The dot (.) is not a word character. \b checks for word boundaries, i.e. boundaries between a word character and a character that is not considered part of a word. Therefore, you cannot expect the dot to be inside the "word" 1., because these two characters do not form a word.
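The behaviour described above can be reproduced with a small sketch. It uses Python's re module rather than .NET, on the assumption that both engines treat \b the same way for this plain ASCII input; the variable names and the extra "version 1.5" string are made up for illustration.

```python
import re

text = "Point 1: 1. Way of work."

# \b1\. : a boundary before "1", then a literal dot.
with_leading_boundary = re.search(r"\b1\.", text)

# \b1\.\b : additionally demands a boundary right after the dot.
# Here the dot is followed by a space, so both neighbours are non-word
# characters and there is no boundary between them.
with_trailing_boundary = re.search(r"\b1\.\b", text)

# When a word character follows the dot, a boundary does exist there.
followed_by_digit = re.search(r"\b1\.\b", "version 1.5")

print(with_leading_boundary.group())   # 1.
print(with_trailing_boundary)          # None
print(followed_by_digit.group())       # 1.
```

The trailing \b only succeeds when the character after the dot is a word character, which is the opposite of what the second pattern in the question was meant to check.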
The quick reference document describes \b as follows:
The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character.
And \w is described as:
Matches any word character.
If you check what a word character is, you can see that it includes the Unicode categories Ll [Letter, Lowercase]; Lu [Letter, Uppercase]; Lt [Letter, Titlecase]; Lo [Letter, Other]; Lm [Letter, Modifier]; Mn [Mark, Nonspacing]; Nd [Number, Decimal Digit] and Pc [Punctuation, Connector].
But the dot (.) belongs to the Unicode category Po [Punctuation, Other], which is not listed above.
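The category lookup can be checked programmatically. The following is a rough sketch in Python (not .NET), using the standard unicodedata module; Python's \w is not defined by exactly the same category list as .NET's, so treat the last column as an illustration that happens to agree for these four characters. The names sample_chars and results are hypothetical.

```python
import re
import unicodedata

sample_chars = ["1", ".", "_", "a"]

# Map each character to (Unicode general category, matches \w in Python?).
results = {
    ch: (unicodedata.category(ch), re.fullmatch(r"\w", ch) is not None)
    for ch in sample_chars
}

for ch, (category, is_word) in results.items():
    print(repr(ch), category, is_word)
# '1' Nd True
# '.' Po False
# '_' Pc True
# 'a' Ll True
```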
So, if you expect \b to match a word boundary in "1.", that boundary lies between the 1 and the dot, not after the dot. This answers your question of why the second pattern does not match.
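To see where the boundaries actually fall, the positions at which \b matches can be listed. This is a sketch in Python, assumed to behave like .NET for this input; boundary_positions is a hypothetical helper. The last pattern is only one possible reading of "exact match" (the dot must not be followed by a word character) and is not part of the original answer.

```python
import re

def boundary_positions(s):
    """Return every index in s at which the zero-width \\b assertion matches."""
    return [m.start() for m in re.finditer(r"\b", s)]

positions = boundary_positions("1. ")
print(positions)  # [0, 1]: before the 1, and between the 1 and the dot

# Assumed alternative: replace the trailing \b with a negative lookahead
# so that the dot may be followed by a space, punctuation, or end of input.
exact = re.search(r"\b1\.(?!\w)", "Point 1: 1. Way of work.")
not_exact = re.search(r"\b1\.(?!\w)", "version 1.5")

print(exact.group())  # 1.
print(not_exact)      # None
```

No boundary is reported at index 2, between the dot and the space, which is the position the trailing \b in the question's second pattern would have needed.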
Note: .NET regular expressions should preferably be tested on sites dedicated to them, such as Regex Storm. If you validate your regex with a PCRE tester (such as the site you linked), you might get different results from .NET.