What's the difference between \b and \<, \> in regex?
Summary
\b  word boundary
\<  word boundary; specifically, a word boundary followed by a word character, i.e. start of word
\>  word boundary; specifically, a word character followed by a word boundary, i.e. end of word
If you have the word "bob", then the word boundary pattern \b will return two zero-length matches, one at the beginning and one at the end of the word. This is useful because it lets you pick out whole words in a line. Likewise, matching \b against the string "foo bar" gives four empty matches: the start and the end of each of the two words.
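To make those four positions concrete, here is a minimal sketch that lists where \b matches in "foo bar". It assumes a recent Python 3 and uses re.finditer so the offsets are visible; the variable names are only illustrative.

```python
import re

text = "foo bar"

# Each match of \b is zero-length, so start() == end(); collect the offsets.
boundary_positions = [m.start() for m in re.finditer(r"\b", text)]

print(boundary_positions)  # [0, 3, 4, 7]
```

Offsets 0 and 4 are the starts of "foo" and "bar"; offsets 3 and 7 are their ends.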
Using this, you can see that \< will only give you the positions where words start (two matches: the start of foo and the start of bar), and \> only the positions where words end (two matches: the end of foo and the end of bar).
So, you can relate \b to \< like this:
\< is equivalent to start-of-word, which is equivalent to word-boundary-followed-by-a-word-character, which is equivalent to \b(?=\w)
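As a sketch of that equivalence in an engine without \< and \>, the two can be emulated with lookarounds. The start-of-word pattern \b(?=\w) is the one given above; the end-of-word counterpart \b(?<=\w) is my own extension by symmetry, not something stated in the book, and the names START_OF_WORD and END_OF_WORD are hypothetical.

```python
import re

# Emulation of \< : a boundary with a word character to its right.
START_OF_WORD = re.compile(r"\b(?=\w)")
# Assumed emulation of \> : a boundary with a word character to its left.
END_OF_WORD = re.compile(r"\b(?<=\w)")

text = "foo bar"
starts = [m.start() for m in START_OF_WORD.finditer(text)]
ends = [m.start() for m in END_OF_WORD.finditer(text)]

print(starts)  # [0, 4]
print(ends)    # [3, 7]
```

Together the two lists cover exactly the four positions that plain \b reports.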
I think your book, "Mastering Regular Expressions", is a little fuzzy here: it describes both \< and \> as word boundaries, when it should be more precise and distinguish them as "word boundary (specifically, the start of a word)" and "word boundary (specifically, the end of a word)" respectively.
Python example:
>>> import re
>>> re.compile(r'\b').findall('foo bar')
['', '', '', '']
>>> re.compile(r'\b(?=\w)').findall('foo bar')
['', '']
Please note that Python does not support \< and \>. And here's an example of why word boundaries are useful. We can select the BAR that is a whole word and not the one wrapped inside foBARo:
>>> re.compile(r'\bBAR\b').findall('foBARo BAR')
['BAR']
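For contrast, a small sketch of the same search with and without the boundaries, showing which occurrence is selected. The variable names are only illustrative.

```python
import re

text = "foBARo BAR"

# Without boundaries, both occurrences match, including the embedded one.
loose = re.findall(r"BAR", text)

# With \b on both sides, only the standalone word matches; record its span.
spans = [m.span() for m in re.finditer(r"\bBAR\b", text)]

print(loose)  # ['BAR', 'BAR']
print(spans)  # [(7, 10)]
```

The single span starts at offset 7, which is the standalone BAR after the space.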