What's the difference between \b and \<, \> in regex?

Now I am very confused.

I found this in a regex cheat sheet:

\b    word boundary
\<    start of word
\>    end of word

      

But in Mastering Regular Expressions I was told that:

\<    word boundary
\>    word boundary

      

What is the difference between \b and \<, \> in regex?


1 answer


Summary

\b    word boundary
\<    word boundary; specifically, word boundary followed by a word; i.e., start of word
\>    word boundary; specifically, word followed by word boundary; i.e., end of word

      

If you have the word "bob", then the word boundary pattern \b returns two zero-length matches, corresponding to the start and the end of the word. This is useful because it lets you pick out whole words within a string. Thus, matching the string "foo bar" against \b gives four empty matches: the start and end of each of the two words.

Using this, you can see that \< gives you only the positions where words start (two matches: start of foo and start of bar) and \> gives you only the positions where words end (two matches: end of foo and end of bar).

So, you can express \< in terms of \b like this:

  \< 
is equivalent to
  start-of-word 
is equivalent to
  word-boundary-followed-by-word 
is equivalent to
  \b(?=\w)

      



I think your book "Mastering Regular Expressions" is a little fuzzy here: it describes both \< and \> as word boundaries, when it should be more precise and distinguish them as "word boundary (specifically, at the start of a word)" and "word boundary (specifically, at the end of a word)" respectively.

Python example:

>>> import re
>>> re.compile(r'\b').findall('foo bar')
['', '', '', '']
>>> re.compile(r'\b(?=\w)').findall('foo bar')
['', '']
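
The empty strings above do not show where the matches fall, so here is a small sketch that prints the match offsets instead. The helper name boundary_positions is made up for illustration, and \b(?<=\w) is my assumed counterpart for \> (a word boundary preceded by a word character), mirroring the \b(?=\w) form used for \<.

```python
import re

def boundary_positions(pattern, text):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

text = "foo bar"

# Every word boundary: start and end of each word.
all_boundaries = boundary_positions(r"\b", text)

# Boundary followed by a word character: emulates \< (start of word).
word_starts = boundary_positions(r"\b(?=\w)", text)

# Boundary preceded by a word character: emulates \> (end of word).
word_ends = boundary_positions(r"\b(?<=\w)", text)

print(all_boundaries)  # [0, 3, 4, 7]
print(word_starts)     # [0, 4]
print(word_ends)       # [3, 7]
```

The start and end lists together make up exactly the four positions that \b alone reports.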

      

Please note that Python does not support \< and \>. And here's an example of why word boundaries are useful. We can select a BAR that is a whole word, and not the one embedded inside another word:

>>> re.compile(r'\bBAR\b').findall('foBARo BAR')
['BAR']
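
If you have a pattern written with \< and \> and want to try it in Python, one option is to rewrite the anchors into the lookaround forms first. The following is only a rough sketch: translate_word_anchors is a hypothetical helper, not part of the re module, and it does a naive text substitution, so it would also rewrite an escaped backslash that happens to be followed by < or >.

```python
import re

def translate_word_anchors(pattern):
    """Rewrite \\< and \\> into lookaround equivalents (naive substitution)."""
    # Hypothetical helper for illustration only: it does not parse the
    # pattern, so a literal backslash followed by < or > is rewritten too.
    return pattern.replace(r"\<", r"\b(?=\w)").replace(r"\>", r"\b(?<=\w)")

pattern = translate_word_anchors(r"\<BAR\>")
matches = re.findall(pattern, "foBARo BAR")

print(pattern)  # \b(?=\w)BAR\b(?<=\w)
print(matches)  # ['BAR']
```

For a pattern like this one, which starts and ends with word characters, plain \bBAR\b gives the same result; the longer forms only matter when you need to say which side of the boundary the word is on.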

      
