Word-boundary problem (\b)

Question


I have an array of keywords and I want to know if at least one of the keywords was found in some string that was posted. I also want to be absolutely sure that it is the keyword that was matched and not something that looks a lot like the word.

Say, for example, that our keywords are [English, Eng, En], because we are looking for some variations of the English language.

Now say that the user input is i h8 eng class, or something equally provocative and illiterate; then eng should be matched. It also must not match a word like england, or some odd thing like chen, even though it contains en.

So, in my endless lack of wisdom, I figured I could do something along these lines to match one of my array elements against the input:

.match(RegExp('\b('+array.join('|')+')\b','i'))

The idea is that the regex looks for matches from the array, now represented as (English|Eng|En), and then checks for zero-width word boundaries on both sides.
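A minimal sketch (not from the original post) that reproduces the symptom, assuming the keyword array and sample input from the question. Inside a single-quoted string, '\b' is the backspace character U+0008, so the constructed regex looks for literal backspaces instead of word boundaries and finds nothing in ordinary text.

```javascript
// Keyword list and sample input taken from the question.
const array = ['English', 'Eng', 'En'];

// The question's version: in a string literal, '\b' is the backspace
// character (U+0008), not a regex word-boundary assertion.
const broken = RegExp('\b(' + array.join('|') + ')\b', 'i');

console.log('\b' === '\u0008');              // true
console.log('i h8 eng class'.match(broken)); // null
```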


javascript regex

tesc 07 Mar 12 at 15:05


4 answers

You need to double the backslashes in your JavaScript string; otherwise you'll encode a backspace character:

.match(RegExp('\\b('+array.join('|')+')\\b','i'))
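A short check of the corrected call against the question's sample inputs (the inputs come from the question; the variable name re is illustrative). The keyword should be found in the provocative sentence, while england and chen should not match.

```javascript
const array = ['English', 'Eng', 'En'];

// Doubled backslashes: the string now contains the two characters "\" and
// "b", which RegExp() reads as a word-boundary assertion.
const re = RegExp('\\b(' + array.join('|') + ')\\b', 'i');

console.log('i h8 eng class'.match(re)[1]); // eng
console.log('england'.match(re));           // null
console.log('chen'.match(re));              // null
```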


Tim Pietzcker 07 Mar 12 at 15:08


You need to escape \b twice (write \\b), because \b has a special meaning in strings:

.match(RegExp('\\b('+array.join('|')+')\\b','i'))
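To see what the string parser hands to RegExp() in each case, a small sketch comparing the two string literals (variable names are illustrative):

```javascript
// What the JavaScript string parser produces before RegExp() ever runs.
const single = '\b';   // one character: backspace, U+0008
const doubled = '\\b'; // two characters: "\" followed by "b"

console.log(single.length, single.charCodeAt(0)); // 1 8
console.log(doubled.length, doubled);             // 2 \b
```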


kirilloid 07 Mar 12 at 15:08


\b is an escape sequence within string literals (see table 2.1 on this page). You should escape it by adding one extra backslash:

.match(RegExp('\\b('+array.join('|')+')\\b','i'))

You don't need to escape \b when using it inside a regex literal:

/\b(english|eng|en)\b/i
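A quick sketch confirming that the regex literal and the doubled-backslash string describe the same pattern, using the keywords and inputs from the question (variable names are illustrative):

```javascript
// A regex literal goes straight to the regex parser, so \b needs no extra escaping.
const literal = /\b(english|eng|en)\b/i;
// The string version needs doubled backslashes to produce the same pattern text.
const fromString = new RegExp('\\b(english|eng|en)\\b', 'i');

console.log(literal.source === fromString.source); // true
console.log(literal.test('i h8 eng class'));       // true
console.log(literal.test('england'));              // false
```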


Salman A 07 Mar at 15:09


Pointy · Accepted Answer · 2012-03-07T15:06:51+0000

You need a double backslash.

When you create a regex with the RegExp() constructor, you are passing in a string. JavaScript string literal syntax also treats the backslash as a metacharacter, for quoting quotes and so on. As a result, the escape sequences are consumed (for example, '\b' becomes a backspace character) before the RegExp() code even runs.

When you double them, the string parsing step leaves one backslash behind. The RegExp() parser then sees a single backslash before "b" and does the right thing.
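An alternative that none of the answers mention: assuming an engine with ES2015 template literals, String.raw skips the string-escape step described above, so the backslashes reach RegExp() without doubling. A minimal sketch using the question's keywords:

```javascript
const array = ['English', 'Eng', 'En'];

// String.raw keeps backslashes from the template text as written, so "\b"
// arrives as "\" + "b"; the ${...} substitution is inserted normally.
const pattern = String.raw`\b(${array.join('|')})\b`;
const re = new RegExp(pattern, 'i');

console.log(pattern);                       // \b(English|Eng|En)\b
console.log('i h8 eng class'.match(re)[1]); // eng
```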
