Word-boundary problem (\b)

I have an array of keywords and I want to know if at least one of the keywords was found in some string that was posted. I also want to be absolutely sure that it is the keyword that was matched and not something that looks a lot like the word.

Say, for example, that our keywords are [English, Eng, En], because we are looking for some variations of the English language.

Now say that the user inputs "i h8 eng class" or something equally provocative and illiterate; then it should match eng. It must also not match a word like england or some odd thing like chen, even though it contains the bit en.

So, in my endless lack of wisdom, I figured I could do something along these lines to match one of my array elements against the input:

.match(RegExp('\b('+array.join('|')+')\b','i'))

      

The idea is that the regex searches for matches from the array, now represented as (English|Eng|En), and then checks that there are zero-width word boundaries on both sides.

+3




4 answers


You need a double backslash.

When you create a regex with the RegExp() constructor, you are passing in a string. String literal syntax also treats the backslash as a metacharacter, for escaping quotes and so on. The backslash is therefore consumed by the string parser before the RegExp() code even runs; '\b' in a string literal becomes a single backspace character.

When you double the backslashes, the string parsing step leaves a single backslash behind. The RegExp() parser then sees a single backslash before "b" and does the right thing.
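A quick way to see this is to inspect the strings themselves before they reach RegExp(). The following is a minimal sketch; the variable names are chosen only for illustration.

```javascript
// '\b' in a string literal is a single backspace character (U+0008).
const single = '\b';
// '\\b' is two characters: a backslash followed by the letter b.
const doubled = '\\b';

console.log(single.length);        // 1
console.log(single.charCodeAt(0)); // 8
console.log(doubled.length);       // 2

// The first pattern looks for literal backspace characters around "eng";
// the second one uses real word boundaries.
const broken = new RegExp(single + '(eng)' + single, 'i');
const fixed = new RegExp(doubled + '(eng)' + doubled, 'i');

console.log(broken.test('i h8 eng class')); // false
console.log(fixed.test('i h8 eng class'));  // true
```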

+5




You need to double the backslash in your JavaScript string; otherwise you encode a Backspace character:



.match(RegExp('\\b('+array.join('|')+')\\b','i'))

      

+3




You need to escape \b twice, because it has a special meaning in strings:

.match(RegExp('\\b('+array.join('|')+')\\b','i'))

      

+1




\b is an escape sequence within string literals (see table 2.1 on this page). You should escape it by adding one extra backslash:

.match(RegExp('\\b('+array.join('|')+')\\b','i'))

      

You don't need to escape \b when using it inside a regex literal:

/\b(english|eng|en)\b/i
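Here is a rough sketch that applies the doubled backslashes to the inputs from the question. The helper name matchKeyword is hypothetical, and escaping regex metacharacters in the keywords is an added precaution not mentioned in the answers; it only matters if a keyword contains characters such as . or +.

```javascript
// Keywords from the question.
const keywords = ['English', 'Eng', 'En'];

// Hypothetical helper: returns the matched keyword, or null if none matched.
function matchKeyword(input, words) {
  // Escape regex metacharacters so each keyword is matched literally
  // (an added precaution, not part of the original answers).
  const escaped = words.map(w => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // Doubled backslashes so RegExp() receives \b rather than a backspace.
  const pattern = new RegExp('\\b(' + escaped.join('|') + ')\\b', 'i');
  const result = input.match(pattern);
  return result ? result[1] : null;
}

console.log(matchKeyword('i h8 eng class', keywords)); // "eng"
console.log(matchKeyword('england', keywords));        // null
console.log(matchKeyword('chen', keywords));           // null
```

When the keyword list is fixed and known in advance, the regex literal above is simpler; the constructor form is only needed because the pattern is built from an array at run time.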

      

+1

