Word-boundary problem (\b)
I have an array of keywords and I want to know if at least one of the keywords was found in some string that was posted. I also want to be absolutely sure that it is the keyword that was matched and not something that looks a lot like the word.
Say, for example, that our keywords are [English, Eng, En]
because we are looking for variations of English.
Now say that the user inputs i h8 eng class
or something equally provocative and illiterate; then it should match eng
. It must not, however, match a word like england
or some odd thing like chen
, even though it contains the bit en
.
So, in my infinite lack of wisdom, I figured I could do something along these lines to match the input against one of my array elements:
.match(RegExp('\b('+array.join('|')+')\b','i'))
The idea was that the regex would look for any of the array elements, now represented as (English|Eng|En)
, and then check that there are zero-width word boundaries on both sides.
You need a double backslash.
When you create a regex with the RegExp()
constructor, you are passing in a string. String literal syntax also treats the backslash as a metacharacter, for quoting quotes and so on. As a result, the backslash is effectively consumed (in JavaScript, '\b' in a string literal becomes a single backspace character) before the RegExp()
code even runs.
When you double the backslashes, the string parsing step leaves a single backslash behind. The RegExp()
parser then sees a single backslash before "b" and does the right thing.
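To see the two parsing steps separately, here is a minimal sketch. The variable names are made up for illustration, and the pattern is shortened to a single keyword.

```javascript
// In a string literal, '\b' is the backspace character (U+0008),
// not a backslash followed by "b".
const single = '\b(eng)\b';
const doubled = '\\b(eng)\\b';

console.log(single.length);   // 7: backspace + "(eng)" + backspace
console.log(doubled.length);  // 9: "\b" + "(eng)" + "\b"

// The first pattern looks for literal backspace characters around "eng".
const broken = new RegExp(single, 'i');
// The second pattern contains real word-boundary assertions.
const fixed = new RegExp(doubled, 'i');

console.log(broken.test('i h8 eng class')); // false
console.log(fixed.test('i h8 eng class'));  // true
```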
\b
is an escape sequence within string literals (see table 2.1 on this page). You should escape it by adding one extra backslash:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
You don't need to escape \b
when using it inside a regex literal:
/\b(english|eng|en)\b/i
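Putting it together with the inputs from the question, a rough sketch might look like the following. The helper name matchKeyword is hypothetical, and escaping regex metacharacters in the keywords is an extra precaution that the original code does not take.

```javascript
// Hypothetical helper: returns the matched keyword text,
// or null if no keyword appears as a whole word.
function matchKeyword(keywords, input) {
  // Assumption: keywords are plain text, so escape regex metacharacters
  // to make characters such as "." or "(" match literally.
  const escaped = keywords.map((k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // Doubled backslashes so the regex engine receives \b (word boundary).
  const pattern = new RegExp('\\b(' + escaped.join('|') + ')\\b', 'i');
  const result = input.match(pattern);
  return result ? result[1] : null;
}

const keywords = ['English', 'Eng', 'En'];
console.log(matchKeyword(keywords, 'i h8 eng class')); // "eng"
console.log(matchKeyword(keywords, 'england'));        // null
console.log(matchKeyword(keywords, 'chen'));           // null
```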