Grep regex lookahead or start of line (or lookbehind or end of line)
I want to match a string that is either preceded by a certain type of character or starts at the beginning of the line (and likewise is either followed by that type of character or ends at the end of the line).
As a minimal example, consider the text n.b., which I would like to match at the beginning of a line, at the end of a line, between two non-word characters, or some combination of these. The easiest approach would be word boundaries, as in \bn\.b\.\b, but that doesn't match; similar cases come up for other desired matches that contain non-word characters.
I am currently using (^|[^\w])n\.b\.([^\w]|$), which works well enough, but it also matches the non-word characters (such as a dash) immediately before and after the word, when present. I am doing this in grep, so although I could easily pipe the output to sed, I rely on grep's --color option, which is disabled when piping into another command (for obvious reasons).
EDIT: The \K operator, i.e. (\K^|[^\w])n\.b\.(\K[^\w]|$), seems to work, but it also drops the color of the match in the output. While I could fall back on auxiliary tools, I would love a quick and simple solution.
EDIT: I misunderstood the \K operator; it simply drops from the match all text that precedes the point where it is used. Unsurprisingly, this did not produce colored output.
If you are using grep, you must be using the -P option, or lookarounds and \K would throw errors. This means that you also have negative lookarounds at your disposal. Here's a simpler version of your regex:
(?<!\w)n\.b\.(?!\w)
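To see the difference between the two patterns, here is a minimal sketch. It uses Python's re module as a stand-in for grep -P, on the assumption that both treat these lookarounds the same way for this pattern. The sample lines and the helper matched_text are made up for illustration.

```python
import re

# Hypothetical sample lines: match at start, at end, between dashes, and inside a word.
lines = ["n.b. at the start", "at the end n.b.", "mid -n.b.- line", "xn.b.x"]

# Pattern from the question: consumes the neighbouring non-word characters.
consuming = re.compile(r"(^|[^\w])n\.b\.([^\w]|$)")
# Lookaround version: only asserts that no word character is adjacent.
lookaround = re.compile(r"(?<!\w)n\.b\.(?!\w)")

def matched_text(pattern, line):
    """Return the text of the first match in line, or None if there is none."""
    m = pattern.search(line)
    return m.group(0) if m else None

for line in lines:
    print(repr(line),
          repr(matched_text(consuming, line)),
          repr(matched_text(lookaround, line)))
```

For the line with dashes, the consuming pattern reports -n.b.- while the lookaround pattern reports only n.b., which is the part grep would then highlight with --color.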
Also, keep in mind that (?<=...) and (?<!...) are lookbehinds, while (?=...) and (?!...) are lookaheads. The wording of your title suggests that you may have these mixed up, a common beginner's mistake.
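As a quick reference for the four forms, here is a small sketch, again in Python's re module rather than grep, with a made-up sample string. Each lookaround checks the text next to the match without including it in the result.

```python
import re

text = "a1 b2 a3"  # hypothetical sample string

# Positive lookbehind: a digit preceded by "a".
after_a = re.findall(r"(?<=a)\d", text)
# Negative lookbehind: a digit NOT preceded by "a".
not_after_a = re.findall(r"(?<!a)\d", text)
# Positive lookahead: a letter followed by "2".
before_2 = re.findall(r"[ab](?=2)", text)
# Negative lookahead: a letter NOT followed by "2".
not_before_2 = re.findall(r"[ab](?!2)", text)

print(after_a, not_after_a, before_2, not_before_2)
```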