Grep regex lookahead or start of line (or lookbehind or end of line)

I want to match a string that may be preceded by a certain type of character, or that may start at the beginning of the line (and likewise for the end of the line).

For a minimal example, consider the text n.b., which I would like to match at the beginning of a line, at the end of a line, between two non-word characters, or some combination of these. The easiest way to do this would be to use word boundaries (\bn\.b\.\b), but that doesn't match; similar cases arise for other desired matches with non-word characters in them.

I am currently using (^|[^\w])n\.b\.([^\w]|$), which works satisfactorily, but also includes in the match any non-word characters (such as dashes) that appear immediately before and after the word. I am doing this in grep, so while I could easily pipe the output to sed, I am using grep's --color option, which is disabled when piping into another command (for obvious reasons).

EDIT: The \K option (i.e. (\K^|[^\w])n\.b\.(\K[^\w]|$)) seems to work, but it also discards the color on the match in the output. While I could again invoke auxiliary tools, I would love a quick and simple solution.

EDIT: I misunderstood the \K operator; it simply removes from the match all the text that precedes its use. No wonder it failed to color the output.
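The behaviour of \K described above can be checked with a small sketch. It assumes GNU grep built with PCRE support (the -P option); the sample line and the variable names are made up for illustration. The -o option prints only the matched text, which makes the effect visible without relying on color.

```shell
# Hypothetical sample line: a dash sits directly before and after the target text
line='see -n.b.- here'

# Consuming pattern: the neighbouring non-word characters become part of the match
with_neighbours=$(printf '%s\n' "$line" | grep -oP '[^\w]n\.b\.[^\w]')

# \K resets the start of the match, so the leading dash is dropped from what is reported
with_k=$(printf '%s\n' "$line" | grep -oP '[^\w]\Kn\.b\.')

printf '%s\n' "$with_neighbours" "$with_k"
```

Note that \K only affects text to its left; it does nothing for a trailing character, which still has to be handled some other way.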

+3




2 answers


If you are using grep, you must be using the -P option, otherwise lookarounds and \K would throw errors. That means you also have negative lookarounds at your disposal. Here's a simpler version of your regex:

(?<!\w)n\.b\.(?!\w)

      



Also, keep in mind that (?<=...) and (?<!...) are lookbehinds, and (?=...) and (?!...) are lookaheads. The wording of your title suggests that you may have gotten these mixed up, a common beginner mistake.
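As a rough check of the negative-lookaround version, the sketch below runs it over a few made-up lines. It assumes GNU grep with PCRE support; the sample text and variable names are hypothetical. With -o only the matched text is printed, so any neighbouring dash that had been swallowed by the match would show up in the output.

```shell
# Hypothetical sample input covering start of line, end of line,
# dashes on both sides, and a case that should not match
sample='n.b. at the start
ends with n.b.
a -n.b.- dashed
xn.b.y embedded'

# (?<!\w) asserts no word character before the match; (?!\w) asserts none after it.
# Neither assertion consumes text, so the match is always exactly "n.b."
matches=$(printf '%s\n' "$sample" | grep -oP '(?<!\w)n\.b\.(?!\w)')

printf '%s\n' "$matches"
```

The first three lines each yield one match and the last line yields none, because the word characters x and y are adjacent to the text there. Replacing -o with --color=always should highlight just n.b. in each matching line.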

+5




Apparently matching the start of a line is possible inside lookaheads and lookbehinds; the obvious solution is then (?<=^|[^\w])n\.b\.(?=[^\w]|$).
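A quick side-by-side of the original consuming pattern and this lookaround version, again assuming GNU grep with PCRE support and using made-up input and variable names:

```shell
# Hypothetical input: the target text wrapped in dashes
line='-n.b.-'

# Original pattern: the groups consume the dashes, so they are part of the match
consuming=$(printf '%s\n' "$line" | grep -oP '(^|[^\w])n\.b\.([^\w]|$)')

# Lookaround pattern: the same conditions are only asserted, not consumed
lookaround=$(printf '%s\n' "$line" | grep -oP '(?<=^|[^\w])n\.b\.(?=[^\w]|$)')

# A line consisting only of the target text exercises the ^ and $ alternatives
bare=$(printf 'n.b.\n' | grep -oP '(?<=^|[^\w])n\.b\.(?=[^\w]|$)')

printf '%s\n' "$consuming" "$lookaround" "$bare"
```

PCRE accepts the lookbehind here because each top-level alternative has a fixed length (zero for ^, one for [^\w]); engines that require a single fixed width for the whole lookbehind may reject it.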



+2








