RegEx lookahead, but not immediately after
I am trying to match terms like the Dutch ge-berg-te. berg is a noun in itself, and ge...te is a circumfix, i.e. geberg doesn't exist and neither does bergte, but gebergte does. I want a RegEx that matches berg or gebergte, using a lookahead. I thought this would work:
\b(?i)(ge(?=te))?berg(te)?\b
But it doesn't. I am guessing that is because a lookahead only checks the immediately following characters, not characters further along. Is there a way to match characters with a lookahead without the restriction that those characters must come right after the others?
Allowed matches:
- Berg
- berg
- Gebergte
- gebergte
Invalid matches:
- Geberg
- geberg
- Bergte
- bergte
ge- / Ge- and -te should always appear together. Please note that I want to try this with a lookahead. I know it can be done in an easier way, but I want to see if it can be done methodically along these lines.
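The behaviour described in the question can be reproduced with a small Python sketch. It assumes Python's re module, and it moves the inline (?i) into re.IGNORECASE because recent Python versions reject a global inline flag that is not at the start of the pattern. The helper name attempt_matches is made up for this illustration.

```python
import re

# The question's pattern, with the inline (?i) replaced by re.IGNORECASE
# (an adaptation for Python, not part of the original pattern).
ATTEMPT = re.compile(r"\b(ge(?=te))?berg(te)?\b", re.IGNORECASE)


def attempt_matches(word):
    """Return True if the question's pattern finds a match in word."""
    return ATTEMPT.search(word) is not None


for word in ["berg", "gebergte", "geberg", "bergte"]:
    print(word, attempt_matches(word))
```

With this pattern, gebergte is not matched, because (?=te) demands te directly after ge while berg is what actually follows. bergte is matched, because both the ge group and the te group are optional and independent of each other.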
Here's one regex without lookarounds:
\b(berg|gebergte)\b
Use it with the i (ignore case) flag. This regex uses alternation and word boundaries to match the complete words berg or gebergte.
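As a quick check, the alternation regex can be run against the word lists from the question. This is a minimal sketch assuming Python's re module, with re.IGNORECASE standing in for the i flag. The helper name matched_words is made up for this illustration.

```python
import re

# Alternation pattern from the answer; re.IGNORECASE plays the role of the i flag.
PATTERN = re.compile(r"\b(berg|gebergte)\b", re.IGNORECASE)

# Word lists taken from the question.
allowed = ["Berg", "berg", "Gebergte", "gebergte"]
invalid = ["Geberg", "geberg", "Bergte", "bergte"]


def matched_words(words):
    """Return the words in which the pattern finds a match."""
    return [w for w in words if PATTERN.search(w)]


print(matched_words(allowed))  # the four allowed words
print(matched_words(invalid))  # empty list
```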
Lookaround-based regex:
(?<=\bge)berg(?=te\b)|\bberg\b
This regex uses a lookahead and a lookbehind to find berg when it is preceded by ge and followed by te. Alternatively, it matches the complete word berg using the word boundary assertion \b, which is also a zero-width assertion, like the anchors ^ and $.
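The lookaround version can be tried the same way. This is a sketch assuming Python's re module, which accepts this lookbehind because \bge has a fixed width. The helper name matched_text is made up for this illustration. Note that lookarounds do not consume characters, so for gebergte the matched text is only berg, not the whole word.

```python
import re

# Lookaround pattern from the answer, case-insensitive via re.IGNORECASE.
LOOKAROUND = re.compile(r"(?<=\bge)berg(?=te\b)|\bberg\b", re.IGNORECASE)


def matched_text(word):
    """Return the text the pattern matches in word, or None if there is no match."""
    m = LOOKAROUND.search(word)
    return m.group(0) if m else None


for word in ["Berg", "berg", "Gebergte", "gebergte",
             "Geberg", "geberg", "Bergte", "bergte"]:
    print(word, matched_text(word))
```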