RegEx lookahead, but not immediately after

I am trying to match terms like the Dutch ge-berg-te. berg is a noun in itself, and ge...te is a circumfix, i.e. geberg doesn't exist, and neither does bergte. gebergte does. I want a RegEx that matches berg or gebergte by working with a lookahead. I thought this would work:

\b(?i)(ge(?=te))?berg(te)?\b

      

But it doesn't. I am guessing this is because a lookahead only checks the characters immediately following the current position, not characters further along. Is there a way to match characters with a lookahead with the constraint that those characters do not come immediately after the others?
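To make the failure concrete, here is a minimal sketch in Python (an assumption; the question does not name a regex engine). The inline (?i) is replaced by the re.IGNORECASE flag because recent Python versions reject global inline flags that are not at the start of the pattern, and attempt_matches is a hypothetical helper name. The lookahead (?=te) is tested directly after ge, where the next characters are be, so the optional group can never succeed.

```python
import re

# The attempted pattern; (?i) is expressed as a flag instead of inline.
ATTEMPT = re.compile(r"\b(ge(?=te))?berg(te)?\b", re.IGNORECASE)


def attempt_matches(word: str) -> bool:
    """Return True if the attempted pattern matches the whole word."""
    return ATTEMPT.fullmatch(word) is not None


for word in ["berg", "gebergte", "geberg", "bergte"]:
    print(word, attempt_matches(word))
# berg True
# gebergte False
# geberg False
# bergte True
```

So the attempt rejects the valid gebergte and accepts the invalid bergte, which is the opposite of what is wanted.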

Allowed matches:

  • Berg
  • berg
  • Gebergte
  • gebergte

Invalid matches :

  • Geberg
  • geberg
  • Bergte
  • bergte

ge- / Ge- and -te should always appear together. Please note that I want to try this with lookahead. I know it can be done in an easier way, but I want to see if it can be methodologically done something like this.



1 answer


Here's one regex that does not use lookarounds:

\b(berg|gebergte)\b

      

Use it with the i (ignore case) flag. This regex uses alternation and word boundaries to find the complete words berg or gebergte.
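As a quick check, here is a minimal sketch in Python (an assumed engine; find_words and the sample string are illustrative names, not part of the original answer) that runs the alternation pattern over the allowed and invalid words from the question:

```python
import re

# Alternation between the two complete words, case-insensitive.
WORD = re.compile(r"\b(berg|gebergte)\b", re.IGNORECASE)


def find_words(text: str) -> list[str]:
    """Return every complete-word match found in text."""
    return [m.group(0) for m in WORD.finditer(text)]


sample = "Berg berg Gebergte gebergte Geberg geberg Bergte bergte"
print(find_words(sample))
# ['Berg', 'berg', 'Gebergte', 'gebergte']
```

The word boundaries on both sides are what keep geberg and bergte out: in neither word does berg start and end at a boundary.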




Lookaround-based regex:

(?<=\bge)berg(?=te\b)|\bberg\b

      

This regex uses a lookahead and a lookbehind to find berg preceded by ge and followed by te. Alternatively, it matches the complete word berg using the word boundary assertion \b, which is also a zero-width assertion, like the anchors ^ and $.
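One consequence worth seeing in practice: because lookarounds are zero-width, the matched text for gebergte is only berg, not the whole word. A minimal sketch in Python (an assumed engine; matched_parts is a hypothetical helper name):

```python
import re

# berg inside ge...te, or berg as a complete word, case-insensitive.
LOOKAROUND = re.compile(r"(?<=\bge)berg(?=te\b)|\bberg\b", re.IGNORECASE)


def matched_parts(text: str) -> list[str]:
    """Return the text consumed by each match (lookarounds consume nothing)."""
    return [m.group(0) for m in LOOKAROUND.finditer(text)]


m = LOOKAROUND.search("gebergte")
print(m.group(0), m.span())
# berg (2, 6)

sample = "Berg berg Gebergte gebergte Geberg geberg Bergte bergte"
print(matched_parts(sample))
# ['Berg', 'berg', 'berg', 'berg']
```

The four matches come from Berg, berg, Gebergte and gebergte; geberg and bergte still produce nothing. If the full word gebergte is needed as the match, the alternation version is the one to use.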
