Variable-length lookbehind

I'm trying to write a regex that matches an empty string preceded by either a vowel and ck, or a vowel and any other consonant (this is for a challenge from Code Golf). So far I have come up with (?<=[aeiou](?:ck|[^aeiou])). The problem is that it will not match only after the ck, because of the [^aeiou] in there. It always matches right after the c as well, as in, say, nickel: nic-kel. Why is this happening?



2 answers


I think you need

(?<=[aeiou](?:(?!ck)[a-zA-Z-[aeiou]]|ck))

      

See the regex demo.



A lookbehind is an unanchored pattern, so it is checked at every position in the string. Since you allow a match at any position preceded by a vowel and any character other than a vowel, you get matches between c and k as well as between k and e.

If you want to match a position after a vowel followed by a consonant, but not when that consonant starts a ck cluster, restrict the consonant pattern with a (?!ck) negative lookahead. The consonant itself should be matched with [a-zA-Z-[aeiouAEIOU]], which matches any ASCII letter except a, e, i, o, u (in either case).
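To make the difference concrete, here is a minimal sketch that lists the positions at which each pattern matches in nickel. The helper name MatchPositions and the variable names are illustrative and not part of the original answer. It uses the lowercase-only class from the pattern above, which is enough for a lowercase input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative helper: the indexes at which a zero-width pattern matches.
int[] MatchPositions(string pattern, string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index).ToArray();

// Pattern from the question: any non-vowel may follow the vowel.
string original = @"(?<=[aeiou](?:ck|[^aeiou]))";
// Pattern from this answer: a consonant that starts "ck" is rejected.
string restricted = @"(?<=[aeiou](?:(?!ck)[a-zA-Z-[aeiou]]|ck))";

// Index 3 is the position between c and k, index 4 is between k and e.
Console.WriteLine(string.Join(",", MatchPositions(original, "nickel")));   // 3,4,6
Console.WriteLine(string.Join(",", MatchPositions(restricted, "nickel"))); // 4,6
```

The only position that disappears is index 3, the unwanted split between c and k.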



There is nothing wrong with your regex; just add a simple (?!ck) before the consonants.

(?<=([aeiou](?:ck|(?!ck)[^aeiou])))

 (?<=
      (                     # (1 start)
           [aeiou] 
           (?:
                ck
             |  (?! ck )    # <== here
                [^aeiou] 
           )
      )                     # (1 end)
 )

      

But you might want to know the reason.

The reason is that a variable-length lookbehind in C# starts at a position between characters.

From that position, and only from that position, it looks backward for a match.
Nothing ahead of that position is allowed in the match.

See how it proceeds:

Using the regex (?<=[aeiou](?:ck|[^aeiou]))

i <= absolute position (ck is still ahead), then look back

Finds [aeiou].

Fails both ck and [^aeiou].

Move forward (to the right) 1 position, then look back

ic <= absolute position (k is still ahead)

Fails ck.

BUT it matches 'c' with [^aeiou].




It is important to remember that the lookbehind cannot ignore its own two basic rules.

Those rules state that it must take the first match it finds, and that it must find it looking backward from a position between characters.

So it is clear that it finds and matches ic <= absolute position (k is still ahead) first.

Each assertion keeps its own relative position, which behaves the same way regardless of the surrounding pattern.
This position is dynamic (it changes), and it starts at the current position of the caller (which may itself be another assertion).

So when an assertion is nested inside another assertion, it simply takes the parent's current position and tests from there, keeping track of its own current position internally.
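The nesting rule can be checked with a small experiment. This is only a sketch with made-up variable names: a lookahead placed inside a lookbehind still looks forward from wherever the lookbehind currently is, so it can see the k that lies beyond the lookbehind's starting position.

```csharp
using System;
using System.Text.RegularExpressions;

// Plain lookbehind: is there a c immediately to the left of this position?
var plain = new Regex(@"(?<=c)");
// Same lookbehind, but the c must not be the start of "ck".
// The nested (?!ck) is tested at the c and reads forward past it.
var guarded = new Regex(@"(?<=(?!ck)c)");

Console.WriteLine(plain.Match("nickel").Index);     // 3
Console.WriteLine(guarded.IsMatch("nickel"));       // False
Console.WriteLine(guarded.Match("nic-kel").Index);  // 3
```

In nickel the guarded pattern finds no position at all, because the only c is followed by k. In nic-kel the same c is followed by a hyphen, so the position after it matches.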




Let's see what the fix does: (?<=[aeiou](?:ck|(?!ck)[^aeiou]))

i <= absolute position (ck is still ahead), then look back

Finds [aeiou].

Fails both ck and [^aeiou].

Move forward (to the right) 1 position, then look back

ic <= absolute position (k is still ahead)

Note that internally it matches forward, and the relative position is now here => ck, because it has already matched i and is checking what comes after it.

Fails ck, because 'k' extends 1 character beyond the absolute position.

HOWEVER, it could match 'c' with [^aeiou] without going beyond the absolute position.

To stop that, you just need (?!ck) before [^aeiou].

At the point of (?!ck), the relative position is passed down to the lookahead, which is not limited by the caller's absolute position.

It sees ck ahead and returns false, failing the assertion.

Move forward (to the right) 1 position, then look back

ick <= absolute position, then look back

This time it ends up with a match on ick.



Demo

Target string

nickel : nic-ikel

      

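C#

using System;
using System.Text.RegularExpressions;

string Stxt = "nickel : nic-ikel";
// Group 1 sits inside the lookbehind and captures the text that satisfied it.
var RxR = new Regex(@"(?<=([aeiou](?:ck|(?!ck)[^aeiou])))");

// Every match is empty, so print what the lookbehind saw just before it.
foreach (Match match in RxR.Matches(Stxt))
    Console.WriteLine("{0}", match.Groups[1].Value);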

      

Output

ick
el
ic
ik
el

      
