Non-fixed-length lookbehind
I'm trying to write a regex that matches the empty string preceded by either a vowel and ck, or a vowel and any other consonant (this is for a challenge on Code Golf). So far I have come up with (?<=[aeiou](?:ck|[^aeiou])). The problem is that it does not split after ck, because of the [^aeiou] there. It always splits after the c, as in nickel: nic-kel. Why is this happening?
I think you need
(?<=[aeiou](?:(?!ck)[a-zA-Z-[aeiou]]|ck))
See the regex demo.
A lookbehind is a non-consuming pattern that, being unanchored, is checked at every position in the string. Since you allow a position that is preceded by a vowel and any character other than a vowel, you get matches both between c and k and between k and e.
If you want to match the position after a vowel followed by a consonant, but not inside a ck cluster, restrict the consonant pattern with a negative lookahead, (?!ck). The consonant itself should be matched with [a-zA-Z-[aeiouAEIOU]], which matches any ASCII letter except a, e, i, o, u in either case.
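To see the difference concretely, here is a minimal sketch that lists the positions where each lookbehind matches in nickel. The class name LookbehindPositions and the helper MatchPositions are made up for illustration; only the two patterns come from this thread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindPositions
{
    // Returns the indexes at which the zero-width pattern matches.
    public static int[] MatchPositions(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index).ToArray();

    public static void Main()
    {
        // Pattern from the question: any non-vowel counts as the consonant.
        const string original = @"(?<=[aeiou](?:ck|[^aeiou]))";
        // Restricted pattern: the consonant must not be the start of "ck".
        const string restricted = @"(?<=[aeiou](?:(?!ck)[a-zA-Z-[aeiouAEIOU]]|ck))";

        Console.WriteLine(string.Join(",", MatchPositions(original, "nickel")));   // 3,4,6
        Console.WriteLine(string.Join(",", MatchPositions(restricted, "nickel"))); // 4,6
    }
}
```

Index 3 is the position between c and k, so the restricted pattern drops exactly the unwanted split.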
There is nothing wrong with your regex; just add a simple (?!ck) before the consonant class.
(?<=([aeiou](?:ck|(?!ck)[^aeiou])))
(?<=
  (                    # (1 start)
    [aeiou]
    (?:
      ck
      | (?! ck )       # <== here
        [^aeiou]
    )
  )                    # (1 end)
)
But you might want to know the reason.
With variable-length lookbehinds in C#, the engine starts at a position between characters.
From that position, and only from that position, it looks backward for a match.
Nothing ahead of that position may take part in the match.
Here is how it proceeds:
Using the regex (?<=[aeiou](?:ck|[^aeiou])):
i <= absolute position (ck follows), then look back.
It finds [aeiou].
It fails on both ck and [^aeiou].
Move forward (to the right) 1 position, then look back.
ic <= absolute position (k follows).
It fails on ck, BUT it matches 'c' with [^aeiou].
It is important to remember that the engine cannot ignore its own two basic rules.
The rules state that it must take the first match it finds, and that it must find it by looking backward from a position between characters.
So it is clear that it finds and matches at ic <= absolute position (before the k) first.
Each assertion keeps its own relative position, which works the same way regardless of the surrounding pattern.
This position is dynamic: it starts at the current position of the caller (which may itself be another assertion).
So when an assertion is called within an assertion, it takes the parent's current position and validates from there, internally keeping its own current position.
Let's see what the fix does, (?<=[aeiou](?:ck|(?!ck)[^aeiou])):
i <= absolute position (ck follows), then look back.
It finds [aeiou].
It fails on both ck and [^aeiou].
Move forward (to the right) 1 position, then look back.
ic <= absolute position (k follows).
Note that internally it matches forward, and the relative position is now just before ck, because it has already matched i and checks what comes after it.
It fails on ck because the 'k' extends 1 character beyond the absolute position.
HOWEVER, it can match 'c' with [^aeiou] without going beyond the absolute position.
To stop this, you just need (?!ck) before [^aeiou].
At the (?!ck), the relative position is passed on, and the lookahead is not limited by the absolute position of its callers.
It sees that ck lies ahead and returns false, failing the assertion.
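A small sketch of that last point, assuming .NET's regex engine: a lookahead placed inside a lookbehind is evaluated from its own position and may read characters to the right of the position where the lookbehind started. The patterns (?<=(?=ck)c) and (?<=(?!ck)c), the class name LookaheadInsideLookbehind, and the helper FirstPosition are illustrative and not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadInsideLookbehind
{
    // Index of the first zero-width match, or -1 if the pattern never matches.
    public static int FirstPosition(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        // The lookbehind starts between 'c' and 'k' (index 3). Looking back it
        // consumes 'c'; the inner lookahead then runs from index 2 and reads
        // "ck", which includes the 'k' to the right of index 3.
        Console.WriteLine(FirstPosition(@"(?<=(?=ck)c)", "nickel"));   // 3

        // Negating the lookahead rejects exactly that position.
        Console.WriteLine(FirstPosition(@"(?<=(?!ck)c)", "nickel"));   // -1
    }
}
```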
Move forward (to the right) 1 position, then look back.
ick <= absolute position, then look back.
This time it gets a match on ick.
Demo
Target string
nickel : nic-ikel
C#
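using System;
using System.Text.RegularExpressions;

string Stxt = "nickel : nic-ikel";
// Group 1 captures the text the lookbehind matched before each position.
var RxR = new Regex(@"(?<=([aeiou](?:ck|(?!ck)[^aeiou])))");
foreach (Match match in RxR.Matches(Stxt))
    Console.WriteLine("{0}", match.Groups[1].Value);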
Output
ick
el
ic
ik
el
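For the original splitting use case, one possible sketch is to insert a hyphen at every position where the fixed lookbehind matches. Regex.Replace is a standard .NET call; the Hyphenate helper, the class name HyphenateDemo, and the choice of a hyphen as separator are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenateDemo
{
    // Fixed pattern from the answer above, without the capture group.
    private const string Fixed = @"(?<=[aeiou](?:ck|(?!ck)[^aeiou]))";

    // Inserts '-' at every zero-width match of the lookbehind.
    public static string Hyphenate(string word) => Regex.Replace(word, Fixed, "-");

    public static void Main()
    {
        Console.WriteLine(Hyphenate("nickel"));   // nick-el-
    }
}
```

The end of the word also qualifies, since it is preceded by el, so a trailing hyphen appears. Whether to suppress it depends on the rules of the challenge.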