Does a greedy quantifier behave differently with positive and negative lookahead?
I am confused about how a greedy quantifier interacts with positive and negative lookahead.
Script for positive lookahead:
foreach (<DATA>) {
    $_ =~ m/AAA.+(?=BBB)/g;
    print "$&\n";
}
__DATA__
AAA 1121 BBB
AAA 443 CCC
AAA 4431 BBB
ABC 321 EACA
AAA 321 BBB
ACD 431 MAKN
AAA 751 ABC
It outputs
AAA 1121
AAA 4431
AAA 321
Script for negative lookahead:
foreach (<DATA>) {
    $_ =~ m/AAA.+(?!BBB)/g;
    print "$&\n";
}
It outputs
AAA 1121 BBB
AAA 443 CCC
AAA 4431 BBB
AAA 321 BBB
AAA 751 ABC
In the negative lookahead case, (?!BBB) does not seem to be taken into account, because I am using a greedy quantifier right before it. In the positive lookahead case, however, the greedy quantifier does take (?=BBB) into account. Why do the two give different results?
I can easily get the expected output with $_ =~ m/AAA\s\d+(?!.+BBB)/g; but I don't understand how my original code is executed.
Consider the first example:
AAA 1121 BBB
\_/\_______/^
 |     |    |
 |     |    +--- this (the empty string right there) satisfies (?!BBB)
 |     |
 |     +-------- matched by .+
 |
 +-------------- matched by AAA
This is because the greedy .+ consumes " 1121 BBB", including the BBB. Once it has consumed the rest of the string, (?!BBB) is checked against what remains, which is the empty string. That empty string satisfies (?!BBB), because it is not BBB.
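This can be checked with a short sketch. The variable names are my own; `$&` holds the matched text and `$+[0]` the offset at which the match ends.

```perl
use strict;
use warnings;

my $line = "AAA 1121 BBB";

# Negative lookahead: greedy .+ takes the rest of the line, and (?!BBB)
# is then tested at the very end, where nothing follows.
my ($neg_match, $neg_end);
if ($line =~ /AAA.+(?!BBB)/) {
    ($neg_match, $neg_end) = ($&, $+[0]);
    print "negative: '$neg_match' ends at offset $neg_end\n";
}

# Positive lookahead: .+ has to give characters back until BBB follows.
my ($pos_match, $pos_end);
if ($line =~ /AAA.+(?=BBB)/) {
    ($pos_match, $pos_end) = ($&, $+[0]);
    print "positive: '$pos_match' ends at offset $pos_end\n";
}
```

The negative version ends at offset 12 (the end of the line), while the positive version ends at offset 9, just before BBB.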
Negative lookahead
The matching proceeds as follows. ^ marks the current position (in the string on the left and in the pattern on the right).
- Initial state:
  AAA 1121 BBB     AAA.+(?!BBB)
  ^                ^
- Matched AAA:
  AAA 1121 BBB     AAA.+(?!BBB)
     ^                ^
- Matched .+:
  AAA 1121 BBB     AAA.+(?!BBB)
              ^         ^
- Check (?!BBB):
  AAA 1121 BBB     AAA.+(?!BBB)
              ^         ^
  No BBB matches at this position => Success!
  AAA 1121 BBB
  \__________/
Positive lookahead
Now let's see why AAA.+(?=BBB) matches what it does:
- Initial state:
  AAA 1121 BBB     AAA.+(?=BBB)
  ^                ^
- Matched AAA:
  AAA 1121 BBB     AAA.+(?=BBB)
     ^                ^
- Matched .+:
  AAA 1121 BBB     AAA.+(?=BBB)
              ^         ^
- Check (?=BBB):
  AAA 1121 BBB     AAA.+(?=BBB)
              ^         ^
  No BBB matches at this position => Backtrack (.+ gives back one character)
- Check (?=BBB):
  AAA 1121 BBB     AAA.+(?=BBB)
             ^          ^
  No BBB matches at this position => Backtrack (.+ gives back one character)
- Check (?=BBB):
  AAA 1121 BBB     AAA.+(?=BBB)
            ^           ^
  No BBB matches at this position => Backtrack (.+ gives back one character)
- Check (?=BBB):
  AAA 1121 BBB     AAA.+(?=BBB)
           ^            ^
  BBB matches at this position => Success!
  AAA 1121 BBB
  \_______/
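The order in which positions are tried can be imitated by hand. The loop below is only an illustration of that order, not how the regex engine is actually implemented; `try_ends` is a hypothetical helper, and it assumes a line without a trailing newline that starts with AAA.

```perl
use strict;
use warnings;

# Hypothetical helper: after AAA, try every end position for .+ from the
# longest to the shortest and return the text matched at the first end
# position where the lookahead condition holds.
# $want_bbb is true for (?=BBB) and false for (?!BBB).
sub try_ends {
    my ($line, $want_bbb) = @_;
    return unless $line =~ /^AAA/;
    my $start = 3;                               # offset just after AAA
    for (my $end = length $line; $end > $start; $end--) {
        my $follows = substr($line, $end, 3) eq 'BBB' ? 1 : 0;
        print "  .+ ends at $end: BBB ",
              ($follows ? "follows" : "does not follow"), "\n";
        # Stop at the first (longest) end position that satisfies the lookahead.
        return substr($line, 0, $end) if $follows == ($want_bbb ? 1 : 0);
    }
    return;                                      # no end position worked
}

print "positive lookahead:\n";
my $pos_match = try_ends("AAA 1121 BBB", 1);
print "negative lookahead:\n";
my $neg_match = try_ends("AAA 1121 BBB", 0);
```

For the positive case the loop tries offsets 12, 11, 10 and 9 before succeeding, which corresponds to the four checks in the walk-through; for the negative case it succeeds on the first try at offset 12.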
Greediness works the same way in your two cases: .+ is greedy in both.
When matching AAA.+(?=BBB) against AAA 1121 BBB, the most .+ can match starting after AAA is <spc>1121<spc>. Anything longer would cause (?=BBB) to fail.
When matching AAA.+(?!BBB) against AAA 1121 BBB, the most .+ can match starting after AAA is <spc>1121<spc>BBB. Since that is the rest of the string, it cannot match anything more.
Note that the end of the line is not followed by BBB, so (?!BBB) matches at the end of the line.
(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.
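As a small illustration of that analogy (the sample strings are made up): `[^,]*` matches a run of characters containing no comma, and `(?:(?!BBB).)*` matches a run of characters containing no BBB.

```perl
use strict;
use warnings;

# [^CHAR]* : the longest run at the start that contains no ','
my ($no_comma) = "abc,def" =~ /^([^,]*)/;

# (?:(?!STRING).)* : the longest run at the start that contains no 'BBB'
my ($no_bbb) = "AAA 1121 BBB tail" =~ /^((?:(?!BBB).)*)/;

print "'$no_comma'\n";   # 'abc'
print "'$no_bbb'\n";     # 'AAA 1121 '
```

In both cases the run stops right before the first occurrence of the thing being excluded.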
I would go with
say $1 if /^(AAA\s+\S+)\s+(?:(?!BBB).)*\z/s;
Then again, I would probably go with
my @F = split;
say "$F[0] $F[1]" if $F[0] eq 'AAA' && $F[2] ne 'BBB';
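For completeness, here is a self-contained sketch of the split approach run over the sample lines from the question. The `@lines` and `@kept` arrays are my own scaffolding, and it assumes every line has at least three whitespace-separated fields.

```perl
use strict;
use warnings;

my @lines = (
    "AAA 1121 BBB", "AAA 443 CCC",  "AAA 4431 BBB", "ABC 321 EACA",
    "AAA 321 BBB",  "ACD 431 MAKN", "AAA 751 ABC",
);

my @kept;
for (@lines) {
    my @F = split;   # split $_ on whitespace
    # Keep the first two fields when the line starts with AAA
    # and its third field is not BBB.
    push @kept, "$F[0] $F[1]" if $F[0] eq 'AAA' && $F[2] ne 'BBB';
}

print "$_\n" for @kept;
```

On this data it prints AAA 443 and AAA 751, the two AAA lines whose third field is not BBB.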