Is the result of a greedy operator different in positive and negative lookahead?

Question

I am confused about how a greedy quantifier behaves in front of a positive and a negative lookahead.

Script for positive lookahead

while (<DATA>) {
    # Match against $_; print the matched text, or an empty line if nothing matched.
    print( /AAA.+(?=BBB)/ ? "$&\n" : "\n" );
}
__DATA__
AAA 1121 BBB
AAA 443  CCC
AAA 4431 BBB
ABC 321  EACA
AAA 321  BBB
ACD 431 MAKN
AAA 751  ABC

It outputs

AAA 1121 

AAA 4431 

AAA 321

Negative Lookahead

while (<DATA>) {
    # Match against $_; print the matched text, or an empty line if nothing matched.
    print( /AAA.+(?!BBB)/ ? "$&\n" : "\n" );
}

It outputs

AAA 1121 BBB
AAA 443  CCC
AAA 4431 BBB

AAA 321  BBB

AAA 751  ABC

The negative lookahead does not seem to take (?!BBB) into account, because I am using the greedy quantifier in front of (?!BBB). With the positive lookahead, however, the greedy quantifier does take (?=BBB) into account. Is that why the results differ?

I can easily get the output I want with the pattern /AAA\s\d+(?!.+BBB)/, but I don't understand how my original code is executed.

+3

regex perl

mkHun 12 jan. 15 at 11:27

2 answers

There is no difference in how it works in your two cases: .+ is greedy in both.

When matching AAA.+(?=BBB) against AAA 1121 BBB, the most .+ can match starting after AAA is <spc>1121<spc>. Anything longer would cause (?=BBB) to fail.

When matching AAA.+(?!BBB) against AAA 1121 BBB, the most .+ can match starting after AAA is <spc>1121<spc>BBB. Since that is the rest of the string, it cannot possibly match anything more.

Note that the end of the line is not followed by BBB, so (?!BBB) matches at the end of the line.
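To see both cases side by side, here is a minimal sketch that runs the two patterns against the same string and prints the match in brackets so that a trailing space is visible. The helper name show_match is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $re, or undef if no match.
sub show_match {
    my ($re, $str) = @_;
    my ($matched) = $str =~ /($re)/;
    return $matched;
}

my $str = 'AAA 1121 BBB';

# Positive lookahead: .+ has to give back "BBB" so that (?=BBB) can succeed.
printf "[%s]\n", show_match(qr/AAA.+(?=BBB)/, $str);   # [AAA 1121 ]

# Negative lookahead: .+ keeps everything; nothing follows the end of the string.
printf "[%s]\n", show_match(qr/AAA.+(?!BBB)/, $str);   # [AAA 1121 BBB]
```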

(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.
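As a small illustration of that analogy (the sample string is invented for the example): [^B]* matches a run of characters none of which is B, while (?:(?!BBB).)* matches a run of characters none of which starts the string BBB.

```perl
use strict;
use warnings;

my $str = 'AAA 1B2B BBB tail';

# [^B]* stops at the first single "B".
my ($no_char) = $str =~ /^([^B]*)/;

# (?:(?!BBB).)* stops only where the full string "BBB" begins.
my ($no_string) = $str =~ /^((?:(?!BBB).)*)/;

print "[$no_char]\n";     # [AAA 1]
print "[$no_string]\n";   # [AAA 1B2B ]
```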

I would go with (on a chomped line)

say $1 if /^(AAA\s+\S+)\s+(?:(?!BBB).)*\z/;

Then again, I would probably go with

my @F = split;
say "$F[0] $F[1]" if $F[0] eq 'AAA' && $F[2] ne 'BBB';
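For completeness, a self-contained sketch of the split-based approach run over the sample lines from the question. The wrapper name wanted and the check that a third field exists are additions made for this illustration, not part of the snippet above.

```perl
use strict;
use warnings;
use feature 'say';

# Sample lines copied from the question.
my @lines = (
    'AAA 1121 BBB', 'AAA 443  CCC', 'AAA 4431 BBB', 'ABC 321  EACA',
    'AAA 321  BBB', 'ACD 431 MAKN', 'AAA 751  ABC',
);

# Hypothetical wrapper: return "field0 field1" for wanted lines, nothing otherwise.
sub wanted {
    my ($line) = @_;
    my @F = split ' ', $line;   # split on runs of whitespace
    return unless @F >= 3 && $F[0] eq 'AAA' && $F[2] ne 'BBB';
    return "$F[0] $F[1]";
}

# Prints "AAA 443" and "AAA 751".
say for map { wanted($_) } @lines;
```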

+2

ikegami 12 jan. 15 at 14:06

Lucas Trzesniewski · Accepted Answer · 2015-01-12T13:05:40+0000

Consider the first example:

AAA 1121 BBB
\_/\_______/^
 |     |    |
 |     |    +--- this (the empty string right there) satisfies (?!BBB)
 |     |
 |     +-------- matched by .+
 |     
 +-------------- matched by AAA

This is because the greedy .+ consumes 1121 BBB, including BBB. After it has consumed the rest of the string, (?!BBB) is checked against the remaining empty string. That empty string satisfies (?!BBB), because it is not followed by BBB.

Negative lookahead

The algorithm proceeds as follows. ^ marks the current position (both in the string and in the pattern).

The initial state:

AAA 1121 BBB          AAA.+(?!BBB)
^                     ^

Match AAA

AAA 1121 BBB          AAA.+(?!BBB)
   ^                     ^

Match .+

AAA 1121 BBB          AAA.+(?!BBB)
            ^              ^

Check (?!BBB)

AAA 1121 BBB          AAA.+(?!BBB)
            ^                     ^

No BBB to match at this position => Success!

AAA 1121 BBB
\__________/

Positive lookahead

Now let's see why AAA.+(?=BBB) gives the match it does:

The initial state:

AAA 1121 BBB          AAA.+(?=BBB)
^                     ^

Match AAA

AAA 1121 BBB          AAA.+(?=BBB)
   ^                     ^

Match .+

AAA 1121 BBB          AAA.+(?=BBB)
            ^              ^

Check (?=BBB)

AAA 1121 BBB          AAA.+(?=BBB)
            ^              ^

No BBB to match at this position => Backtrack (.+ consumes one character fewer)

Check (?=BBB)

AAA 1121 BBB          AAA.+(?=BBB)
           ^               ^

No BBB to match at this position => Backtrack (.+ consumes one character fewer)

Check (?=BBB)

AAA 1121 BBB          AAA.+(?=BBB)
          ^                ^

No BBB to match at this position => Backtrack (.+ consumes one character fewer)

Check (?=BBB)

AAA 1121 BBB          AAA.+(?=BBB)
         ^                        ^

BBB matches at this position => Success!

AAA 1121 BBB
\_______/
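The backtracking above can be imitated by hand. The following is only a rough sketch of the idea, not how the Perl regex engine is actually implemented: it lets .+ take as many characters as possible, then gives them back one at a time until the lookahead condition holds. The function name and its callback argument are invented for the illustration, and the sketch only tries a match starting at the beginning of the string.

```perl
use strict;
use warnings;

# Rough imitation of "AAA.+<lookahead>" anchored at the start of $str.
# $ok is a callback that receives the text remaining after .+ and
# returns true if the lookahead is satisfied at that position.
sub greedy_then_check {
    my ($str, $ok) = @_;
    return unless substr($str, 0, 3) eq 'AAA';
    # Try the longest .+ first, then shorter ones (backtracking).
    # .+ needs at least one character, so the match ends at offset 4 or later.
    for (my $end = length $str; $end >= 4; $end--) {
        my $rest = substr($str, $end);
        return substr($str, 0, $end) if $ok->($rest);
    }
    return;   # every length failed
}

my $str = 'AAA 1121 BBB';
my $pos = greedy_then_check($str, sub { $_[0] =~ /^BBB/ });   # like (?=BBB)
my $neg = greedy_then_check($str, sub { $_[0] !~ /^BBB/ });   # like (?!BBB)

print "[$pos]\n";   # [AAA 1121 ]
print "[$neg]\n";   # [AAA 1121 BBB]
```

With the positive condition the loop has to step back three times before it succeeds; with the negative condition the very first (longest) attempt already succeeds, which is why the whole line is matched.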
