How to do a proper lookahead with LPeg

To match a line starting with dog and then cat (without consuming cat), this works:

local lpeg = require 'lpeg'
local str1 = 'dogcat'
local patt1 = lpeg.C(lpeg.P('dog')) * #lpeg.P('cat')
print(lpeg.match(patt1, str1))

      

Output: dog

To match a line starting with dog, followed by any sequence of characters, and then cat (without consuming it), like the regex (dog.+?)(?=cat), I tried this:

local str2 = 'dog and cat'
local patt2 = lpeg.C(lpeg.P("dog") * lpeg.P(1) ^ 1) * #lpeg.P("cat")
print(lpeg.match(patt2, str2))

      

I expected the result "dog and ", but it returns nil.

If I remove the lookahead part (i.e. use the pattern lpeg.C(lpeg.P("dog") * lpeg.P(1) ^ 1)), it matches the whole line successfully. That means the part * lpeg.P(1) ^ 1 correctly matches any sequence of characters, doesn't it?

How can I fix it?



1 answer


You need to exclude "cat" at every position the repetition can match, so that the repetition stops right before it and the lookahead still has something to see:

local patt2 = lpeg.C(lpeg.P"dog" * (lpeg.P(1)-lpeg.P"cat") ^ 1) * #lpeg.P"cat"
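The same fix can be spelled in two equivalent ways, since the LPeg manual defines patt1 - patt2 as -patt2 * patt1. The sketch below is only an illustration: upto is a hypothetical helper name, not part of LPeg, and the variable names are made up for this example.

```lua
local lpeg = require 'lpeg'

-- Hypothetical helper (not an LPeg function): consume one or more
-- characters, stopping just before the first occurrence of `stop`.
local function upto(stop)
  stop = lpeg.P(stop)
  return (lpeg.P(1) - stop)^1
end

local cat = lpeg.P("cat")

-- Form used in the answer: "any character, except where cat starts".
local with_minus = lpeg.C(lpeg.P("dog") * upto("cat")) * #cat

-- Equivalent form with the negative predicate: "not cat, then any character".
local with_not = lpeg.C(lpeg.P("dog") * (-cat * lpeg.P(1))^1) * #cat

local r1 = lpeg.match(with_minus, 'dog and cat')
local r2 = lpeg.match(with_not, 'dog and cat')
print(r1, r2)
```

Both patterns should capture "dog and " (with the trailing space) and leave cat unconsumed.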

      

I think it's appropriate to mention the debugger I have been working on (pegdebug), as it helps in cases like this. Here is the output it generates for the original lpeg expression:

+   Exp 1   "d"
 +  Dog 1   "d"
 =  Dog 1-3 "dog"
 +  Separator   4   " "
 =  Separator   4-11    " and cat"
 +  Cat 12  ""
 -  Cat 12
-   Exp 1

      

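You can see that the Separator expression "eats" all the remaining characters, including "cat". LPeg repetition is greedy and never backtracks to give characters back, so when the lookahead runs at position 12 there is nothing left for P"cat" to match.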
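The positions in the trace can also be checked without a debugger, because lpeg.match returns the index just past the match when a pattern has no captures. This is a small sketch using only lpeg.match and its optional init argument; the variable names are illustrative.

```lua
local lpeg = require 'lpeg'
local str2 = 'dog and cat'   -- 11 characters

-- Original repetition: runs to the end of the subject.
local greedy = lpeg.P("dog") * lpeg.P(1)^1
local end_greedy = lpeg.match(greedy, str2)

-- Guarded repetition: stops where "cat" begins.
local guarded = lpeg.P("dog") * (lpeg.P(1) - lpeg.P("cat"))^1
local end_guarded = lpeg.match(guarded, str2)

-- Try the lookahead on its own, starting at each of those positions.
local at_end = lpeg.match(#lpeg.P("cat"), str2, end_greedy)
local at_cat = lpeg.match(#lpeg.P("cat"), str2, end_guarded)

print(end_greedy, end_guarded, at_end, at_cat)
```

The greedy pattern stops at 12 (one past the end), where the lookahead fails; the guarded pattern stops at 9, where the lookahead succeeds without consuming anything.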



The result for the modified expression looks like this:

+   Exp 1   "d"
 +  Dog 1   "d"
 =  Dog 1-3 "dog"
 +  Separator   4   " "
 =  Separator   4-8 " and "
 +  Cat 9   "c"
 =  Cat 9-11    "cat"
=   Exp 1-8 "dog and "
/   Dog 1   0   
/   Separator   4   0   
/   Exp 1   1   "dog and "

      

Here's the complete script:

local lpeg = require 'lpeg'  -- recent LPeg versions do not set a global
local peg = require 'pegdebug'
local str2 = 'dog and cat'
local patt2 = lpeg.P(peg.trace { "Exp";
  Exp = lpeg.C(lpeg.V"Dog" * lpeg.V"Separator") * #lpeg.V"Cat";
  Cat = lpeg.P("cat");
  Dog = lpeg.P("dog");
  Separator = (lpeg.P(1) - lpeg.P("cat"))^1;
})
print(lpeg.match(patt2, str2))
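One caveat: the fixed Separator is not quite the same as the regex .+? from the question. The regex always consumes at least one character before looking for cat, while (1 - cat)^1 refuses to start if cat comes immediately after dog. If that edge case matters, a recursive grammar can mimic the lazy behaviour. The following is a sketch under that assumption, not something taken from the answer; the rule name Rest is made up.

```lua
local lpeg = require 'lpeg'

local cat = lpeg.P("cat")

-- Rest consumes one character, then either sees "cat" ahead (and stops)
-- or recurses to consume one more character.
local lazy = lpeg.P{ "Rest"; Rest = lpeg.P(1) * (#cat + lpeg.V"Rest") }
local patt = lpeg.C(lpeg.P("dog") * lazy) * #cat

-- Pattern from the answer, for comparison.
local fixed = lpeg.C(lpeg.P("dog") * (lpeg.P(1) - cat)^1) * #cat

local a = lpeg.match(patt, 'dog and cat')
local b = lpeg.match(patt, 'dogcatcat')
local c = lpeg.match(fixed, 'dogcatcat')
print(a, b, c)
```

On 'dog and cat' both approaches capture "dog and ". On 'dogcatcat' the lazy grammar captures "dogcat", as the regex would, while the pattern from the answer returns nil.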

      
