Grako rule priority question

Question

I am reworking a minilanguage I originally built in Perl (see Chessa# on GitHub), but I am running into a number of problems when I go to apply semantics.

Here's the grammar:

(* integers *)
DEC = /([1-9][0-9]*|0+)/;
int = /(0b[01]+|0o[0-7]+|0x[0-9a-fA-F]+)/ | DEC;
(* floats *)
pointfloat = /([0-9]*\.[0-9]+|[0-9]+\.)/;
expfloat = /([0-9]+\.?|[0-9]*\.)[eE][+-]?[0-9]+/;
float = pointfloat | expfloat;
list = '[' @+:atom {',' @+:atom}* ']';
(* atoms *)
identifier = /[_a-zA-Z][_a-zA-Z0-9]*/;
symbol = int        |
         float      |
         identifier |
         list;
(* functions *)
arglist = @+:atom {',' @+:atom}*;
function = identifier '(' [arglist] ')';
atom = function | symbol;
prec8 = '(' atom ')' | atom;
prec7 = [('+' | '-' | '~')] prec8;
prec6 = prec7 ['!'];
prec5 = [prec6 '**'] prec6;
prec4 = [prec5 ('*' | '/' | '%' | 'd')] prec5;
prec3 = [prec4 ('+' | '-')] prec4;
(* <| and >| are rotate-left and rotate-right, respectively. They assume the nearest C size. *)
prec2 = [prec3 ('<<' | '>>' | '<|' | '>|')] prec3;
prec1 = [prec2 ('&' | '|' | '^')] prec2;
expr = prec1 $;

The problem I'm running into is that the operator d gets pulled into the identifier rule when there is no whitespace between the operator and any alphanumeric string that follows it. Although the grammar itself is LL(2), I don't understand where the problem is here.

For example, 4d6 stops the parser because it is interpreted as 4 d6, where d6 is an identifier. What should happen is that it is interpreted as 4 d 6, with d being the operator. In an LL parser, this would indeed be the case.

A possible solution is to disallow d at the beginning of an identifier, but that would prevent functions such as drop from being named as such.
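To make the cost of that workaround concrete, here is a small sketch using Python's re module. The first pattern is the identifier rule from the grammar; the second, NO_LEADING_D, is a hypothetical variant (not part of the grammar) that refuses a leading lowercase d. The helper name is_identifier is likewise only for illustration.

```python
import re

# Identifier pattern taken from the grammar above.
IDENTIFIER = re.compile(r'[_a-zA-Z][_a-zA-Z0-9]*')

# Hypothetical variant (an assumption for illustration): same pattern,
# but the first character may not be a lowercase 'd'.
NO_LEADING_D = re.compile(r'[_a-ce-zA-Z][_a-zA-Z0-9]*')


def is_identifier(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


# 'd6' is a valid identifier under the original rule, which is why
# '4d6' can be read as '4' followed by the identifier 'd6'.
print(is_identifier(IDENTIFIER, 'd6'))

# The variant rejects 'd6', but it rejects 'drop' as well.
print(is_identifier(NO_LEADING_D, 'd6'))
print(is_identifier(NO_LEADING_D, 'drop'))
```

The variant keeps d6 from being read as an identifier, but any function name starting with d is lost along with it.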

+3

python python-3.x peg ebnf grako

Aerdan 24 Sep 14 at 12:05


2 answers

In Perl, you can use Marpa, a general BNF parser that supports generalized precedence with associativity (and much more) out of the box, for example:

:start ::= Script
Script ::= Expression+ separator => comma
comma ~ [,]
Expression ::=
    Number bless => primary
    | '(' Expression ')' bless => paren assoc => group
   || Expression '**' Expression bless => exponentiate assoc => right
   || Expression '*' Expression bless => multiply
    | Expression '/' Expression bless => divide
   || Expression '+' Expression bless => add
    | Expression '-' Expression bless => subtract

A full working example is here. As for programming languages, there is a C parser based on Marpa.

Hope it helps.

+3

rns 26 Sep 14 at 16:44


Apalala · Accepted Answer · 2014-09-25T21:14:01+0000

The problem with your example is that Grako has the nameguard feature enabled by default, and that won't allow parsing just the d when d6 is ahead.
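As a rough illustration of the idea behind a name guard, the following sketch refuses to match an alphanumeric token when the next character could continue a name. It is a simplified model written for this explanation, not Grako's actual implementation; the function match_token and its parameters are hypothetical.

```python
def match_token(text, pos, token, nameguard=True):
    """Return the position just past `token` if it matches at `pos`, else None.

    Simplified model of a name guard (an assumption, not Grako's code):
    an alphanumeric token is rejected when the character right after it
    could continue a name.
    """
    if not text.startswith(token, pos):
        return None
    end = pos + len(token)
    if nameguard and token.isalnum() and end < len(text):
        following = text[end]
        if following.isalnum() or following == '_':
            # 'd' followed by '6' looks like the start of the name 'd6'.
            return None
    return end


# With the guard on, the operator 'd' is not matched inside '4d6'.
print(match_token('4d6', 1, 'd'))

# With the guard off, 'd' is matched and parsing can continue at '6'.
print(match_token('4d6', 1, 'd', nameguard=False))
```

Under this model, whitespace after the operator (as in 4d 6) also lets the token match, which is consistent with the behavior described in the question.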

To disable the feature, instantiate your own Buffer and pass it to an instance of the generated parser:

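from grako.buffering import Buffer
from myparser import MyParser

# get the text to parse, e.g. the failing input from the question
text = '4d6'
parser = MyParser()
# a Buffer created with nameguard=False lets 'd' match even when '6' follows;
# 'expr' is the start rule of the grammar in the question
parser.parse(Buffer(text, nameguard=False), 'expr')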

The latest version of Grako in the Bitbucket repository adds a --no-nameguard command-line option to generated parsers.
