Grako rule priority question
I am reworking a minilanguage I originally built in Perl (see Chessa# on GitHub), but I am running into a number of problems when I go to apply semantics. Here is the grammar:
(* integers *)
DEC = /([1-9][0-9]*|0+)/;
int = /(0b[01]+|0o[0-7]+|0x[0-9a-fA-F]+)/ | DEC;
(* floats *)
pointfloat = /([0-9]*\.[0-9]+|[0-9]+\.)/;
expfloat = /([0-9]+\.?|[0-9]*\.)[eE][+-]?[0-9]+/;
float = pointfloat | expfloat;
list = '[' @+:atom {',' @+:atom}* ']';
(* atoms *)
identifier = /[_a-zA-Z][_a-zA-Z0-9]*/;
symbol = int |
float |
identifier |
list;
(* functions *)
arglist = @+:atom {',' @+:atom}*;
function = identifier '(' [arglist] ')';
atom = function | symbol;
prec8 = '(' atom ')' | atom;
prec7 = [('+' | '-' | '~')] prec8;
prec6 = prec7 ['!'];
prec5 = [prec6 '**'] prec6;
prec4 = [prec5 ('*' | '/' | '%' | 'd')] prec5;
prec3 = [prec4 ('+' | '-')] prec4;
(* <| and >| are rotate-left and rotate-right, respectively. They assume the nearest C size. *)
prec2 = [prec3 ('<<' | '>>' | '<|' | '>|')] prec3;
prec1 = [prec2 ('&' | '|' | '^')] prec2;
expr = prec1 $;
The problem I'm running into is that the operator d gets pulled into the identifier rule when there is no whitespace between the operator and any alphanumeric string that follows it. Although the grammar itself is LL(2), I don't understand where the problem is here.
For example, the input "4d6" halts the parser because it is interpreted as "4 d6", where d6 is an identifier. What should happen is that it is interpreted as "4 d 6", with d being the operator. In an LL parser, this would indeed be the case.
A possible solution is to disallow d at the beginning of an identifier, but that would prevent functions such as drop from being named as such.
The problem with your example is that Grako has the nameguard feature turned on by default, and this will not allow parsing just the d when d6 is ahead.
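To make the effect concrete, here is a minimal sketch of what a name guard does when matching a literal token. It is not Grako's actual code: match_token is a hypothetical helper, and the rule it applies (reject a name-like token when the next input character could continue a name) is an assumption about how the feature behaves.

```python
def match_token(text, pos, token, nameguard=True):
    """Return the position just past `token`, or None if it does not match at `pos`.

    Hypothetical illustration only; this is not part of Grako's API.
    """
    if not text.startswith(token, pos):
        return None
    end = pos + len(token)
    # Assumed guard rule: a token that looks like a name must not be
    # immediately followed by another name character.
    if nameguard and token.isalnum() and token[0].isalpha():
        if end < len(text) and (text[end].isalnum() or text[end] == "_"):
            return None
    return end


print(match_token("4d6", 1, "d"))                   # guard on
print(match_token("4d6", 1, "d", nameguard=False))  # guard off
print(match_token("4 d 6", 2, "d"))                 # whitespace after the operator
```

With the guard on, the token d is refused in "4d6" because 6 could continue a name, which is why the parser never sees the operator there. Symbolic operators such as * are not name-like, so under this assumption the guard does not affect them.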
To disable this feature, create your own Buffer and pass it to an instance of the generated parser:
from grako.buffering import Buffer
from myparser import MyParser
# get the text
parser = MyParser()
# 'expr' is the start rule of the grammar in the question
parser.parse(Buffer(text, nameguard=False), 'expr')
The current development version of Grako in the Bitbucket repository adds a --no-nameguard command-line option to generated parsers.
In Perl, you can use Marpa, a general BNF parser that supports generalized precedence with associativity (and much more) out of the box, for example:
:start ::= Script
Script ::= Expression+ separator => comma
comma ~ [,]
Expression ::=
Number bless => primary
| '(' Expression ')' bless => paren assoc => group
|| Expression '**' Expression bless => exponentiate assoc => right
|| Expression '*' Expression bless => multiply
| Expression '/' Expression bless => divide
|| Expression '+' Expression bless => add
| Expression '-' Expression bless => subtract
A full working example is here. As for programming languages, there is a C parser based on Marpa.
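If you would rather stay in Python, the precedence levels that the Marpa grammar above declares can be approximated with a small precedence-climbing parser. This is only a sketch under stated assumptions: OPERATORS, tokenize, parse and parse_primary are hypothetical names, numbers are plain non-negative integers, and none of this is how Marpa or Grako work internally.

```python
import re

# Binding level and associativity per operator, mirroring the Marpa
# grammar above (a higher level binds tighter; only '**' is right-assoc).
OPERATORS = {
    "**": (3, "right"),
    "*": (2, "left"),
    "/": (2, "left"),
    "+": (1, "left"),
    "-": (1, "left"),
}

# '**' is listed before the single-character class so it wins over '*'.
TOKEN = re.compile(r"\s*(\d+|\*\*|[-+*/()])")


def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens, min_level=1):
    """Precedence climbing: consume tokens and return a nested-tuple tree."""
    left = parse_primary(tokens)
    while tokens and tokens[0] in OPERATORS:
        level, assoc = OPERATORS[tokens[0]]
        if level < min_level:
            break
        op = tokens.pop(0)
        # A left-associative operator needs a strictly tighter right operand;
        # a right-associative one accepts the same level again.
        next_min = level + 1 if assoc == "left" else level
        right = parse(tokens, next_min)
        left = (op, left, right)
    return left


def parse_primary(tokens):
    """Parse a number or a parenthesized expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if tok.isdigit():
        return int(tok)
    raise SyntaxError(f"unexpected token {tok!r}")


print(parse(tokenize("2 ** 3 ** 2")))  # groups to the right
print(parse(tokenize("1 - 2 - 3")))    # groups to the left
```

The table plays the role of the || and | separators in the Marpa rule: operators sharing a level correspond to alternatives joined by |, and each new level corresponds to a || boundary.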
Hope it helps.