LALR grammar, trailing comma and multi-line list assignment

Question

I am trying to create a LALR grammar for a very simple language made up of assignments. For example:

foo = "bar"
bar = 42

The language must also handle a list of values, for example:

foo = 1, 2, 3

But I also want to allow a list to span multiple lines:

foo = 1, 2
      3, 4

A trailing comma should be allowed too (for single-element lists and general flexibility):

foo = 1,
foo = 1, 2,

And obviously both at the same time:

foo = 1,
      2,
      3,

I can write a grammar that handles either a trailing comma or a multi-line list, but not both at the same time.

My grammar looks like this:

content : content '\n'
        | content assignment
        | <empty>

assignment : NAME '=' value
           | NAME '=' list

value : TEXT
      | NUMBER

list : ???

Note: I need '\n' in the grammar to disallow code like this:

foo
=
"bar"

Thanks in advance,

Antoine.

python yacc ply grammar

Antoine 13 Mar 12 at 22:17

2 answers

I don't really have much experience with this, but will this work?

listvalue : value ','
          | value '\n'
          | value ',' '\n'

list : listvalue list

aquavitae 14 Mar 12 at 5:43

Kaz · Accepted Answer · 2012-03-15T05:12:11+0000

It looks like your config language is essentially free-form. I would forget about making the newline a token in the grammar. If you want to restrict where newlines may appear, you can hack that in as a lexical tie-in: the parser calls a small API added to the lexer to tell it where it is in the grammar, and the lexer then decides whether to accept a newline or reject it with an error.

Try this grammar.

%token NAME NUMBER TEXT

%%

config_file : assignments
            | /* empty */
            ;

assignments : assignment
            | assignments assignment
            ;

assignment : NAME '=' values comma_opt
           ;

comma_opt : ','
          | /* empty */
          ;

values : value
       | values ',' value
       ;

value : NUMBER | TEXT ;

It builds for me without conflicts. I haven't run it, but from a quick read of y.output the transitions look sane.

This grammar of course allows

foo = 1, 2, 3, bar = 4, 5, 6 xyzzy = 7 answer = 42

without additional tie-in communication with the lexer.
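To see that free-form behaviour concretely, here is a minimal sketch in plain Python. It is a hand-written recognizer with one token of lookahead that mirrors the rules above, not a generated LALR parser and not PLY. The token patterns and the names tokenize and parse are assumptions for illustration, since the post never defines NAME, NUMBER or TEXT.

```python
import re

# Hypothetical token patterns: the post does not define NAME, NUMBER or TEXT.
TOKEN_RE = re.compile(r'''
    (?P<NUMBER>\d+)
  | (?P<TEXT>"[^"\n]*")
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<PUNCT>[=,])
  | (?P<WS>\s+)
''', re.VERBOSE)


def tokenize(source):
    """Return (kind, text) pairs; newlines are treated as plain whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        pos = m.end()
        if m.lastgroup == "WS":
            continue
        # Punctuation uses its own text as the token kind, like yacc literals.
        kind = m.group() if m.lastgroup == "PUNCT" else m.lastgroup
        tokens.append((kind, m.group()))
    return tokens


def parse(source):
    """Parse source into a list of (name, [values]) assignments."""
    toks = tokenize(source) + [("EOF", "")]
    i = 0
    result = []
    while toks[i][0] != "EOF":              # config_file : assignments | empty
        if toks[i][0] != "NAME" or toks[i + 1][0] != "=":
            raise SyntaxError(f"expected NAME '=' at token {i}")
        name = toks[i][1]
        i += 2
        values = []
        while True:                         # values : value | values ',' value
            kind, text = toks[i]
            if kind not in ("NUMBER", "TEXT"):
                raise SyntaxError(f"expected a value at token {i}")
            values.append(int(text) if kind == "NUMBER" else text[1:-1])
            i += 1
            if toks[i][0] != ",":
                break                       # comma_opt : empty
            i += 1
            if toks[i][0] not in ("NUMBER", "TEXT"):
                break                       # comma_opt : ','
        result.append((name, values))
    return result


print(parse('foo = 1, 2, 3, bar = 4, 5, 6 xyzzy = 7 answer = 42'))
```

The one-line input is split into four assignments, and the same function accepts a list that continues over several lines with a trailing comma, because whitespace of any kind is skipped.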

Your restrictions mean that newlines are only allowed within the values. Two NAME tokens must never appear on the same line, and the = character must appear on the same line as the preceding name (and presumably so must the first value).

Basically, when the parser scans the first value, it can tell the lexer "values are now being scanned, allow newlines." Then, when comma_opt is reduced, this can be turned off again. By the time comma_opt is reduced, the lexer may already have read the NAME token of the next assignment, but it can check that this token is on a different line from the previous NAME.

You will want your lexer to keep an accurate line count anyway.
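As a rough illustration of why the line count matters, the sketch below records a line number on every token and then checks the newline restrictions in a separate pass. This is a simpler post-pass substitute, not the parser-to-lexer tie-in described above. The names tokenize_with_lines and check_line_rules, and the token patterns, are hypothetical.

```python
import re

# Hypothetical token patterns: the post does not define NAME, NUMBER or TEXT.
TOKEN_RE = re.compile(
    r'(?P<NUMBER>\d+)|(?P<TEXT>"[^"\n]*")|(?P<NAME>[A-Za-z_]\w*)|(?P<PUNCT>[=,])')


def tokenize_with_lines(source):
    """Return (kind, text, line) triples, counting lines from 1."""
    tokens = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        pos = 0
        while pos < len(line):
            if line[pos].isspace():
                pos += 1
                continue
            m = TOKEN_RE.match(line, pos)
            if m is None:
                raise SyntaxError(
                    f"line {lineno}: unexpected character {line[pos]!r}")
            kind = m.group() if m.lastgroup == "PUNCT" else m.lastgroup
            tokens.append((kind, m.group(), lineno))
            pos = m.end()
    return tokens


def check_line_rules(tokens):
    """Return error messages for tokens that break the newline restrictions."""
    errors = []
    last_name_line = None
    for i, (kind, text, line) in enumerate(tokens):
        if kind == "NAME":
            # Two NAME tokens must never share a line.
            if line == last_name_line:
                errors.append(
                    f"line {line}: {text!r} is on the same line as the previous NAME")
            last_name_line = line
        elif kind == "=":
            # '=' must sit on the same line as the NAME before it.
            if line != last_name_line:
                errors.append(f"line {line}: '=' is not on the same line as its NAME")
            # The first value must sit on the same line as '='.
            if i + 1 < len(tokens) and tokens[i + 1][2] != line:
                errors.append(
                    f"line {tokens[i + 1][2]}: first value is not on the same line as '='")
    return errors


for message in check_line_rules(tokenize_with_lines('foo\n=\n"bar"\n')):
    print(message)
```

Running the check after tokenizing keeps the grammar itself free of newline tokens, at the cost of reporting these errors separately from the parser's own syntax errors.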
