LALR grammar, trailing comma and multi-line list assignment

I am trying to create a LALR grammar for a very simple language made up of assignments. For example:

foo = "bar"
bar = 42

      

The language must also handle a list of values, for example:

foo = 1, 2, 3

      

I also want to allow a list to span multiple lines:

foo = 1, 2
      3, 4

      

A trailing comma should be allowed too (for single-element lists and for flexibility):

foo = 1,
foo = 1, 2,

      

And obviously both at the same time:

foo = 1,
      2,
      3,

      

I can write a grammar that supports either a trailing comma or a multi-line list, but not both at the same time.

My grammar looks like this:

content : content '\n'
        | content assignment
        | <empty>

assignment : NAME '=' value
           | NAME '=' list

value : TEXT
      | NUMBER

list : ???

      

Note: I need '\n' in the grammar to disallow code like this:

foo
=
"bar"

      

Thanks in advance,

Antoine.

+3




2 answers


It looks like your config language is essentially free-form. I would drop the idea of making the newline a token in the grammar. If you want to restrict where newlines may appear, you can handle that with a lexical tie-in: the parser calls a small API added to the lexer to tell it where it is in the grammar, and the lexer decides whether to accept a newline or reject it with an error.

Try this grammar.

%token NAME NUMBER TEXT

%%

config_file : assignments
            | /* empty */
            ;

assignments : assignment
            | assignments assignment
            ;

assignment : NAME '=' values comma_opt ;

comma_opt : ',' | /* empty */;

values : value
       | values ',' value
       ;

value : NUMBER | TEXT ;

      

It builds for me without conflicts. I haven't run it, but from a cursory read of y.output the state transitions look sane.

This grammar of course allows



foo = 1, 2, 3, bar = 4, 5, 6 xyzzy = 7 answer = 42

      

without the additional tie-in to the lexer.

Your restrictions mean that newlines are only allowed within the values. Two NAME tokens should never appear on the same line, and the = character must appear on the same line as the preceding NAME (and probably the first value should too).

Basically, when the parser has scanned the first value, it can tell the lexer "values are now being scanned, enable newlines." Then, when comma_opt is reduced, this can be turned off again. By the time comma_opt is reduced, the lexer may already have read the NAME token of the next assignment, but it can still check that this NAME is on a different line from the previous NAME.

You want your lexer to keep track of accurate line numbers anyway.
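To make the line checks concrete, here is a minimal Python sketch. It is not the yacc-generated parser: it is a hand-written stand-in that follows the same rule shapes (values, comma_opt) and only shows how per-token line numbers can enforce the restrictions described above. The token patterns and the names tokenize, parse and to_value are assumptions made up for this illustration.

```python
import re

# Assumed token shapes; the question does not define NAME, NUMBER or TEXT precisely.
TOKEN_RE = re.compile(r'''
    (?P<NEWLINE>\n)
  | (?P<SKIP>[ \t\r]+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<TEXT>"[^"\n]*")
  | (?P<PUNCT>[=,])
''', re.VERBOSE)


def tokenize(src):
    """Return (kind, text, line) triples; a newline only bumps the line counter."""
    tokens, line, pos = [], 1, 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"line {line}: unexpected character {src[pos]!r}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "NEWLINE":
            line += 1
        elif kind != "SKIP":
            tokens.append((kind, m.group(), line))
    tokens.append(("EOF", "", line))
    return tokens


def to_value(tok):
    """Convert a NUMBER or TEXT token to a Python value."""
    kind, text, _ = tok
    return int(text) if kind == "NUMBER" else text[1:-1]


def parse(src):
    """Parse assignments into (name, values) pairs, enforcing the line rules."""
    toks = tokenize(src)
    i, result, prev_name_line = 0, [], 0
    while toks[i][0] != "EOF":
        kind, name, line = toks[i]
        if kind != "NAME":
            raise SyntaxError(f"line {line}: expected a name, got {name!r}")
        if line == prev_name_line:
            raise SyntaxError(f"line {line}: two assignments on one line")
        # '=' must be on the same line as the name ...
        if toks[i + 1][:2] != ("PUNCT", "=") or toks[i + 1][2] != line:
            raise SyntaxError(f"line {line}: '=' must follow {name!r} on the same line")
        i += 2
        # ... and so must the first value.
        if toks[i][0] not in ("NUMBER", "TEXT") or toks[i][2] != line:
            raise SyntaxError(f"line {line}: a value must follow '=' on the same line")
        values = [to_value(toks[i])]
        i += 1
        # values : value | values ',' value ;  then comma_opt.
        # Newlines are free here because the tokenizer never emits them.
        while toks[i][:2] == ("PUNCT", ","):
            i += 1                      # consume ','
            if toks[i][0] not in ("NUMBER", "TEXT"):
                break                   # that was the trailing comma (comma_opt)
            values.append(to_value(toks[i]))
            i += 1
        result.append((name, values))
        prev_name_line = line
    return result
```

With this sketch, a multi-line list with trailing commas parses into a single assignment, while the three-line `foo` / `=` / `"bar"` form and `foo = 1, 2, 3, bar = 4` on one line are both rejected with a SyntaxError. In a real yacc setup the same checks would live in the lexer or in the rule actions rather than in a hand-written loop.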

+2




I don't really have much experience with this, but will this work?



listvalue : value ','
          | value '\n'
          | value ',' '\n'

list : listvalue list
     | listvalue

      

0

