LALR grammar, trailing comma and multi-line list assignment
I am trying to write an LALR grammar for a very simple language made up of assignments. For example:
foo = "bar"
bar = 42
The language must also handle a list of values, for example:
foo = 1, 2, 3
But I also want to allow a list to span multiple lines:
foo = 1, 2
      3, 4
And a trailing comma (for single-element lists and general flexibility):
foo = 1,
foo = 1, 2,
And obviously both at the same time:
foo = 1, 2, 3,
I can write a grammar that handles a trailing comma or a multi-line list, but not one that handles both at the same time.
My grammar looks like this:
content : content '\n'
        | content assignment
        | <empty>
assignment : NAME '=' value
           | NAME '=' list
value : TEXT
      | NUMBER
list : ???
Note: I need '\n' in the grammar to disallow code like this:
foo
=
"bar"
Thanks in advance,
Antoine.
It looks like your config language is essentially free-form. I would forget about making the newline a token in the grammar. If you want to restrict where newlines may appear, you can hack that in with lexical tie-in rules: the parser calls a small API added to the lexer to tell it where it is in the grammar, and the lexer can then decide whether to accept newlines or reject them with an error.
Try this grammar.
%token NAME NUMBER TEXT
%%
config_file : assignments
            | /* empty */
            ;
assignments : assignment
            | assignments assignment
            ;
assignment : NAME '=' values comma_opt
           ;
comma_opt : ','
          | /* empty */
          ;
values : value
       | values ',' value
       ;
value : NUMBER
      | TEXT
      ;
It builds for me without conflicts. I haven't run it, but from a cursory read of y.output the state transitions look sane.
This grammar of course allows
foo = 1, 2, 3, bar = 4, 5, 6 xyzzy = 7 answer = 42
as long as the lexer does no additional newline checking.
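To see concretely what the free-form grammar accepts, here is a minimal sketch in Python. It is a hand-written recursive-descent recognizer for the same rules, not yacc output, and the token shapes (identifier-like NAME, unsigned integer NUMBER, double-quoted TEXT) are assumptions, since the post does not define them. All whitespace, newlines included, is skipped, which is why the one-line input above goes through.

```python
import re

# Assumed token shapes; leading whitespace (including newlines) is skipped.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<NUMBER>\d+)'
    r'|(?P<TEXT>"[^"\n]*")|(?P<PUNCT>[=,]))'
)

def tokenize(src):
    """Return a list of (kind, text) pairs."""
    src = src.rstrip()
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

def parse(src):
    """Return a list of (name, values) pairs, one per assignment."""
    tokens = tokenize(src)
    i, result = 0, []
    while i < len(tokens):
        # assignment : NAME '=' values comma_opt
        if tokens[i][0] != "NAME":
            raise SyntaxError(f"expected NAME, got {tokens[i][1]!r}")
        name = tokens[i][1]
        i += 1
        if i >= len(tokens) or tokens[i] != ("PUNCT", "="):
            raise SyntaxError(f"expected '=' after {name!r}")
        i += 1
        values = []
        while True:
            # value : NUMBER | TEXT
            if i >= len(tokens) or tokens[i][0] not in ("NUMBER", "TEXT"):
                raise SyntaxError(f"expected a value for {name!r}")
            kind, text = tokens[i]
            values.append(int(text) if kind == "NUMBER" else text[1:-1])
            i += 1
            if i < len(tokens) and tokens[i] == ("PUNCT", ","):
                i += 1
                # A comma followed by a value continues the list;
                # otherwise it was the optional trailing comma.
                if i < len(tokens) and tokens[i][0] in ("NUMBER", "TEXT"):
                    continue
            break
        result.append((name, values))
    return result
```

Under these assumptions, parse('foo = 1, 2, 3, bar = 4, 5, 6 xyzzy = 7 answer = 42') yields four assignments, and the three-line foo / = / "bar" input from the question is accepted as well, which is exactly the behaviour the newline restriction is meant to rule out.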
Your restrictions mean that newlines are only allowed inside the list of values. Two NAME tokens must never appear on the same line, and the '=' must appear on the same line as the NAME before it (and the first value probably should too).
Basically, once the parser has scanned the first value, it can tell the lexer "values are now being scanned, enable newlines." Then, when comma_opt is reduced, this can be turned off again. By the time comma_opt is reduced, the lexer may already have read the NAME token of the next assignment, but it can check that this NAME is on a different line from the previous NAME. You want your lexer to keep an accurate line count anyway.
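The line-based checks described above can be sketched in Python as follows. This is not a real yacc lexical tie-in: it swaps in a simpler technique where the lexer stamps every token with its line number and a hand-written recursive-descent parser compares those numbers. The function names, the token patterns, and the rule that the first value must share a line with its NAME are all assumptions made for illustration.

```python
import re

# Assumed token shapes; the post does not define NAME, NUMBER or TEXT.
TOKEN_RE = re.compile(
    r'(?P<NL>\n)|(?P<WS>[ \t\r]+)|(?P<NAME>[A-Za-z_]\w*)'
    r'|(?P<NUMBER>\d+)|(?P<TEXT>"[^"\n]*")|(?P<PUNCT>[=,])'
)

def tokenize_with_lines(src):
    """Return (kind, text, line) triples; newlines only advance the counter."""
    line, pos, out = 1, 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"line {line}: unexpected {src[pos]!r}")
        kind = m.lastgroup
        if kind == "NL":
            line += 1
        elif kind != "WS":
            out.append((kind, m.group(), line))
        pos = m.end()
    return out

def parse_checked(src):
    """Parse assignments, enforcing the newline restrictions by line number."""
    toks = tokenize_with_lines(src)
    i, result, prev_name_line = 0, [], 0
    while i < len(toks):
        kind, name, name_line = toks[i]
        if kind != "NAME":
            raise SyntaxError(f"line {name_line}: expected NAME, got {name!r}")
        # Two NAME tokens must never share a line.
        if name_line == prev_name_line:
            raise SyntaxError(f"line {name_line}: two NAMEs on one line")
        prev_name_line = name_line
        i += 1
        if i >= len(toks) or toks[i][:2] != ("PUNCT", "="):
            raise SyntaxError(f"line {name_line}: expected '=' after {name!r}")
        # '=' must sit on the same line as its NAME.
        if toks[i][2] != name_line:
            raise SyntaxError(f"line {toks[i][2]}: '=' not on the NAME's line")
        i += 1
        values = []
        while True:
            if i >= len(toks) or toks[i][0] not in ("NUMBER", "TEXT"):
                raise SyntaxError(f"line {name_line}: expected a value")
            kind, text, line = toks[i]
            # Assumption: the first value must also be on the NAME's line.
            if not values and line != name_line:
                raise SyntaxError(f"line {line}: first value not on the NAME's line")
            values.append(int(text) if kind == "NUMBER" else text[1:-1])
            i += 1
            if i < len(toks) and toks[i][:2] == ("PUNCT", ","):
                i += 1
                # Later values may be on any line; a comma with no value
                # after it is the optional trailing comma.
                if i < len(toks) and toks[i][0] in ("NUMBER", "TEXT"):
                    continue
            break
        result.append((name, values))
    return result
```

With this sketch, a list may continue on following lines and end with a trailing comma, while the foo / = / "bar" layout and two assignments on one line both raise SyntaxError. A stricter variant could compare each NAME against the line of the last token of the previous assignment rather than the previous NAME.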