Bison shift / reduce conflict - tiger compiler
I wrote a yacc file according to the Tiger book (Appendix A, Tiger manual).
But there are still shift / reduction conflicts. I don't know how to resolve these conflicts.
% yacc --version
bison (GNU Bison) 3.0.2
You can use this cmd to reproduce the problem:
% yacc -dvt tiger.y
tiger.y: warning: 37 shift/reduce conflicts [-Wconflicts-sr]
% cat tiger.y
:
%{
#include <stdio.h>
//#include "util.h"
//#include "errormsg.h"
int yylex(void); /* function prototype */
void yyerror(char *s)
{
EM_error(EM_tokPos, "%s", s);
}
%}
%union {
int pos;
int ival;
string sval;
}
%token <sval> ID STRING
%token <ival> INT
%token
COMMA COLON SEMICOLON LPAREN RPAREN LBRACK RBRACK
LBRACE RBRACE DOT
PLUS MINUS TIMES DIVIDE EQ NEQ LT LE GT GE
AND OR ASSIGN
ARRAY IF THEN ELSE WHILE FOR TO DO LET IN END OF
BREAK NIL
FUNCTION VAR TYPE
%right ASSIGN
%left OR
%left AND
%nonassoc EQ NEQ LT LE GT GE
%left PLUS MINUS
%left TIMES DIVIDE
%left UNARYMINUS
%precedence THEN
%precedence ELSE
%start program
%%
program: exp { }
;
exp:lvalue { }
|NIL { }
|LPAREN explist RPAREN { }
|LPAREN RPAREN {}
|INT {}
|STRING {}
|MINUS exp %prec UNARYMINUS {}
|ID LPAREN RPAREN {}
|ID LPAREN arglist RPAREN {}
|exp PLUS exp {}
|exp MINUS exp {}
|exp TIMES exp {}
|exp DIVIDE exp {}
|exp EQ exp {}
|exp NEQ exp {}
|exp LT exp {}
|exp LE exp {}
|exp GT exp {}
|exp GE exp {}
|exp AND exp {}
|exp OR exp {}
|ID LBRACE RBRACE {}
|ID LBRACE idlist RBRACE {}
|ID LBRACK exp RBRACK OF exp {}
|lvalue ASSIGN exp {}
|IF exp THEN exp ELSE exp {}
|IF exp THEN exp {}
|WHILE exp DO exp {}
|FOR ID ASSIGN exp TO exp DO exp {}
|BREAK {}
|LET decs IN END {}
|LET decs IN explist END {}
;
lvalue: ID {}
| lvalue DOT ID {}
| lvalue LBRACK exp RBRACK {}
;
explist: exp {}
| explist SEMICOLON exp {}
;
arglist:exp {}
|exp COMMA arglist {}
;
idlist:ID EQ exp {}
|ID EQ exp COMMA idlist {}
;
decs:dec {}
|decs dec {}
;
dec:tydec {}
|vardec {}
|fundec {}
;
tydec:TYPE ID EQ ty {}
;
ty:ID {}
|LBRACK tyfields RBRACK {}
|ARRAY OF ID {}
;
tyfields:/* NULL */
|notnulltyfields {}
;
notnulltyfields:ID COLON ID {}
|ID COLON ID COMMA notnulltyfields {}
;
vardec:VAR ID ASSIGN exp {}
|VAR ID COLON ID ASSIGN exp {}
;
fundec:FUNCTION ID LPAREN tyfields RPAREN EQ exp {}
|FUNCTION ID LPAREN tyfields RPAREN COLON ID EQ exp {}
;
source to share
Decreasing collisions are easily detected by viewing the file tiger.output
generated by the flag -v
.
Here's an example (I've edited the repetition):
State 88
11 exp: exp . PLUS exp
12 | exp . MINUS exp
# ...
29 | WHILE exp DO exp .
PLUS shift, and go to state 34
MINUS shift, and go to state 35
# ...
PLUS [reduce using rule 29 (exp)]
MINUS [reduce using rule 29 (exp)]
# ...
$default reduce using rule 29 (exp)
We can see that state 88 occurs when the expression WHILE
can be reduced (as seen from the position .
in the state description:
29 | WHILE exp DO exp .
If the lookahead token at this point is a binary operator, the parser does not know whether to shift the operator, making the long exp
in the expression WHILE
longer, or decreasing immediately WHILE
. Obviously (for us, not bison
) the decision has to shift. bison
doesn't know what because the product has exp: WHILE exp DO exp
no priority. The priority of this production will be the priority of its last terminal, which is equal DO
, so the simple solution is to prioritize for DO
. Unsurprisingly, it should be the same as priority ELSE
, as evidenced by the fact that it IF exp THEN exp ELSE exp .
doesn't create a shift / decrease conflict.
A similar problem occurs in states 112 and 129.
The change / decrease conflict in state 1 is also clear from the file output
:
State 1
9 exp: ID . LPAREN RPAREN
10 | ID . LPAREN arglist RPAREN
23 | ID . LBRACE RBRACE
24 | ID . LBRACE idlist RBRACE
25 | ID . LBRACK exp RBRACK OF exp
34 lvalue: ID .
LPAREN shift, and go to state 15
LBRACK shift, and go to state 16
LBRACE shift, and go to state 17
LBRACK [reduce using rule 34 (lvalue)]
$default reduce using rule 34 (lvalue)
Here's the parser just found ID
in a context that exp
can be minified and it has two possibilities:
-
shift .
exp
-ID [exp] OF exp
, so in the end the result will be:ID '[' exp ']' OF exp --> exp (rule 25)
-
reduce .
exp
is an lvalueID[exp]
using the following production operations:ID --> lvalue (rule 34) lvalue '[' exp ']' --> lvalue (rule 36) lvalue --> exp (rule 2)
To use the second alternative, the parser must immediately reduce ID
to lvalue
, and therein lies the problem: the parser cannot know which of these two possibilities is correct until it sees OF
after the match ], but that is far in the future - in fact it could be an arbitrary amount tokens.
The solution here is not to force the parser to make a decision at this point. There are several possibilities.
-
Since the expression can only be
ID [ exp ] OF
(and not something more complex), we can excludeID
from the conflict:exp : ID | lvalue_not_id | ... lvalue: ID | lvalue_not_id lvalue_not_ID : lvalue DOT ID | ID LBRACK exp RBRACK | lvalue_not_ID LBRACK exp RBRACK
Comparing the current state machine to the state machine after this change should make it clear how this works (and this is a useful exercise when learning upstream analysis).
-
If you don't want to go to all this work, you can simply add "obviously redundant" production, as suggested by Appell in his tutorial:
lvalue: ID | lvalue DOT ID | lvalue LBRACK exp RBRACK | ID LBRACK exp RBRACK
The added production before
lvalue
will obviously create a conflict with the reduction in shift; indeed, it is exactly the same shift-reduction conflict as in the original grammar. But this time there is a conflict between two different settings forlvalue
, and the default shift action is the one you want to accept in the case of nakedID
, followed by [. After the change of productionlvalue
andexp
are still available, so the parser doesn't have to make a decision until it finds the token after ].The downside to this solution is that the parser generator will keep reporting conflict with decreasing shift since it is obviously one. Since collisions with decreasing shift is usually considered a sign that the grammar may be ambiguous, leaving a collision with decreasing shift in the code will be a long-term maintenance problem: after every grammar change it will need to be checked, reducing the conflict is benign.
-
Another solution, which unfortunately also leaves a warning, is to use the bison
%glr-parser
directive to generate the GLR parser. The GLR algorithm is capable of deferring pruning decisions by effectively maintaining two (or more) different possible analyzer stacks at the same time. For unambiguous grammars, this will still be O (n) in input length, but it is slightly slower. (Also, this option is not available in many other yacc derivatives.) -
Finally, you can get rid of
lvalue
it by simply adding your versions toexp
. Then you would need to generalizelvalue [ exp ]
toexp [ exp ]
, which means the grammar recognizes a superset of the original language: it will now accept certain inputs that are not valid. However, in semantic actions for the respective industries, it is easy to check whether it has aexp
formlvalue
or not; if not, you can generate a syntax error in the semantic action.
source to share