Bison shift / reduce conflict - tiger compiler

I wrote a yacc file according to the Tiger book (Appendix A, Tiger manual).

But there are still shift / reduction conflicts. I don't know how to resolve these conflicts.

% yacc --version
bison (GNU Bison) 3.0.2

      

You can use this cmd to reproduce the problem:

% yacc -dvt tiger.y
tiger.y: warning: 37 shift/reduce conflicts [-Wconflicts-sr]

      

% cat tiger.y

:

%{
#include <stdio.h>
//#include "util.h"
//#include "errormsg.h"

int yylex(void); /* function prototype */

void yyerror(char *s)
{
    EM_error(EM_tokPos, "%s", s);
}
%}


%union {
    int pos;
    int ival;
    string sval;
}


%token <sval> ID STRING
%token <ival> INT

%token
  COMMA COLON SEMICOLON LPAREN RPAREN LBRACK RBRACK
  LBRACE RBRACE DOT
  PLUS MINUS TIMES DIVIDE EQ NEQ LT LE GT GE
  AND OR ASSIGN
  ARRAY IF THEN ELSE WHILE FOR TO DO LET IN END OF
  BREAK NIL
  FUNCTION VAR TYPE


%right ASSIGN
%left OR
%left AND
%nonassoc EQ NEQ LT LE GT GE
%left  PLUS MINUS
%left  TIMES DIVIDE
%left  UNARYMINUS

%precedence THEN
%precedence ELSE


%start program

%%

program:    exp {  }
       ;

exp:lvalue {  }
   |NIL    {  }
   |LPAREN explist RPAREN {  }
   |LPAREN         RPAREN {}
   |INT {}
   |STRING {}
   |MINUS exp %prec UNARYMINUS {}
   |ID LPAREN RPAREN {}
   |ID LPAREN arglist RPAREN {}
   |exp PLUS exp {}
   |exp MINUS exp {}
   |exp TIMES exp {}
   |exp DIVIDE exp {}
   |exp EQ exp {}
   |exp NEQ exp {}
   |exp LT exp {}
   |exp LE exp {}
   |exp GT exp {}
   |exp GE exp {}
   |exp AND exp {}
   |exp OR exp {}
   |ID LBRACE RBRACE {}
   |ID LBRACE idlist RBRACE {}
   |ID LBRACK exp RBRACK OF exp {}
   |lvalue ASSIGN exp {}
   |IF exp THEN exp ELSE exp {}
   |IF exp THEN exp {}
   |WHILE exp DO exp {}
   |FOR ID ASSIGN exp TO exp DO exp {}
   |BREAK {}
   |LET decs IN END {}
   |LET decs IN explist END {}
   ;

lvalue: ID {}
      | lvalue DOT ID {}
      | lvalue LBRACK exp RBRACK {}
      ;

explist: exp {}
       | explist SEMICOLON exp {}
       ;

arglist:exp {}
       |exp COMMA arglist {}
       ;

idlist:ID EQ exp {}
      |ID EQ exp COMMA idlist {}
      ;

decs:dec {}
       |decs dec {}
       ;

dec:tydec {}
   |vardec {}
   |fundec {}
   ;

tydec:TYPE ID EQ ty {}
     ;

ty:ID {}
  |LBRACK tyfields RBRACK {}
  |ARRAY OF ID {}
  ;

tyfields:/* NULL */
        |notnulltyfields {}
        ;

notnulltyfields:ID COLON ID {}
               |ID COLON ID COMMA notnulltyfields {}
               ;

vardec:VAR ID ASSIGN exp {}
      |VAR ID COLON ID ASSIGN exp {}
      ;

fundec:FUNCTION ID LPAREN tyfields RPAREN EQ exp {}
      |FUNCTION ID LPAREN tyfields RPAREN COLON ID EQ exp {}
      ;

      

+3


source to share


1 answer


Decreasing collisions are easily detected by viewing the file tiger.output

generated by the flag -v

.

Here's an example (I've edited the repetition):

State 88

   11 exp: exp . PLUS exp
   12    | exp . MINUS exp
# ...
   29    | WHILE exp DO exp .

    PLUS    shift, and go to state 34
    MINUS   shift, and go to state 35
# ...

    PLUS      [reduce using rule 29 (exp)]
    MINUS     [reduce using rule 29 (exp)]
# ...

    $default  reduce using rule 29 (exp)

      

We can see that state 88 occurs when the expression WHILE

can be reduced (as seen from the position .

in the state description:

   29    | WHILE exp DO exp .

      

If the lookahead token at this point is a binary operator, the parser does not know whether to shift the operator, making the long exp

in the expression WHILE

longer, or decreasing immediately WHILE

. Obviously (for us, not bison

) the decision has to shift. bison

doesn't know what because the product has exp: WHILE exp DO exp

no priority. The priority of this production will be the priority of its last terminal, which is equal DO

, so the simple solution is to prioritize for DO

. Unsurprisingly, it should be the same as priority ELSE

, as evidenced by the fact that it IF exp THEN exp ELSE exp .

doesn't create a shift / decrease conflict.

A similar problem occurs in states 112 and 129.



The change / decrease conflict in state 1 is also clear from the file output

:

State 1

    9 exp: ID . LPAREN RPAREN
   10    | ID . LPAREN arglist RPAREN
   23    | ID . LBRACE RBRACE
   24    | ID . LBRACE idlist RBRACE
   25    | ID . LBRACK exp RBRACK OF exp
   34 lvalue: ID .

    LPAREN  shift, and go to state 15
    LBRACK  shift, and go to state 16
    LBRACE  shift, and go to state 17

    LBRACK    [reduce using rule 34 (lvalue)]
    $default  reduce using rule 34 (lvalue)

      

Here's the parser just found ID

in a context that exp

can be minified and it has two possibilities:

  • shift . exp

    - ID [exp] OF exp

    , so in the end the result will be:

    ID '[' exp ']' OF exp        --> exp    (rule 25)
    
          

  • reduce . exp

    is an lvalue ID[exp]

    using the following production operations:

    ID                           --> lvalue (rule 34)
    lvalue '[' exp ']'           --> lvalue (rule 36)
    lvalue                       --> exp    (rule 2)
    
          

To use the second alternative, the parser must immediately reduce ID

to lvalue

, and therein lies the problem: the parser cannot know which of these two possibilities is correct until it sees OF

after the match ], but that is far in the future - in fact it could be an arbitrary amount tokens.

The solution here is not to force the parser to make a decision at this point. There are several possibilities.

  • Since the expression can only be ID [ exp ] OF

    (and not something more complex), we can exclude ID

    from the conflict:

    exp   : ID
          | lvalue_not_id
          | ...
    
    lvalue: ID
          | lvalue_not_id
    
    lvalue_not_ID
          : lvalue DOT ID
          | ID            LBRACK exp RBRACK
          | lvalue_not_ID LBRACK exp RBRACK
    
          

    Comparing the current state machine to the state machine after this change should make it clear how this works (and this is a useful exercise when learning upstream analysis).

  • If you don't want to go to all this work, you can simply add "obviously redundant" production, as suggested by Appell in his tutorial:

    lvalue: ID 
          | lvalue DOT ID 
          | lvalue LBRACK exp RBRACK
          | ID LBRACK exp RBRACK
    
          

    The added production before lvalue

    will obviously create a conflict with the reduction in shift; indeed, it is exactly the same shift-reduction conflict as in the original grammar. But this time there is a conflict between two different settings for lvalue

    , and the default shift action is the one you want to accept in the case of naked ID

    , followed by [. After the change of production lvalue

    and exp

    are still available, so the parser doesn't have to make a decision until it finds the token after ].

    The downside to this solution is that the parser generator will keep reporting conflict with decreasing shift since it is obviously one. Since collisions with decreasing shift is usually considered a sign that the grammar may be ambiguous, leaving a collision with decreasing shift in the code will be a long-term maintenance problem: after every grammar change it will need to be checked, reducing the conflict is benign.

  • Another solution, which unfortunately also leaves a warning, is to use the bison %glr-parser

    directive to generate the GLR parser. The GLR algorithm is capable of deferring pruning decisions by effectively maintaining two (or more) different possible analyzer stacks at the same time. For unambiguous grammars, this will still be O (n) in input length, but it is slightly slower. (Also, this option is not available in many other yacc derivatives.)

  • Finally, you can get rid of lvalue

    it by simply adding your versions to exp

    . Then you would need to generalize lvalue [ exp ]

    to exp [ exp ]

    , which means the grammar recognizes a superset of the original language: it will now accept certain inputs that are not valid. However, in semantic actions for the respective industries, it is easy to check whether it has a exp

    form lvalue

    or not; if not, you can generate a syntax error in the semantic action.

+6


source







All Articles