ANTLR ambiguity '-'

I have a grammar and everything works fine up to this part:

lexp
: factor ( ('+' | '-') factor)* 
;

factor :('-')? IDENT;

      

This, of course, introduces an ambiguity. For example, in a-a the '-' can be read either as the binary operator in

factor - factor

or as the unary minus in factor -> '-' IDENT

I am getting the following warning:

[18:49:39] warning(200): withoutWarningButIncomplete.g:57:31: 
Decision can match input such as "'-' {IDENT, '-'}" using multiple alternatives: 1, 2

      

How can I remove this ambiguity? I just can't see the way around it. Is there some option I can use?

Here is the complete grammar:

program 
    : includes decls (procedure)*
    ;

/* Check if correct! */
includes
    :  ('#include' STRING)* 
    ;

decls
    : (typedident ';')*
    ;

typedident
: ('int' | 'char') IDENT    
;

procedure
    : ('int' | 'char') IDENT '(' args ')' body
    ;

args
: typedident (',' typedident )*  /* Check if correct! */
|   /* epsilon */ 
;

body
: '{' decls stmtlist '}'
;   

stmtlist
: (stmt)*;

stmt  

:  '{' stmtlist '}'
| 'read' '(' IDENT ')' ';'
| 'output' '(' IDENT ')' ';'
| 'print' '(' STRING ')' ';'
| 'return' (lexp)* ';'
| 'readc' '(' IDENT ')' ';'
| 'outputc' '(' IDENT ')' ';'
|  IDENT '(' (IDENT ( ',' IDENT )*)? ')' ';'
| IDENT '=' lexp ';';



lexp
: term (( '+' | '-' ) term) * /*Add in | '-'  to reveal the warning! !*/
;

term 
    : factor (('*' | '/' | '%') factor )*
;  


factor : '(' lexp ')' 
 | ('-')? IDENT
 | NUMBER;




fragment DIGIT
: ('0' .. '9')
;

IDENT : ('A' .. 'Z' | 'a' .. 'z') (( 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_'))* ;


NUMBER
: ( ('-')? DIGIT+)
;

CHARACTER
: '\'' ('a' .. 'z' | 'A' .. 'Z'  | '0' .. '9' | '\\n' |  '\\t'  | '\\\\'  | '\\'  | 'EOF'  |'.' | ',' |':'  )  '\'' /* IS THIS COMPLETE? */
;

      



1 answer


As mentioned in the comments, these rules are not ambiguous on their own:

lexp
 : factor (('+' | '-') factor)* 
 ;

factor : ('-')? IDENT;

      

This is the reason for the ambiguity:

'return' (lexp)* ';'

      

which can parse the input a-b in two different ways:

  • a-b as one binary expression
  • a as a single expression, followed by -b as a unary expression
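
To make the two readings concrete, here is a small Python sketch (plain Python, not ANTLR; the helper names match_factor, lexp_ends and count_parses are made up for this illustration). It assumes only the simplified rules lexp : factor (('+' | '-') factor)* and factor : '-'? IDENT, and counts in how many ways a token list can be split into a sequence of lexp's, which is what (lexp)* asks the parser to do:

```python
from functools import lru_cache


def match_factor(toks, i):
    """factor : '-'? IDENT. Return the index after the factor, or None."""
    if i < len(toks) and toks[i] == '-':
        i += 1
    if i < len(toks) and toks[i].isidentifier():
        return i + 1
    return None


def lexp_ends(toks, i):
    """lexp : factor (('+' | '-') factor)*. All indexes where a lexp starting at i can stop."""
    ends = []
    j = match_factor(toks, i)
    while j is not None:
        ends.append(j)
        if j < len(toks) and toks[j] in ('+', '-'):
            j = match_factor(toks, j + 1)
        else:
            break
    return ends


def count_parses(toks):
    """Count the ways to read toks as (lexp)*, i.e. as zero or more lexp's in a row."""
    @lru_cache(maxsize=None)
    def go(i):
        if i == len(toks):
            return 1
        return sum(go(j) for j in lexp_ends(toks, i))
    return go(0)


print(count_parses(['a', '-', 'b']))  # 2: either (a-b), or (a) followed by (-b)
print(count_parses(['a', '+', 'b']))  # 1: there is no unary '+', so only (a+b)
```

The count of 2 for a - b corresponds to the two alternatives ANTLR reports in its warning.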

You will need to change your grammar. Perhaps separate multiple return values with commas? Something like this:

'return' (lexp (',' lexp)*)? ';'

      

which will match:

return;
return a;
return a, -b;
return a-b, c+d+e, f;
...
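
As a rough check of the comma-separated rule, the following Python sketch hand-codes a recursive-descent recognizer for 'return' (lexp (',' lexp)*)? ';'. It is only an illustration under simplifying assumptions: factor is reduced to '-'? IDENT, and the names tokenize and parse_return are hypothetical, not anything ANTLR generates. It returns the list of return values it found, each as a list of tokens:

```python
import re

# Simplified token set: identifiers plus the punctuation used in the rule.
TOKEN = re.compile(r"\s*([A-Za-z][A-Za-z0-9_]*|[-+,;])")


def tokenize(text):
    text = text.rstrip()
    toks, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        toks.append(m.group(1))
        pos = m.end()
    return toks


def parse_return(text):
    """Recognize: 'return' (lexp (',' lexp)*)? ';' and return the list of values."""
    toks = tokenize(text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def factor():  # factor : '-'? IDENT
        out = [take()] if peek() == '-' else []
        tok = peek()
        if tok is None or not tok[0].isalpha() or tok == 'return':
            raise SyntaxError(f"expected IDENT, got {tok!r}")
        out.append(take())
        return out

    def lexp():  # lexp : factor (('+' | '-') factor)*
        out = factor()
        while peek() in ('+', '-'):
            out.append(take())
            out += factor()
        return out

    take('return')
    values = []
    if peek() != ';':
        values.append(lexp())
        while peek() == ',':
            take(',')
            values.append(lexp())
    take(';')
    if pos != len(toks):
        raise SyntaxError("trailing input after ';'")
    return values


print(parse_return("return a, -b;"))  # [['a'], ['-', 'b']]
print(parse_return("return a -b;"))   # [['a', '-', 'b']]
```

Because a comma is now required between values, a '-' that follows an expression can only continue that expression, so a -b has a single reading as the binary expression a-b.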

      
