Translating a DSL to Java with ANTLR4: need a strategy for rebuilding (not evaluating) expressions
I am using ANTLR4 to translate ("compile") a simple, artificial, vocabulary-limited programming language to Java. Since this exercise requires no evaluation, and even conditionals are only translated into equivalent Java code, I am implementing a listener-based solution. Because the language vocabulary is so limited, I have been able to work out most of the translation tasks and related strategies, relying mainly on a single-scoped symbol table for storing and comparing compile-time and run-time variables (remember that no expression evaluation is done).
Simple arithmetic and comparison expressions are easy enough to parse and convert to Java; however, I have run into problems with nested and compound expressions. They parse fine, but translating them to Java is a problem. I have tried several strategies to handle them. Most involve getting the lhs and rhs expressions, checking by various mechanisms whether each one is a nested expression or not (for example, looking for parentheses or other operators in its string form), and looking up any variables in the symbol table. If the lhs or rhs is a number, or a variable defined as numeric or real, it is pushed onto a stack along with the operator. However, popping those stack items and trying to rearrange them into the correct order is futile, because the nesting positions of the expressions affect when they are visited and where the associated operators end up.
I feel like I'm on the right track with my strategy of storing and regenerating expressions, but I need a nudge. At the same time, I'm afraid I may be wasting time if I'm not on the right track, or if there is a better way to do it, perhaps through a well-tested design pattern.
The complete grammar is shown below. I think it is pretty self-explanatory, except maybe the triple quotes (""") used for an inline quote mark in a string. Remember that this is a very limited language and I am not evaluating any expressions.
grammar Test;

prog
    : (stat ';')+
    | COMMENT
    ;

stat
    : assign
    | if_stat
    | loop_stat
    | expr
    | get
    | put
    ;

assign
    : VARIABLE '=' expr
    ;

if_stat
    : 'if' expr 'then' (stat ';')+ (('elsif' expr 'then' (stat ';')+)* 'else' (stat ';')+)? 'end if'
    ;

loop_stat
    : 'loop' ('exit when' expr ';')* (stat ';')+ 'end loop'
    ;

expr
    : number                                    #Num
    | variable                                  #Var
    | '!' expr                                  #LogNeg
    | expr '&' expr                             #LogAnd
    | expr '|' expr                             #LogOr
    | expr ('='|'<>'|'<'|'>'|'<='|'>=') expr    #Comp
    | '-' expr                                  #Neg
    | expr ('*'|'/'|'%') expr                   #MultDivRem
    | expr ('+'|'-') expr                       #AddSub
    | '(' expr ')'                              #Parens
    ;

get
    : 'get' variable (',' variable)*
    ;

put
    : 'put' (expr|str) (',' (expr|str))*
    ;

number
    : NUMBER
    ;

variable
    : VARIABLE
    ;

str
    : STRING
    ;

COMMENT  : '#' .*? '\n' -> skip ;
WS       : [ \t\n\r]+ -> skip ;
VARIABLE : LETTER (LETTER|DIGIT|'_')* ;
NUMBER   : DIGIT (DIGIT|'_' DIGIT)* ;
STRING   : ('"""'|'"') .*? ('"""'|'"') ;

fragment LETTER : [a-z] | [A-Z] ;
fragment DIGIT  : [0-9] ;
A sample expression processing method looks like this:
public void enterAddSub(SimpleParser.AddSubContext ctx) {
    // Simplified example does not account for variables.
    boolean opSeen = false;

    // Get operator and left and right hand expressions.
    String op = ctx.getChild(1).getText();
    String lhs = ctx.getChild(0).getText();
    String rhs = ctx.getChild(2).getText();

    // lhs is not a nested expression, print it. If nested, skip for now.
    if (isInteger(lhs)) {
        //System.out.print(lhs + " " + op + " ");
        cts.push(lhs);
        cts.push(op);
        opSeen = true;
    }

    // rhs is not a nested expression, print it. If nested, skip for now.
    if (isInteger(rhs)) {
        //System.out.print(rhs);
        cts.push(rhs);
    }
    else {
        if (!opSeen) {
            //System.out.print(op);
            cts.push(op);
        }
    }
    //System.out.println();
}
The corresponding expr exit method just pops everything off the stack into a line, but then the pieces of the puzzle do not fit together, and I can't think of an algorithm that puts the items back in sequence where they need to be.
Also, I do not override the Number or Variable methods; instead I use a top-down approach and access those elements from their enclosing exprs. Maybe this is what is causing my problem; if so, I can't see how.
Any suggestion on how to keep attacking this problem in the same way or how to change strategy would be appreciated.
I've looked at numerous questions and examples on SO but can't find an equivalent. I also have Parr's ANTLR4 reference, which is very helpful, but I can't find a strategy for this particular problem anywhere in it.
One way to solve this problem is to extend your symbol table into a scoped one, or more specifically to use a scoped "op" table. Push a scope on each "enterExpr" and pop it on each "exitExpr". When entering each sub-expr, such as "enterAddSub", add an "op object" characterizing the operator for that sub-expr to the current scope.
Now, as you enter and exit each expr, evaluate the op object in the parent scope to see whether some part of the operation needs to be printed. In the specific case of "enterAddSub", and choosing the strategy of printing the operator before printing anything from the second expr, keep a counter in the op so that the operator is printed on the third evaluation of the op (otherwise just increment the counter). For the parens sub-expr, the strategy is to print "(" on enterExpr and ")" on exitExpr.
For simple cases, it is typical for the op object to use its onEntry and onExit methods to invoke a self-evaluation and conditionally print the result.
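The counter strategy described above can be sketched without any ANTLR dependency. In this minimal sketch, `Node`, `Op`, `Scope` and `walk` are hypothetical names: `Node` stands in for an expr context, and `walk` imitates the enter/exit callbacks that a tree walker would fire. In a real listener, the body of `enterExpr`/`exitExpr` would have to live in whatever callbacks fire for every expr (possibly `enterEveryRule`/`exitEveryRule`, since labeled alternatives get their own enter/exit methods); that wiring is an assumption and is not shown here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Standalone sketch: Node is a hypothetical stand-in for an ANTLR expr context,
// and walk() imitates the enter/exit callbacks a tree walker would fire.
class OpScopeSketch {

    static final class Node {
        final String kind;   // "leaf", "binary" or "parens"
        final String text;   // leaf text, or the operator symbol
        final Node[] kids;
        Node(String kind, String text, Node... kids) {
            this.kind = kind; this.text = text; this.kids = kids;
        }
    }

    // The "op object": counts its evaluations and prints when its turn comes.
    static final class Op {
        final String kind;
        final String symbol;
        int evaluations = 0;
        Op(String kind, String symbol) { this.kind = kind; this.symbol = symbol; }

        void onEntry(StringBuilder out) {
            evaluations++;
            if (kind.equals("parens") && evaluations == 1) out.append("(");
            // Evaluations 1 and 2 are the entry and exit of the first operand;
            // the third is the entry of the second operand, so print the operator.
            if (kind.equals("binary") && evaluations == 3) out.append(' ').append(symbol).append(' ');
        }

        void onExit(StringBuilder out) {
            evaluations++;
            if (kind.equals("parens")) out.append(")");
        }
    }

    static final class Scope { Op op; }   // op stays null for leaves

    private final Deque<Scope> scopes = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    void enterExpr() {
        Scope parent = scopes.peek();          // null at the root
        scopes.push(new Scope());
        if (parent != null && parent.op != null) parent.op.onEntry(out);
    }

    void exitExpr() {
        scopes.pop();
        Scope parent = scopes.peek();
        if (parent != null && parent.op != null) parent.op.onExit(out);
    }

    void walk(Node n) {
        enterExpr();
        if (n.kind.equals("leaf")) {
            out.append(n.text);                          // number or variable
        } else {
            scopes.peek().op = new Op(n.kind, n.text);   // what enterAddSub etc. would do
            for (Node kid : n.kids) walk(kid);
        }
        exitExpr();
    }

    static String translate(Node root) {
        OpScopeSketch sketch = new OpScopeSketch();
        sketch.walk(root);
        return sketch.out.toString();
    }

    static Node leaf(String text) { return new Node("leaf", text); }
    static Node binary(String op, Node l, Node r) { return new Node("binary", op, l, r); }
    static Node parens(Node inner) { return new Node("parens", "", inner); }

    public static void main(String[] args) {
        Node tree = binary("*", parens(binary("+", leaf("1"), leaf("2"))), leaf("x"));
        System.out.println(translate(tree));   // prints (1 + 2) * x
    }
}
```

Because every operator is printed by the op in the parent scope at the moment the second operand is entered, the output order follows the nesting of the tree and nothing has to be rearranged afterwards.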
In more interesting cases, especially where the translation can benefit from lazy evaluation, the op becomes a smart accumulator. On each onExit evaluation, it decides whether to print, to accumulate, or to add its values to the op object in its parent scope.
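A minimal sketch of the accumulating variant follows, again standalone and with hypothetical names (`Node`, `Op`, `toJava`, `walk`). Each op collects the Java text of its children and, on exit, hands its rendered text to the op in the parent scope. The operator mapping in `toJava` (`=` to `==`, `<>` to `!=`, `&` to `&&`, `|` to `||`) is an assumption about what the DSL operators mean in Java. In an actual ANTLR listener, the per-node text could probably be kept in a `ParseTreeProperty<String>` instead of an explicit stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Standalone sketch: Node is a hypothetical stand-in for an ANTLR expr context.
class AccumulatingOpSketch {

    static final class Node {
        final String kind;   // "leaf", "unary", "binary" or "parens"
        final String text;   // leaf text, or the DSL operator symbol
        final Node[] kids;
        Node(String kind, String text, Node... kids) {
            this.kind = kind; this.text = text; this.kids = kids;
        }
    }

    // Assumed mapping from DSL operators to Java; everything else passes through.
    static String toJava(String op) {
        switch (op) {
            case "=":  return "==";
            case "<>": return "!=";
            case "&":  return "&&";
            case "|":  return "||";
            default:   return op;
        }
    }

    // Accumulating op: gathers the Java text of its child exprs, renders on exit.
    static final class Op {
        final String kind;
        final String text;
        final List<String> parts = new ArrayList<>();
        Op(String kind, String text) { this.kind = kind; this.text = text; }

        void accept(String childJava) { parts.add(childJava); }

        String render() {
            switch (kind) {
                case "leaf":   return text;
                case "unary":  return toJava(text) + parts.get(0);
                case "parens": return "(" + parts.get(0) + ")";
                case "binary": return parts.get(0) + " " + toJava(text) + " " + parts.get(1);
                default:       throw new IllegalStateException("unknown kind: " + kind);
            }
        }
    }

    private final Deque<Op> scopes = new ArrayDeque<>();

    void enterExpr(Node n) { scopes.push(new Op(n.kind, n.text)); }

    void exitExpr() {
        Op finished = scopes.pop();
        scopes.peek().accept(finished.render());   // hand the result to the parent scope
    }

    void walk(Node n) {
        enterExpr(n);
        for (Node kid : n.kids) walk(kid);
        exitExpr();
    }

    static String translate(Node root) {
        AccumulatingOpSketch sketch = new AccumulatingOpSketch();
        Op result = new Op("root", "");   // sentinel scope that receives the final text
        sketch.scopes.push(result);
        sketch.walk(root);
        return result.parts.get(0);
    }

    static Node leaf(String text) { return new Node("leaf", text); }
    static Node unary(String op, Node operand) { return new Node("unary", op, operand); }
    static Node binary(String op, Node l, Node r) { return new Node("binary", op, l, r); }
    static Node parens(Node inner) { return new Node("parens", "", inner); }

    public static void main(String[] args) {
        Node tree = unary("!", parens(binary("|",
                binary("=", leaf("a"), leaf("1")),
                binary("<>", leaf("b"), leaf("2")))));
        System.out.println(translate(tree));   // prints !(a == 1 || b != 2)
    }
}
```

Nothing is printed until the outermost expr exits, which leaves room for an op to inspect or rewrite what its children produced before passing it upward.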
enterExpr:
    pushScope()
    parentScope().onEntry()

enterAddSub:
    currentScope().add(new OpObject(ADDSUB)) // enum
    enterExpr
        visit ...
    exitExpr
    enterExpr
        visit ...
    exitExpr

exitAddSub:
    currentScope().finalize()

exitExpr:
    parentScope().onExit()
    popScope()