Error handling problems with ANTLR3

Question


I tried reporting errors as follows.

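@members{
    public String getErrorMessage(RecognitionException e, String[] tokenNames)
    {
        List stack = getRuleInvocationStack(e, this.getClass().getName());
        String msg = null;
        if (e instanceof NoViableAltException) {
            <some code>
        }
        else {
            msg = super.getErrorMessage(e, tokenNames);
        }
        // Accept both Windows (\r\n) and Unix (\n) line endings
        String[] inputLines = e.input.toString().split("\\r?\\n");
        String line = "";
        if (e.token != null) {
            // An error reported at column 0 is usually caused by something missing
            // at the end of the previous line, so show that line instead
            int index = e.token.getCharPositionInLine() == 0
                    ? e.token.getLine() - 2
                    : e.token.getLine() - 1;
            if (index >= 0 && index < inputLines.length)
                line = "at \"" + inputLines[index] + "\"";
        }
        // Drop ANTLR's own " at ..." suffix; split("at") would also cut words such as "mismatched"
        int cut = msg.lastIndexOf(" at ");
        String head = (cut >= 0) ? msg.substring(0, cut + 1) : msg + " ";
        return ": " + head + line + " => [" + stack.get(stack.size() - 1) + "]";
    }

    public String getTokenErrorDisplay(Token t){
        return t.toString();
    }
}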

And now the errors are displayed as follows.

line 6:7 : missing CLOSSB at "int a[6;" => [var_declaration]
line 8:0 : missing SEMICOL at "int p" => [var_declaration]
line 8:5 : missing CLOSB at "get(2;" => [call]

I have 2 questions.

1) Is there a better way to do what I did?

2) I want to replace CLOSSB, SEMICOL, CLOSB, etc. with their actual symbols ("]", ";", ")"). How can I do this with a map in the .g file?

Thanks.


context-free-grammar antlr antlr3 grammar

Bee 11 Mar 12 at 16:33


2 answers

1) Is there a better way to do what I did?

I don't know if there is a specific way to display errors. My test for an error message is simple: if users can figure out how to fix the error from what you gave them, the message is good; if they have to ask what the message means, it needs more work. Based on the examples given in the question, the symbols involved are all char constants.

My preferred way to show an error is to print the offending line with an arrow pointing at the location, e.g.

Expected closing square bracket on line 6.

int a[6;
       ^
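A minimal sketch of that style in Java, assuming you already have the source text, the 1-based line and the 0-based column of the error (ANTLR 3 tokens report them through getLine() and getCharPositionInLine()). CaretError and format are hypothetical names, not part of ANTLR:

```java
public class CaretError {
    /**
     * Builds an error report: the message, a blank line, the offending source line,
     * and a caret under the error column. line is 1-based and column is 0-based,
     * matching ANTLR 3's Token.getLine() and Token.getCharPositionInLine().
     */
    static String format(String source, String message, int line, int column) {
        String[] lines = source.split("\\r?\\n", -1);
        if (line < 1 || line > lines.length) {
            return message; // no such line, so there is nothing to point at
        }
        String text = lines[line - 1];
        int col = Math.max(0, Math.min(column, text.length()));
        StringBuilder caret = new StringBuilder();
        for (int i = 0; i < col; i++) {
            // Reuse tabs from the source line so the caret stays aligned
            caret.append(text.charAt(i) == '\t' ? '\t' : ' ');
        }
        caret.append('^');
        return message + "\n\n" + text + "\n" + caret;
    }

    public static void main(String[] args) {
        String source = "int x;\nint a[6;\n";
        System.out.println(format(source, "Expected closing square bracket on line 2.", 2, 7));
    }
}
```

Copying tabs into the padding is a small design choice: ANTLR counts a tab as one column, so padding with spaces alone would misplace the caret on tab-indented lines.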

2) I want to replace CLOSSB, SEMICOL, CLOSB, etc. with their actual symbols ("]", ";", ")"). How can I do this with a map in the .g file?

You would need to read the separately generated token file and build a map (a dictionary data structure) that translates each token name into the character(s) it stands for.
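A minimal sketch of such a map in Java. It assumes the .tokens file contains lines of the form NAME=type and 'literal'=type (for example ';'=13 next to SEMICOL=13) and pairs a name with a literal whenever both share the same type. TokenSymbols and parse are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

public class TokenSymbols {
    /**
     * Parses the contents of an ANTLR 3 .tokens file, assumed to contain lines such as
     * SEMICOL=13 and ';'=13, and maps each token name to the literal with the same
     * type, e.g. SEMICOL -> ';'. Tokens defined without a literal (ID, FOO) are omitted.
     */
    static Map<String, String> parse(String tokensFile) {
        Map<Integer, String> nameByType = new HashMap<>();
        Map<Integer, String> literalByType = new HashMap<>();
        for (String raw : tokensFile.split("\\r?\\n")) {
            String line = raw.trim();
            // Split at the LAST '=' because a literal such as '==' contains '=' itself
            int eq = line.lastIndexOf('=');
            if (eq <= 0) continue; // blank or malformed line
            String key = line.substring(0, eq);
            int type = Integer.parseInt(line.substring(eq + 1).trim());
            if (key.startsWith("'")) {
                literalByType.put(type, key);  // e.g. ';'=13
            } else {
                nameByType.put(type, key);     // e.g. SEMICOL=13 or T__13=13
            }
        }
        Map<String, String> symbols = new HashMap<>();
        for (Map.Entry<Integer, String> entry : nameByType.entrySet()) {
            String literal = literalByType.get(entry.getKey());
            if (literal != null) {
                symbols.put(entry.getValue(), literal);
            }
        }
        return symbols;
    }
}
```

The file contents could be read with new String(Files.readAllBytes(Paths.get("MyGrammar.tokens")), StandardCharsets.UTF_8), where MyGrammar.tokens stands for your own grammar's token file. Tokens that are not defined by a single literal stay out of the map, so the caller should fall back to the token name.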

EDIT

First we need to clarify what is meant by a symbol. If you restrict the definition of a symbol to tokens that the token file defines with a char or string literal, e.g. '!'=13 or 'public'=92, then this can be done. If your definition of a symbol is any text that a token can match, then that is a different problem, and not one I am considering here.

When ANTLR generates its token map, it uses three different sources:

char or string constants in the lexer
char or string constants in the parser
internal tokens such as Invalid, Down and Up

Because the lexer rules alone do not give you the complete set of tokens, use the token file as the starting point. If you look at the token file, you will notice that the lowest token type is 4. The remaining predefined token types are in the TokenTypes file (that is its name in the C# version). Names in the token file that start with T__ are names ANTLR generated for char or string literals used in the parser.
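To complete the table, the predefined types below 4 can be added by hand. The values used here are the ones I believe the ANTLR 3 Java runtime defines in org.antlr.runtime.Token (EOF = -1, INVALID_TOKEN_TYPE = 0, EOR_TOKEN_TYPE = 1, DOWN = 2, UP = 3); treat them as an assumption and check them against your runtime version. ReservedTokens and withReserved are hypothetical names:

```java
import java.util.Map;
import java.util.TreeMap;

public class ReservedTokens {
    // Assumed values of the predefined token types in the ANTLR 3 Java runtime;
    // types read from the .tokens file start at 4.
    static final int EOF = -1, INVALID = 0, EOR = 1, DOWN = 2, UP = 3;

    /** Returns a copy of a type -> display-string table extended with the predefined types. */
    static Map<Integer, String> withReserved(Map<Integer, String> userTypes) {
        Map<Integer, String> all = new TreeMap<>(userTypes);
        all.put(EOF, "EOF");
        all.put(INVALID, "<invalid>");
        all.put(EOR, "<EOR>");
        all.put(DOWN, "<DOWN>");
        all.put(UP, "<UP>");
        return all;
    }
}
```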

If you use string or char literals in parser rules, ANTLR has to create an additional set of lexer rules covering all of those literals. Remember that the parser only sees tokens, never raw text, so the literals cannot be handed to the parser directly.

To see the new set of lexer rules, run org.antlr.Tool with the -Xsavelexer option and open the generated lexer grammar (.g) file. If you use string and/or char literals in the parser rules, you will see lexer rules whose names start with T__.

Now that you know all the tokens and their meanings, you can create a mapping table from the token information provided in the error to the string you want to output instead of the token name.
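One way to apply such a table, sketched under assumptions: ANTLR 3's BaseRecognizer builds messages like "missing CLOSSB at ..." by looking up the expected type in the tokenNames array passed to getErrorMessage, so handing it a translated copy of that array changes the names in the message. displayNames is a hypothetical helper, and the symbols map is one you build yourself (for example from the .tokens file):

```java
import java.util.Map;

public class TokenDisplay {
    /**
     * Returns a copy of the parser's tokenNames array in which every name that has an
     * entry in symbols (e.g. SEMICOL -> ';') is replaced by that symbol. Names without
     * an entry (ID, FOO, <EOR>, ...) are kept unchanged.
     */
    static String[] displayNames(String[] tokenNames, Map<String, String> symbols) {
        String[] shown = tokenNames.clone(); // never modify the generated array itself
        for (int i = 0; i < shown.length; i++) {
            String symbol = symbols.get(shown[i]);
            if (symbol != null) {
                shown[i] = symbol;
            }
        }
        return shown;
    }
}
```

In the question's getErrorMessage, the else branch could then call super.getErrorMessage(e, TokenDisplay.displayNames(tokenNames, symbols)) instead of passing tokenNames directly. Computing the translated array once and caching it would avoid rebuilding it for every error.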

The code at http://markmail.org/message/2vtaukxw5kbdnhdv#query:+page:1+mid:2vtaukxw5kbdnhdv+state:results is an example.

However, the token mapping can change when you modify lexer rules or the char/string literals in the parser. So if a message suddenly displays the wrong string for a token, you will have to update the mapping table manually.

It is not a perfect solution, but it is a workable one, depending on how you define a symbol.

Note: The last time I looked at ANTLR 4.x, it automatically generated such a table, accessible from the parser, because this was a problem for many ANTLR 3.x users.


Guy Coder 11 Mar 12 at 17:19


Bart Kiers · Accepted Answer · 2012-03-11T19:19:51+0000

Bhatia wrote:

1) Is there a better way to do what I did?

There is no single way to do this. Note that proper error handling and reporting is tricky: Terence Parr devotes an entire chapter to it (chapter 10) in The Definitive ANTLR Reference. I recommend getting a copy and reading it.

Bhatia wrote:

2) I want to replace CLOSSB, SEMICOL, CLOSB, etc. with their actual symbols ("]", ";", ")"). How can I do this with a map in the .g file?

You can't. For SEMICOL it may sound simple, but how would you get this information for a token like FOO:

FOO : (X | Y)+;

fragment X : '4'..'6';
fragment Y : 'a' | 'bc' | . ;
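To make the point concrete, here is a rough java.util.regex approximation of FOO. It is an illustration only (ANTLR's lexer does not work by regular expressions), and DOTALL is used because ANTLR's . matches any character. Strings as different as "4", "abc" and ";" all match, so there is no single symbol that could stand in for FOO in an error message, and its name is the best available fallback. FooExample is a hypothetical name:

```java
import java.util.regex.Pattern;

public class FooExample {
    // Rough equivalent of FOO : (X | Y)+ ; with X : '4'..'6' and Y : 'a' | 'bc' | . ;
    static final Pattern FOO = Pattern.compile("(?:[4-6]|a|bc|.)+", Pattern.DOTALL);

    public static void main(String[] args) {
        // Each of these very different inputs is a valid FOO
        for (String s : new String[] {"4", "abc", "5?x", ";"}) {
            System.out.println(s + " -> " + FOO.matcher(s).matches());
        }
    }
}
```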

