ANTLR3: match everything up to a specific keyword

I am using ANTLR 3 to do the following.

Let's say I have a SQL query. I know that in general the WHERE, ORDER BY and GROUP BY clauses are optional. In terms of ANTLR grammar, I would describe it like this:

query: select_clause from_clause where_clause? group_by_clause? order_by_clause?

The rule for each clause will obviously start with the corresponding keyword.

What I really need is to extract the content of each clause as a string, without having to deal with its internal structure.

To do this, I started with the following grammar:

query:
    select_clause from_clause where_clause? group_by_clause? order_by_clause?
EOF;

select_clause:
    SELECT_CLAUSE
;

from_clause:
    FROM_CLAUSE
;

where_clause:
    WHERE_CLAUSE
;

group_by_clause:
    GROUP_BY_CLAUSE
;

order_by_clause:
    ORDER_BY_CLAUSE
;

SELECT_CLAUSE: 'select' ANY_CHAR *;

FROM_CLAUSE: 'from' ANY_CHAR *;

WHERE_CLAUSE: 'where' ANY_CHAR *;

GROUP_BY_CLAUSE: 'group by' ANY_CHAR *;

ORDER_BY_CLAUSE: 'order by' ANY_CHAR *;

ANY_CHAR:.;

WS: ' '+ {skip();};

It didn't work. I made further attempts at putting together a correct grammar, without success. I suspect this task is doable with ANTLR 3, but I'm missing something.

More generally, I would like to collect characters from the input stream into a single token until a specific keyword is encountered that marks the start of a new token. That keyword must be part of the new token.

Can you help me?



1 answer


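Instead of putting ANY_CHAR* inside your tokens, why not move it into the parser rules? In a lexer rule, ANY_CHAR* is greedy, so the first clause token swallows the rest of the input. Once it sits in a parser rule, you can even glue the single-character tokens back together using a rewrite rule.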

Quick demo:

grammar T;

options { output=AST; }
tokens  { QUERY; ANY; }

query           : select_clause from_clause where_clause? group_by_clause? order_by_clause? EOF
                  -> ^(QUERY select_clause from_clause where_clause? group_by_clause? order_by_clause?)
                ;
select_clause   : SELECT_CLAUSE^ any;
from_clause     : FROM_CLAUSE^ any;
where_clause    : WHERE_CLAUSE^ any;
group_by_clause : GROUP_BY_CLAUSE^ any;
order_by_clause : ORDER_BY_CLAUSE^ any;
any             : ANY_CHAR* -> ANY[$text];

SELECT_CLAUSE   : 'select';
FROM_CLAUSE     : 'from';
WHERE_CLAUSE    : 'where';
GROUP_BY_CLAUSE : 'group' S+ 'by';
ORDER_BY_CLAUSE : 'order' S+ 'by';
ANY_CHAR        : . ;
WS              : S+ {skip();};

fragment S      : ' ' | '\t' | '\r' | '\n';

      

If you now parse the input:



select JUST ABOUT ANYTHING from YOUR BASEMENT order by WHATEVER

the following AST will be created:

[Image: diagram of the resulting AST]

Doing this in the lexer instead would be messy: it would require custom code (or predicates) that looks ahead in the character stream for keywords, and neither option is attractive.
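To give a feel for the keyword check a lexer-only approach would need, here is a minimal sketch in plain Java. It is not ANTLR code and not part of the grammar above: the class name ClauseSplitter and the helper keywordLengthAt are hypothetical, and the sketch assumes keywords are case-insensitive, that "group by" and "order by" are separated by exactly one space, and that keywords never occur inside string literals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: splits a query into clauses by
// checking at every character position whether a keyword starts there.
final class ClauseSplitter {
    private static final String[] KEYWORDS =
            {"select", "from", "where", "group by", "order by"};

    // Returns the length of the keyword starting at pos, or -1 if none does.
    // A keyword only counts when it is not glued to a letter or digit.
    static int keywordLengthAt(String s, int pos) {
        if (pos > 0 && Character.isLetterOrDigit(s.charAt(pos - 1))) {
            return -1;
        }
        for (String kw : KEYWORDS) {
            int end = pos + kw.length();
            if (s.regionMatches(true, pos, kw, 0, kw.length())
                    && (end == s.length() || !Character.isLetterOrDigit(s.charAt(end)))) {
                return kw.length();
            }
        }
        return -1;
    }

    // Collects characters into one clause until the next keyword is seen;
    // the keyword itself becomes the start of the following clause.
    static List<String> split(String input) {
        List<String> clauses = new ArrayList<>();
        int start = -1; // start of the current clause, -1 before the first keyword
        for (int i = 0; i < input.length(); i++) {
            if (keywordLengthAt(input, i) > 0) {
                if (start >= 0) {
                    clauses.add(input.substring(start, i).trim());
                }
                start = i;
            }
        }
        if (start >= 0) {
            clauses.add(input.substring(start).trim());
        }
        return clauses;
    }

    public static void main(String[] args) {
        String query = "select JUST ABOUT ANYTHING from YOUR BASEMENT order by WHATEVER";
        for (String clause : split(query)) {
            System.out.println(clause);
        }
    }
}
```

For the sample input, this prints the three clauses "select JUST ABOUT ANYTHING", "from YOUR BASEMENT" and "order by WHATEVER" on separate lines. Even this small version needs word-boundary checks and still ignores quoting and variable whitespace, which is the kind of bookkeeping the parser-rule approach avoids.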
