ANTLR3: match everything up to a specific keyword
I am using ANTLR 3 to do the following.
Let's say I have a SQL query. I know that in general the WHERE, ORDER BY and GROUP BY clauses are optional. In terms of ANTLR grammar, I would describe it like this:
query: select_clause from_clause where_clause? group_by_clause? order_by_clause?
The rule for each clause will obviously start with the corresponding keyword.
All I really need is to extract the content of each clause as a string, without having to deal with its internal structure.
To do this, I started with the following grammar:
query: select_clause from_clause where_clause? group_by_clause? order_by_clause? EOF;
select_clause: SELECT_CLAUSE;
from_clause: FROM_CLAUSE;
where_clause: WHERE_CLAUSE;
group_by_clause: GROUP_BY_CLAUSE;
order_by_clause: ORDER_BY_CLAUSE;
SELECT_CLAUSE: 'select' ANY_CHAR*;
FROM_CLAUSE: 'from' ANY_CHAR*;
WHERE_CLAUSE: 'where' ANY_CHAR*;
GROUP_BY_CLAUSE: 'group by' ANY_CHAR*;
ORDER_BY_CLAUSE: 'order by' ANY_CHAR*;
ANY_CHAR: .;
WS: ' '+ {skip();};
It didn't work. I have made further attempts at putting together a correct grammar, without success. I suspect this task is doable with ANTLR 3, but I'm just missing something.
More generally, I would like to collect characters from the input stream into a single token until the lexer encounters a specific keyword that indicates the start of a new token. This keyword must be part of the new token.
Can you help me?
Instead of adding ANY_CHAR* to your tokens, why not move it into the parser rules? You can even glue the resulting single-character tokens together into one token using a rewrite rule.
Quick demo:
grammar T;
options { output=AST; }
tokens { QUERY; ANY; }
query : select_clause from_clause where_clause? group_by_clause? order_by_clause? EOF
-> ^(QUERY select_clause from_clause where_clause? group_by_clause? order_by_clause?)
;
select_clause : SELECT_CLAUSE^ any;
from_clause : FROM_CLAUSE^ any;
where_clause : WHERE_CLAUSE^ any;
group_by_clause : GROUP_BY_CLAUSE^ any;
order_by_clause : ORDER_BY_CLAUSE^ any;
any : ANY_CHAR* -> ANY[$text];
SELECT_CLAUSE : 'select';
FROM_CLAUSE : 'from';
WHERE_CLAUSE : 'where';
GROUP_BY_CLAUSE : 'group' S+ 'by';
ORDER_BY_CLAUSE : 'order' S+ 'by';
ANY_CHAR : . ;
WS : S+ {skip();};
fragment S : ' ' | '\t' | '\r' | '\n';
If you now parse the input:
select JUST ABOUT ANYTHING from YOUR BASEMENT order by WHATEVER
the following AST will be created:
Trying to do something like this in your lexer would be messy: it would take some custom code (or predicates) to check for keywords in the char stream, and neither option is very nice.
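To give a feel for the kind of manual keyword checking a lexer-only approach would need, here is a minimal plain-Java sketch that does not use ANTLR at all. The class name ClauseSplitter and the method split are hypothetical names made up for this illustration. It scans the raw input for the clause keywords and collects everything between two keywords as one string. It is deliberately naive: it would also match a keyword inside a string literal, and a repeated keyword (for example in a subquery) would overwrite the earlier entry, which is part of why doing this by hand gets messy.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClauseSplitter {

    // Keywords that start a new clause. The whitespace inside
    // "group by" / "order by" may be any run of blanks, tabs or newlines.
    private static final Pattern KEYWORD = Pattern.compile(
            "\\b(select|from|where|group\\s+by|order\\s+by)\\b",
            Pattern.CASE_INSENSITIVE);

    /** Maps each clause keyword (lower-cased, single-spaced) to the raw text after it. */
    public static Map<String, String> split(String sql) {
        Map<String, String> clauses = new LinkedHashMap<>();
        Matcher m = KEYWORD.matcher(sql);
        String current = null; // keyword of the clause being collected
        int bodyStart = 0;     // index where the current clause body begins
        while (m.find()) {
            if (current != null) {
                // A new keyword ends the previous clause body.
                clauses.put(current, sql.substring(bodyStart, m.start()).trim());
            }
            current = m.group(1).toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
            bodyStart = m.end();
        }
        if (current != null) {
            // The last clause runs to the end of the input.
            clauses.put(current, sql.substring(bodyStart).trim());
        }
        return clauses;
    }

    public static void main(String[] args) {
        Map<String, String> clauses =
                split("select JUST ABOUT ANYTHING from YOUR BASEMENT order by WHATEVER");
        clauses.forEach((keyword, body) -> System.out.println(keyword + " -> " + body));
    }
}
```

For the sample input used above, this maps select to "JUST ABOUT ANYTHING", from to "YOUR BASEMENT" and order by to "WHATEVER". The grammar-based approach is still preferable once you need the keywords to take part in real parsing.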