Parsing regular expressions based on context free grammar
Good evening, stack overflow. I would like to develop an interpreter for expressions based on a fairly simple context-free grammar:
Basically, the language consists of two basic operators
( SET var 25 ) // Output: var = 25
( GET ( MUL var 5 ) ) // Output: 125
( SET var2 ( MUL 30 5 ) ) //Output: var2 = 150
Now I'm sure what I have to do to interpret the statement: 1) Lexical parsing to turn the statement into a sequence of tokens 2) Parsing the syntax to get a symbol table (HashMap with variables and their values) and a syntax tree (for executing GET statements) to 3) do the tree visit in order to get the results I want.
I would like some advice on the parsing method to read the source file. Given that the parser has to ignore any spaces, tabs, or newlines, is it possible to use a Java pattern to get the general statement that I want to parse? Is there a good way to read a formatted (and possibly more complex) expression like this
(
SET var
25
)
without confusing the parser with open and closed parentheses?
for example
Scanner scan; //scanner reading the source file
String pattern = "..." //ideal pattern I've found to represent an expression
while(scan.hasNext(pattern))
Interpreter.computeStatement(scan.next(pattern));
would this be a viable option for this problem?
source to share
In the end, I realized thanks to Ira Baxter that this context free grammar cannot be parsed with RegExp, and I used the concepts of S-Expressions to create an interpreter whose source code you can find here . If you have any questions about this (mainly because the comments are not translated into English, although I think the code is pretty clear) just let me know or comment here.
Basically what I am doing:
- Parsing each character and tokenizing it (e.g. '(' -> - OPEN_PAR, whereas "SET" -> STATEMENT_SET or a random letter like "b" is parsed as a VARIABLE)
- Then I use the token list generated for parsing, which checks for patterns occurring inside the token list according to the grammar
- If there is an expression in the expression, I check recursively for any expression within the expression, throwing an exception and, if necessary, moving on to the next correct expression
- At the end of the analysis of each individual statement, I compute the instruction as needed, just like for the specifications
source to share
Solution suggested by Ira Brakter :
Your name is extremely confusing. You seem to want to parse what is commonly referred to as "S-expressions" in the LISP world; this requires a (simple but) context-free grammar. You cannot parse such expressions with regular expressions. Time to learn about real parsers.
Maybe this will help: fooobar.com/questions/79055 / ...
source to share