Parsing regular expressions based on context free grammar

Question

Parsing regular expressions based on context free grammar

Good evening, stack overflow. I would like to develop an interpreter for expressions based on a fairly simple context-free grammar:

Grammar

Basically, the language consists of two basic operators

( SET var 25 ) // Output: var = 25
( GET ( MUL var 5 ) ) // Output: 125
( SET var2 ( MUL 30 5 ) ) //Output: var2 = 150

Now I'm sure what I have to do to interpret the statement: 1) Lexical parsing to turn the statement into a sequence of tokens 2) Parsing the syntax to get a symbol table (HashMap with variables and their values) and a syntax tree (for executing GET statements) to 3) do the tree visit in order to get the results I want.

I would like some advice on the parsing method to read the source file. Given that the parser has to ignore any spaces, tabs, or newlines, is it possible to use a Java pattern to get the general statement that I want to parse? Is there a good way to read a formatted (and possibly more complex) expression like this

(
  SET var

 25
 )

without confusing the parser with open and closed parentheses?

for example

Scanner scan; //scanner reading the source file
String pattern = "..." //ideal pattern I've found to represent an expression
while(scan.hasNext(pattern))
  Interpreter.computeStatement(scan.next(pattern));

would this be a viable option for this problem?

+3

java regex parsing

fedexist 09 dec. 14 at 19:25

source to share

2 answers

Solution suggested by Ira Brakter :

Your name is extremely confusing. You seem to want to parse what is commonly referred to as "S-expressions" in the LISP world; this requires a (simple but) context-free grammar. You cannot parse such expressions with regular expressions. Time to learn about real parsers.

Maybe this will help: fooobar.com/questions/79055 / ...

+1

Stephan 27 Aug '15 at 9:16

source to share

fedexist · Accepted Answer · 2015-10-04T18:26:57+0000

In the end, I realized thanks to Ira Baxter that this context free grammar cannot be parsed with RegExp, and I used the concepts of S-Expressions to create an interpreter whose source code you can find here . If you have any questions about this (mainly because the comments are not translated into English, although I think the code is pretty clear) just let me know or comment here.

Basically what I am doing:

Parsing each character and tokenizing it (e.g. '(' -> - OPEN_PAR, whereas "SET" -> STATEMENT_SET or a random letter like "b" is parsed as a VARIABLE)
Then I use the token list generated for parsing, which checks for patterns occurring inside the token list according to the grammar
If there is an expression in the expression, I check recursively for any expression within the expression, throwing an exception and, if necessary, moving on to the next correct expression
At the end of the analysis of each individual statement, I compute the instruction as needed, just like for the specifications

Parsing regular expressions based on context free grammar

More articles: