Parsing regular expressions based on context free grammar

Good evening, stack overflow. I would like to develop an interpreter for expressions based on a fairly simple context-free grammar:

Grammar

Basically, the language consists of two basic operators

( SET var 25 ) // Output: var = 25
( GET ( MUL var 5 ) ) // Output: 125
( SET var2 ( MUL 30 5 ) ) //Output: var2 = 150

      

Now I'm sure what I have to do to interpret the statement: 1) Lexical parsing to turn the statement into a sequence of tokens 2) Parsing the syntax to get a symbol table (HashMap with variables and their values) and a syntax tree (for executing GET statements) to 3) do the tree visit in order to get the results I want.

I would like some advice on the parsing method to read the source file. Given that the parser has to ignore any spaces, tabs, or newlines, is it possible to use a Java pattern to get the general statement that I want to parse? Is there a good way to read a formatted (and possibly more complex) expression like this

(
  SET var

 25
 )

      

without confusing the parser with open and closed parentheses?

for example

Scanner scan; //scanner reading the source file
String pattern = "..." //ideal pattern I've found to represent an expression
while(scan.hasNext(pattern))
  Interpreter.computeStatement(scan.next(pattern));

      

would this be a viable option for this problem?

+3


source to share


2 answers


In the end, I realized thanks to Ira Baxter that this context free grammar cannot be parsed with RegExp, and I used the concepts of S-Expressions to create an interpreter whose source code you can find here . If you have any questions about this (mainly because the comments are not translated into English, although I think the code is pretty clear) just let me know or comment here.

Basically what I am doing:



  • Parsing each character and tokenizing it (e.g. '(' -> - OPEN_PAR, whereas "SET" -> STATEMENT_SET or a random letter like "b" is parsed as a VARIABLE)
  • Then I use the token list generated for parsing, which checks for patterns occurring inside the token list according to the grammar
  • If there is an expression in the expression, I check recursively for any expression within the expression, throwing an exception and, if necessary, moving on to the next correct expression
  • At the end of the analysis of each individual statement, I compute the instruction as needed, just like for the specifications
+1


source


Solution suggested by Ira Brakter :

Your name is extremely confusing. You seem to want to parse what is commonly referred to as "S-expressions" in the LISP world; this requires a (simple but) context-free grammar. You cannot parse such expressions with regular expressions. Time to learn about real parsers.




Maybe this will help: fooobar.com/questions/79055 / ...

+1


source







All Articles