Ruby Regex: parsing C++ classes

I am curious to learn how to parse C++ code using regular expressions. What I have so far (in Ruby) lets me fetch class declarations and their parent classes (if any):

/(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{/

      

Here's an example in Rubular. Note that I can correctly capture the "declaration" and "inheritance" parts.
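For readers without the Rubular link, here is a minimal sketch of how the three capture groups could be read out with String#scan. The constant name CLASS_HEAD and the sample C++ input are made up for this illustration and are not part of the original post.

```ruby
# Regex from the question: keyword, class name, optional inheritance list, opening brace.
CLASS_HEAD = /(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{/

# Hypothetical sample input, chosen only to exercise both cases.
source = <<~CPP
  class Shape {
  };
  struct Circle : public Shape {
  };
CPP

# scan returns one [keyword, name, parents] array per match.
declarations = source.scan(CLASS_HEAD).map do |keyword, name, parents|
  { keyword: keyword, name: name, parents: parents.strip }
end

declarations.each do |d|
  puts "#{d[:keyword]} #{d[:name]} inherits: #{d[:parents].inspect}"
end
```

For a class without a base class, the third group only picks up the whitespace before the brace, so stripping it leaves an empty string.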

The point I get stuck at is grabbing the class ... If I use the following extension of the original regex:

/(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{[^}]*\};/

      

Then I can only grab the body of the class if it contains no curly braces, and therefore no nested class or function definitions. I've tried many variations so far, but none of them worked better. For example, if I allow the body to contain curly braces, the regex captures the first class declaration and then swallows all subsequent classes as if they were part of the first class's body!
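The two failure modes can be reproduced with a short sketch. FLAT_BODY is the extended regex from above; GREEDY_BODY is only an assumed example of a variant that allows braces in the body (the exact variants tried are not shown here), and the sample inputs are hypothetical.

```ruby
# Extended regex from above: the body may not contain any closing brace.
FLAT_BODY = /(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{[^}]*\};/

# Assumed variant that lets the body contain anything, including braces.
GREEDY_BODY = /(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\s*\{.*\};/m

flat   = "class A { int x; };"
nested = "class B { int f() { return 1; } };"
two    = "class A { void f() {} };\nclass B { };"

puts FLAT_BODY.match?(flat)         # true: no braces inside the body
puts FLAT_BODY.match?(nested)       # false: [^}]* stops at the inner closing brace
puts GREEDY_BODY.match(two)[0] == two  # true: one match swallows both classes
```

Neither pattern counts braces, which is why the first is too strict and the second too greedy.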

What am I missing?

+3




3 answers


A named group that calls itself recursively can help:

#                   named  v    backref          v
/(struct|class)\s+(?<match>{((\g<match>|[^{}]*))*})/m

      



Here we find the balanced curly braces that follow struct/class. You will probably want to adapt the regex; I posted it in this form to keep the solution as clear as possible.
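One possible way to adapt the idea, as a sketch only: keep the name and inheritance parts of the pattern from the question and let a named group call itself for nested braces. The group names (keyword, name, parents, body), the constant CLASS_WITH_BODY and the helper each_class are made up for this illustration. Braces inside string literals or comments would still confuse it.

```ruby
CLASS_WITH_BODY = /
  (?<keyword>struct|class)\s+
  (?<name>[^{:\s]+)\s*
  (?::(?<parents>[^{]+))?            # optional inheritance list
  (?<body>\{(?:[^{}]|\g<body>)*\})   # balanced braces, calling itself for nesting
  \s*;
/x

# Hypothetical helper: collects every class definition found in the source.
def each_class(source)
  results = []
  pos = 0
  while (m = CLASS_WITH_BODY.match(source, pos))
    results << { name: m[:name], parents: m[:parents].to_s.strip, text: m[0] }
    pos = m.end(0) # continue searching after this definition
  end
  results
end

source = <<~CPP
  class A { void f() { if (x) { g(); } } };
  struct B : public A { int y; };
CPP

each_class(source).each do |c|
  puts "#{c[:name]} (#{c[:parents]}): #{c[:text].length} characters"
end
```

Because the search resumes after each complete definition, the second class is reported separately instead of being absorbed into the first one.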

+1




Regular expressions are not recommended for parsing code.

Most compilers and interpreters use lexers and parsers to transform code into an abstract syntax tree before compiling or running the code.



Ruby has several lexer gems, for example this one, which you can try to include in your project.
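To give a rough idea of what the lexing step does, here is a minimal hand-written tokenizer sketch built on StringScanner from Ruby's standard library. It does not use any particular gem; the token categories and the tokenize helper are assumptions made for this illustration, and a real C++ lexer needs far more rules (preprocessor directives, character literals, raw strings and so on).

```ruby
require "strscan"

# Hypothetical token patterns, tried in this order at the current position.
TOKEN_PATTERNS = {
  comment:    %r{//[^\n]*|/\*.*?\*/}m,
  string:     /"(?:\\.|[^"\\])*"/,
  identifier: /[A-Za-z_]\w*/,
  number:     /\d+/,
  punct:      /[{}();:,]/
}.freeze

# Splits source text into [type, text] pairs, skipping whitespace.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)
    type, _pattern = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    if type
      tokens << [type, scanner.matched]
    else
      tokens << [:unknown, scanner.getch] # anything this sketch does not model
    end
  end
  tokens
end

tokens = tokenize('class A { const char* s = "}"; };')
puts tokens.count([:punct, "}"]) # the brace inside the string is not counted
```

Once braces in strings and comments have been folded into their own tokens, counting the remaining brace tokens to find the end of a class body becomes reliable.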

+4




I can offer you the following:

(struct|class)\s+([^{:\s]+)\s*[:]?([^{]+)\{([^{}]|\{\g<4>*\})*\};

Where \g<4> is a recursive call to the fourth capture group, which is ([^{}]|\{\g<4>*\}). The * after \g<4> lets each nested pair of braces contain any number of items rather than exactly one.

Matching non-regular languages with regular expressions is never pretty. You might want to consider switching to a proper recursive descent parser, especially if you plan on doing something with the stuff you just grabbed.
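As a rough illustration of the recursive descent idea, the sketch below handles only the small grammar needed here: a keyword, a name, an optional inheritance list and a brace-balanced block. The ClassParser class, its Node struct and the sample input are hypothetical. It assumes no braces appear inside strings or comments, and it reports only top-level definitions.

```ruby
require "strscan"

class ClassParser
  Node = Struct.new(:keyword, :name, :parents, :body)

  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Returns a Node for every top-level struct or class definition.
  def parse
    nodes = []
    while @scanner.scan_until(/\b(struct|class)\b/)
      keyword = @scanner[1]
      @scanner.skip(/\s+/)
      name = @scanner.scan(/[^{:;\s]+/)
      next unless name
      @scanner.skip(/\s*/)
      parents = @scanner.skip(/:/) ? @scanner.scan(/[^{;]*/).strip : ""
      next unless @scanner.skip(/\{/) # forward declaration, no body
      nodes << Node.new(keyword, name, parents, parse_block)
    end
    nodes
  end

  private

  # block := "{" (text | block)* "}", called just after the opening brace.
  def parse_block
    body = +""
    until @scanner.eos?
      if @scanner.scan(/[^{}]+/)
        body << @scanner.matched
      elsif @scanner.scan(/\{/)
        body << "{" << parse_block << "}" # recurse for the nested block
      elsif @scanner.scan(/\}/)
        return body
      end
    end
    raise "unbalanced braces"
  end
end

source = <<~CPP
  class Widget;
  struct Point { int x; int y; };
  class Button : public Widget {
    void click() { if (on) { fire(); } }
  };
CPP

ClassParser.new(source).parse.each do |node|
  puts "#{node.keyword} #{node.name} < #{node.parents.inspect}"
end
```

Each call to parse_block consumes exactly one balanced block, so the nesting depth is tracked by the call stack instead of by the regex engine.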

0

