Python regex for matching single-line and multi-line comments.

Question

I am trying to create a Python regex for PLY that will match comments of the form

// some comment

and

/* comment
   more comment */

So I tried

t_COMMENT = r'//.+ | /\*.+\*/'

but this does not allow multi-line comments. When I try to solve this with the "dot matches all" option, like

t_COMMENT = r'//.+ | (?s) /\*.+\*/'

the '//' comment type then matches across many lines. The same happens if I use two separate regexes, like

t_COMMENT = r'//.+' 
t_COMMENT2 = r'(?s) /\*.+\*/'

The "//" comment type still matches multiple lines, as if the "dot matches all" option applied to it as well.

Does anyone know how to solve this?

+3

python regex

Gottfried 13 Sep at 11:28 am

4 answers

According to the PLY documentation, this can be accomplished with "conditional lexing". It can be more readable and easier to debug than a complex regular expression. The example they give is a little more complex, as it keeps track of the nesting levels and the content within the block. Your case is simpler because you don't need any of that information.

The code for a multi-line comment should be something like this:

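# I'd prefer 'multi_line_comment', but state names cannot contain
# underscores: PLY splits rule names on '_' to find the state.
states = (
    ('multiLineComment', 'exclusive'),
)

# Runs in the INITIAL state. The name must not begin with
# 't_multiLineComment_', or PLY would attach the rule to that state.
def t_begin_multiLineComment(t):
    r'/\*'
    t.lexer.begin('multiLineComment')

def t_multiLineComment_end(t):
    r'\*/'
    t.lexer.begin('INITIAL')

def t_multiLineComment_newline(t):
    r'\n'
    t.lexer.lineno += 1  # keep line numbers accurate

# Catch (and ignore) anything that isn't end-of-comment. The end rule is
# defined first, so '*/' is tried before the lone '*' alternative here.
def t_multiLineComment_content(t):
    r'[^*\n]+|\*'
    pass

# An exclusive state does not inherit t_ignore or t_error, so define them here.
t_multiLineComment_ignore = ''

def t_multiLineComment_error(t):
    t.lexer.skip(1)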

Of course, for // comments you will need a separate rule in the regular (INITIAL) state.
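The two-state idea can also be tried without PLY. The following is only a rough sketch using the standard `re` module, not PLY's API: the `RULES` table and the `strip_comments` helper are hypothetical names made up for illustration. Each state has its own ordered rule list, and a matching rule may switch the state, which is roughly what `t.lexer.begin()` does in PLY.

```python
import re

# Per state: (pattern, state to switch to, whether to keep the matched text).
# Rules are tried in order, like function rules in a lexer.
RULES = {
    'INITIAL': [
        (re.compile(r'/\*'), 'multiLineComment', False),       # comment start
        (re.compile(r'//[^\n]*'), 'INITIAL', False),           # single-line comment
        (re.compile(r'[^/]+|/'), 'INITIAL', True),             # ordinary text
    ],
    'multiLineComment': [
        (re.compile(r'\*/'), 'INITIAL', False),                # comment end
        (re.compile(r'[^*]+|\*'), 'multiLineComment', False),  # comment body
    ],
}

def strip_comments(text):
    """Return text with // and /* */ comments dropped (hypothetical helper)."""
    state, pos, kept = 'INITIAL', 0, []
    while pos < len(text):
        # The last rule of each state matches any single character,
        # so some rule always matches and pos always advances.
        for regex, new_state, keep in RULES[state]:
            m = regex.match(text, pos)
            if m:
                if keep:
                    kept.append(m.group(0))
                state, pos = new_state, m.end()
                break
    return ''.join(kept)

result = strip_comments("a // one\nb /* two\nthree */ c")
print(repr(result))
```

Like the regex answers, this sketch does not know about string literals, so a `//` inside a quoted string would be treated as a comment.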

+2

Zvika 29 Sep '14 at 6:35

Here's a slight variation on Avinash's solution.

pat = re.compile(r'(?://.*?$)|(?:/\*.*?\*/)', re.M|re.S)
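A quick way to check this pattern is to run it over a small sample. The sample text below is made up for illustration; the pattern is the one from this answer. `re.M` lets `$` match at the end of each line, and the lazy `.*?` stops at the first such position even though `re.S` lets `.` cross newlines.

```python
import re

# Pattern from the answer above.
pat = re.compile(r'(?://.*?$)|(?:/\*.*?\*/)', re.M | re.S)

# Hypothetical sample input.
sample = """x = 1; // some comment
y = 2;
/* comment
   more comment */ z = 3;"""

comments = pat.findall(sample)
print(comments)
```

Because both groups are non-capturing, `findall` returns the full matched comments rather than tuples.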

0

PM 2Ring 13 Sep '14 at 11:59

This might be helpful

 (/\*(.|\n)*?\*/)|(//.*)
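A small sketch of how this pattern might be used, with the closing delimiter written as `\*/`. The sample string is an assumption for illustration. Since the pattern contains capturing groups, `re.findall` would return tuples, so this example iterates with `finditer` and takes `group(0)` instead.

```python
import re

# (.|\n) matches any character including newlines, so no flags are needed.
pattern = re.compile(r'(/\*(.|\n)*?\*/)|(//.*)')

# Hypothetical sample input.
sample = "// one\ncode /* two\nthree */ more"

found = [m.group(0) for m in pattern.finditer(sample)]
print(found)
```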

0

Farshid 06 Jan '16 at 10:41

Avinash Raj · Accepted Answer · 2014-09-13T11:46:05+0000

The regex below matches both types of comments:

(?://[^\n]*|/\*(?:(?!\*/).)*\*/)

DEMO

>>> import re
>>> s = """// some comment
... 
... foo
... bar
... foobar
... /* comment
...    more comment */ bar"""
>>> m = re.findall(r'(?://[^\n]*|/\*(?:(?!\*/).)*\*/)', s, re.DOTALL)
>>> m
['// some comment', '/* comment\n   more comment */']
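The same pattern can be used to remove comments instead of collecting them. This is a minimal sketch: `COMMENT_RE`, `strip_comments`, and the sample text are hypothetical names and data for illustration. The `(?:(?!\*/).)*` part consumes one character at a time only while the next two characters are not `*/`, so each block comment ends at its own first `*/` and two block comments are never merged into one match.

```python
import re

# Pattern from the answer above, compiled with DOTALL so '.' crosses newlines.
COMMENT_RE = re.compile(r'(?://[^\n]*|/\*(?:(?!\*/).)*\*/)', re.DOTALL)

def strip_comments(source):
    """Return source with // and /* */ comments removed (hypothetical helper)."""
    return COMMENT_RE.sub('', source)

# Hypothetical sample input with one line comment and two block comments.
text = "a = 1; // first\n/* two\n   lines */ b = 2; /* x */ c = 3;"

matches = COMMENT_RE.findall(text)
stripped = strip_comments(text)
print(matches)
print(repr(stripped))
```

Note that this treats `//` or `/*` inside a string literal as a comment too; handling that would need a pattern or lexer that also recognizes strings.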
