Python regex: get everything in parentheses, ignoring parentheses inside quotes

Given the line

S = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"

      

I would like to extract everything between the outer parentheses, ignoring any parentheses that appear inside quotes. So far I've managed to get everything in parentheses, but I can't figure out how to stop the match from ending at the inner parenthesis inside the quotes. My current code:

import re
S = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"

p = re.compile(r"\((.*?)\)")
m = p.findall(S)
for element in m:
    print(element)

      

I want to get:

45171924,-1,'AbuseFilter/658',2600
43795362,-1,'!!_(disambiguation)',2600
45795362,-1,'!!_(disambiguation)',2699

      

I am currently getting:

45171924,-1,'AbuseFilter/658',2600
43795362,-1,'!!_(disambiguation
45795362,-1,'!!_(disambiguation

      

What can I do to ignore the inner parentheses?

Thanks!


In case it helps, here are the threads I looked at:

1) REGEX-String and escaped quote

2) Regular expression to return text between brackets

3) Get a parenthesized string in Python



4 answers


You can use a non-capturing group to require that the closing parenthesis is followed by either a comma or the end of the string:

p = re.compile(r'\((.*?)\)(?:,|$)')
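
As a minimal runnable sketch, here is that pattern applied to the sample string from the question. The name `records` is only illustrative and is not part of the original answer.

```python
import re

S = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"

# A closing parenthesis only ends a record when it is followed
# by a comma or by the end of the string.
p = re.compile(r'\((.*?)\)(?:,|$)')

records = p.findall(S)
for record in records:
    print(record)
```

This relies on every record's closing parenthesis being followed by a comma or the end of the string. A quoted field that itself contains `),` would still end the match early.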

      





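You can also do this without a regex: strip the outermost parentheses and split on the '),(' separator. This assumes that '),(' never appears inside a quoted field.

for element in S[1:-1].split('),('):
    print(element)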

      





You can use the following regular expression.

>>> import re
>>> s = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"
>>> for i in re.findall(r"\(((?:'[^']*'|[^()])*)\)", s):
...     print(i)
...
45171924,-1,'AbuseFilter/658',2600
43795362,-1,'!!_(disambiguation)',2600
45795362,-1,'!!_(disambiguation)',2699

      

Explanation:

  • \(
    - Matches a literal ( character.
  • (
    - Start of the capture group.
  • (?:'[^']*'|[^()])*
    - The '[^']*' part greedily matches a single-quoted block. It doesn't care whether the block contains ( or ) characters, because [^']* matches any character except ' zero or more times. If the next character is not the start of a single-quoted block, control passes to the alternative after the |, i.e. [^()], which matches any single character except ( or ). So the whole (?:'[^']*'|[^()])* matches either a single-quoted block or any character other than ( or ), zero or more times.
  • )
    - End of the capture group.
  • \)
    - Matches a literal ) character.
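
The breakdown above can also be written directly into the pattern. The following is a sketch using `re.VERBOSE`, which lets a pattern carry whitespace and comments; the names `PATTERN` and `matches` are illustrative, and the regex itself is the same one used in this answer.

```python
import re

s = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"

PATTERN = re.compile(r"""
    \(               # literal opening parenthesis
    (                # start of the capture group
      (?:
          '[^']*'    # a whole single-quoted block, whatever it contains
        | [^()]      # or one character that is not a parenthesis
      )*             # repeated zero or more times
    )                # end of the capture group
    \)               # literal closing parenthesis
""", re.VERBOSE)

matches = PATTERN.findall(s)
for match in matches:
    print(match)
```

The quoted alternative is listed first, so whenever a quote starts a complete quoted block the engine consumes the block as a unit before it gets a chance to stop at a parenthesis inside it.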



A simple approach is a negative lookahead: make sure that no quote immediately follows the closing parenthesis, e.g.

import re
S = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"

m = re.findall(r'\((.*?)\)(?![\'])', S)
for element in m:
    print(element)

      

prints

45171924,-1,'AbuseFilter/658',2600
43795362,-1,'!!_(disambiguation)',2600
45795362,-1,'!!_(disambiguation)',2699

      

http://www.codeskulptor.org/#user39_CL89xhroV0_0.py

I put the quote in a character class (square brackets) so that you can add other characters that, when they directly follow a closing parenthesis, should cause that parenthesis to be ignored.
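
To show where the lookahead's assumption matters, here is a small sketch that compares it with the quote-aware pattern from the other answer in this thread. The input `tricky` is made up for illustration and does not come from the question: its inner closing parenthesis is followed by a letter rather than a quote.

```python
import re

# Hypothetical input: the inner ")" is followed by "c", not by a quote.
tricky = "(1,'a(b)c',2)"

# Lookahead version: a ")" ends the match unless a quote follows it.
lookahead = re.compile(r"\((.*?)\)(?!['])")

# Quote-aware version: quoted blocks are consumed as a unit.
quote_aware = re.compile(r"\(((?:'[^']*'|[^()])*)\)")

lookahead_result = lookahead.findall(tricky)
quote_aware_result = quote_aware.findall(tricky)

print(lookahead_result)    # ["1,'a(b"]
print(quote_aware_result)  # ["1,'a(b)c',2"]
```

For data shaped like the question's, where an inner closing parenthesis is always followed directly by the closing quote, the lookahead is enough. Otherwise the character class would need every character that can follow an inner parenthesis.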
