Python regex: get everything in parentheses, ignoring parentheses inside quotes
Given the line
S = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"
I would like to extract everything inside each pair of parentheses, but parentheses that appear inside quotes should not count as delimiters. So far, I've managed to get everything in parentheses, but I can't figure out how to stop the match from ending at the inner closing parenthesis inside the quotes. My current code:
import re

S = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"
# Lazy match: stops at the first ")" it finds, even one inside quotes
p = re.compile(r"\((.*?)\)")
m = p.findall(S)
for element in m:
    print(element)
I want to get:
45171924,-1,'AbuseFilter/658',2600
43795362,-1,'!!_(disambiguation)',2600
45795362,-1,'!!_(disambiguation)',2699
I am currently getting:
45171924,-1,'AbuseFilter/658',2600
43795362,-1,'!!_(disambiguation
45795362,-1,'!!_(disambiguation
How can I make the regex ignore the inner parentheses?
Thanks!
In case it helps, here are the topics I looked at:
1) REGEX-String and escaped quote
You can use the following regular expression.
>>> import re
>>> s = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"
>>> for i in re.findall(r"\(((?:'[^']*'|[^()])*)\)", s):
...     print(i)
...
45171924,-1,'AbuseFilter/658',2600
43795362,-1,'!!_(disambiguation)',2600
45795362,-1,'!!_(disambiguation)',2699
Explanation:
- \( matches a literal ( character.
- ( starts the capture group.
- (?:'[^']*'|[^()])* is the core of the pattern. The '[^']*' alternative matches a whole single-quoted block, and it doesn't care whether that block contains ( or ), because [^']* matches any character except ', zero or more times. If a quoted block cannot be matched at the current position, the engine tries the alternative after the |, i.e. [^()], which matches any single character except ( or ). So (?:'[^']*'|[^()])* matches, zero or more times, either a quoted block or a character that is not ( or ). The repetition stops at the first ( or ) that is not inside quotes.
- ) ends the capture group.
- \) matches a literal ) character.
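For reference, the same pattern can be written with re.VERBOSE so each piece sits next to its explanation. This is only a reformatting sketch of the one-line regex above; it should match exactly the same text:

```python
import re

s = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"

# Same regex as the one-liner, written with re.VERBOSE so each part can carry
# a comment. In verbose mode, whitespace and '#' comments outside character
# classes are ignored by the regex engine.
pattern = re.compile(r"""
    \(              # literal opening parenthesis
    (               # start of the capture group
      (?:
        '[^']*'     # a single-quoted block; may contain ( or )
        |           # or
        [^()]       # any single character except ( and )
      )*            # zero or more of either
    )               # end of the capture group
    \)              # literal closing parenthesis
""", re.VERBOSE)

groups = pattern.findall(s)
for g in groups:
    print(g)
```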
A simpler approach is a negative lookahead: make sure the closing parenthesis is not immediately followed by a quote, e.g.
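import re

S = "(45171924,-1,'AbuseFilter/658',2600),(43795362,-1,'!!_(disambiguation)',2600),(45795362,-1,'!!_(disambiguation)',2699)"
# (?![\']) rejects any ")" immediately followed by a quote, so the lazy .*?
# keeps extending past ")" characters that close a quoted value.
m = re.findall(r'\((.*?)\)(?![\'])', S)
for element in m:
    print(element)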
prints
45171924,-1,'AbuseFilter/658',2600
43795362,-1,'!!_(disambiguation)',2600
45795362,-1,'!!_(disambiguation)',2699
http://www.codeskulptor.org/#user39_CL89xhroV0_0.py
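Note that the lookahead only inspects the single character right after a ), so it works when every inner ) sits directly before a closing quote, as in the question's data. The sketch below uses a made-up input (the string tricky is hypothetical, not from the question) where a quoted value contains ) followed by another character; there the lookahead version cuts the value short, while the quote-aware pattern from the first answer keeps it whole:

```python
import re

# Hypothetical input (not from the question): the quoted value 'a)b'
# contains ")" followed by "b", not by the closing quote.
tricky = "(1,'a)b',2),(3,'c',4)"

# Lookahead approach: only checks the one character right after ")".
lookahead = re.findall(r"\((.*?)\)(?!['])", tricky)

# Quote-aware approach: consumes each single-quoted block as a whole.
quote_aware = re.findall(r"\(((?:'[^']*'|[^()])*)\)", tricky)

print(lookahead)    # → ["1,'a", "3,'c',4"]
print(quote_aware)  # → ["1,'a)b',2", "3,'c',4"]
```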
I put the quote in a character class (square brackets) so that you can easily add other characters which, when they directly follow a closing parenthesis, should make the regex ignore that parenthesis.
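As an illustration of that extension, suppose (hypothetically; the question's data only uses single quotes) some values were double-quoted. Adding " to the class makes the regex skip a ) followed by either kind of quote. The names s2, single_only and both_quotes are illustrative only:

```python
import re

# Hypothetical input: the second value uses double quotes instead of single quotes.
s2 = "(1,'a_(b)',2),(3,\"c_(d)\",4)"

# Original class: only a single quote after ")" makes the regex keep going.
single_only = re.findall(r'\((.*?)\)(?![\'])', s2)

# Extended class: a single OR a double quote after ")" is skipped over.
both_quotes = re.findall(r'\((.*?)\)(?![\'"])', s2)

for element in both_quotes:
    print(element)
```

With only ' in the class, the second group stops early at 3,"c_(d, because that ) is followed by " rather than '.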