Incorrect content replacement
I am trying to replace the term brunch only in sentences that contain any of the following words: Saturday, Sunday, and/or weekend. However, my regex replaces the whole sentence and not just the term brunch.
>>> import re
>>> reg = re.compile(r'(?:(?:^|\.)[^.]*(?=saturday|sunday|weekend)[^.]*(brunch)[^.]*(?:\$|\.)|(?:^|\.)[^.]*(brunch)[^.]*(?=saturday|sunday|weekend)[^.]*(?:\$|\.))',re.I)
>>> str = 'Limit 1 per person. Limit 1 per table. Not valid for carryout. Not valid with any other offers, no cash back. Valid only for Wednesday-Friday dinner and Saturday-Sunday brunch. Not valid on federal holidays. Reservation required.'
>>> reg.findall(str)
[('brunch', '')]
>>> reg.sub(r'BRUNCH',str)
'Limit 1 per person. Limit 1 per table. Not valid for carryout. Not valid with any
other offers, no cash backBRUNCH Not valid on federal holidays. Reservation required.'
I want it to produce the following:
Limit 1 per person. Limit 1 per table. Not valid for carryout. Not valid with any other
offers, no cash back. Valid only for Wednesday-Friday dinner and Saturday-Sunday BRUNCH.
Not valid on federal holidays. Reservation required.
Answer:
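To solve this problem, I captured the text on either side of brunch and put it back in the replacement. Both alternatives need their groups re-inserted (1 and 3 for the first, 4 and 6 for the second); unmatched groups expand to an empty string on Python 3.5+. The end-of-string anchor must be a bare $, since \$ matches a literal dollar sign:
>>> reg = re.compile(r'(?:((?:^|\.)[^.]*(?=saturday|sunday|weekend)[^.]*)(brunch)([^.]*(?:$|\.))|((?:^|\.)[^.]*)(brunch)([^.]*(?=saturday|sunday|weekend)[^.]*(?:$|\.)))', re.I)
>>> reg.sub(r'\g<1>\g<4>BRUNCH\g<3>\g<6>', str)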
'Limit 1 per person. Limit 1 per table. Not valid for carryout. Not valid with any other offers, no cash back. Valid only for Wednesday-Friday dinner and Saturday-Sunday BRUNCH. Not valid on federal holidays. Reservation required.'
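One caveat worth checking: the pattern above consumes the period on both sides of a sentence, so two qualifying sentences in a row cannot both match. The following is a minimal sketch of an alternative, assuming Python 3; the names `SENTENCE`, `DAY`, `WORD`, and `upcase_brunch` are hypothetical and not from the original post. It matches one sentence at a time and uses a replacement function so that only the word itself is rewritten.

```python
import re

# Hypothetical names for illustration; not part of the original post.
SENTENCE = re.compile(r'[^.]+\.?')                  # one sentence plus its trailing period, if any
DAY = re.compile(r'saturday|sunday|weekend', re.I)  # words that qualify a sentence
WORD = re.compile(r'\bbrunch\b', re.I)              # the word to rewrite


def upcase_brunch(text):
    """Uppercase 'brunch' only in sentences that mention a weekend day."""
    def fix(match):
        sentence = match.group(0)
        if DAY.search(sentence):
            return WORD.sub('BRUNCH', sentence)
        return sentence  # leave non-qualifying sentences untouched
    return SENTENCE.sub(fix, text)


text = ('Not valid for carryout. Valid only for Wednesday-Friday dinner and '
        'Saturday-Sunday brunch. Friday brunch is excluded.')
print(upcase_brunch(text))
# Not valid for carryout. Valid only for Wednesday-Friday dinner and Saturday-Sunday BRUNCH. Friday brunch is excluded.
```

Passing a function to `sub` avoids juggling group numbers, because the function receives each sentence whole and decides what to return.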
+3
4 answers
Instead of using a regular expression, it's easier to break it down into steps:
s = "Limit 1 per person. Limit 1 per table. Not valid for carryout. Not valid with any other offers, no cash back. Valid only for Wednesday-Friday dinner and Saturday-Sunday brunch. Not valid on federal holidays. Reservation required."
results = []
for line in s.split("."):
    if any(text in line.lower() for text in ("saturday", "sunday", "weekend")):
        results.append(line.replace("brunch", "BRUNCH"))
    else:
        results.append(line)
result = ".".join(results)
print(result)
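Note that the day check above lowercases the line, but `str.replace` is still case-sensitive, so a capitalised "Brunch" would be skipped. A small sketch of the difference, assuming Python 3 and a made-up sample line:

```python
import re

line = " Brunch is served on Sunday"  # made-up sample, not from the original text

# str.replace is case-sensitive, so the capitalised word is left alone.
print(line.replace("brunch", "BRUNCH"))
# " Brunch is served on Sunday"

# re.sub with re.IGNORECASE catches any capitalisation; \b limits it to whole words.
print(re.sub(r"\bbrunch\b", "BRUNCH", line, flags=re.IGNORECASE))
# " BRUNCH is served on Sunday"
```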
+3
Keep the regex simple: capture the day name that directly precedes brunch and put it back with a backreference:
reg = re.compile(r'((?:saturday|sunday|weekend)\s+)brunch', re.I)
reg.sub(r'\1BRUNCH',str)
'Limit 1 per person. Limit 1 per table. Not valid for carryout. Not valid with any other
offers, no cash back. Valid only for Wednesday-Friday dinner and Saturday-Sunday BRUNCH.
Not valid on federal holidays. Reservation required.'
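This pattern only fires when the day name comes immediately before `brunch`, which is narrower than "anywhere in the same sentence". A quick check, assuming Python 3 and two made-up sample sentences:

```python
import re

reg = re.compile(r'((?:saturday|sunday|weekend)\s+)brunch', re.I)

# Made-up samples for illustration.
adjacent = 'Valid for Saturday-Sunday brunch.'
separated = 'Brunch is served on Sunday.'

print(reg.sub(r'\1BRUNCH', adjacent))
# Valid for Saturday-Sunday BRUNCH.

print(reg.sub(r'\1BRUNCH', separated))
# Brunch is served on Sunday.   (unchanged: the day name does not precede the word)
```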
+1
You don't have to use a regex for everything. You can split the text into sentences, process each one separately, and join the results with a list comprehension:
>>> import re
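>>> sentences = s.split('.')
>>> # Lowercase each sentence before the check so 'weekend' and 'Weekend' both qualify
>>> '.'.join([re.sub('brunch', 'BRUNCH', sent) if any(day in sent.lower() for day in ('saturday', 'sunday', 'weekend')) else sent for sent in sentences])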
'Limit 1 per person. Limit 1 per table. Not valid for carryout. Not valid
with any other offers, no cash back. Valid only for Wednesday-Friday dinner and
Saturday-Sunday BRUNCH. Not valid on federal holidays. Reservation required.'
0