Regex to extract the artist name after a quoted song title in Python
I am trying to write a Python program that extracts the artist name from a Pandora tweet. For example, given this tweet:
I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.
I only want to get the artist name, Luther Vandross. I don't know much about regex, so I tried the following code:
import re
print(re.findall(r'".+?" by [\w+]+', text))
But the result was "I Can Make It Better" by Luther, which still includes the song title and only the first word of the artist's name.
How can I write a regex in Python that returns just the artist?
>>> s = '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'''
>>> import re
>>> m = re.search('to "?(.*?)"? by (.*?) on #?Pandora', s)
>>> m
<_sre.SRE_Match object; span=(14, 69), match='to "I Can Make It Better" by Luther Vandross on P>
>>> m.groups()
('I Can Make It Better', 'Luther Vandross')
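If you only need the artist, you can take group 2 directly. Naming the groups makes that more readable. A minimal sketch reusing the pattern above; the group names `title` and `artist` are my own choice, not part of the original answer:

```python
import re

s = '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'''

# Same pattern as above, but with named groups instead of numbered ones
pattern = re.compile(r'to "?(?P<title>.*?)"? by (?P<artist>.*?) on #?Pandora')

m = pattern.search(s)
if m:
    print(m.group('artist'))  # Luther Vandross
    print(m.groupdict())      # {'title': 'I Can Make It Better', 'artist': 'Luther Vandross'}
```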
Additional test cases:
>>> tests = [
...     '''I'm listening to "Don't Turn Out The Lights (D.T.O.T.L.)" by NKOTBSB on #Pandora''',
...     '''I'm listening to G.O.D. Remix by Canton Jones on #Pandora''',
...     '''I'm listening to "It Been Awhile" by @staindmusic on Pandora #pandora http://pdora.co/R1OdxE''',
...     '''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0''',
...     '''I'm listening to "El Preso (2000)" by Fruko Y Sus Tesos on #Pandora http://pdora.co/1GtOHC1''',
...     '''I'm listening to "Cat Daddy" by Rej3ctz on #Pandora http://pdora.co/1eALNpc''',
...     '''I'm listening to "Space Age Pimpin'" by 8 Ball & MJG on Pandora #pandora http://pdora.co/1h8swun'''
... ]
>>> expr = re.compile('to "?(.*?)"? by (.*?) on #?Pandora')
>>> for s in tests:
...     print(expr.search(s).groups())
...
("Don't Turn Out The Lights (D.T.O.T.L.)", 'NKOTBSB')
('G.O.D. Remix', 'Canton Jones')
('It Been Awhile', '@staindmusic')
('Everlong', '@foofighters')
('El Preso (2000)', 'Fruko Y Sus Tesos')
('Cat Daddy', 'Rej3ctz')
("Space Age Pimpin'", '8 Ball & MJG')
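`search()` returns `None` when a tweet does not match, so calling `.groups()` on it directly raises `AttributeError`. A small sketch that guards against that; the function name `artist_from_tweet` and the non-matching sample text are hypothetical:

```python
import re

# Pattern from the answer above
TRACK_RE = re.compile(r'to "?(.*?)"? by (.*?) on #?Pandora')

def artist_from_tweet(tweet):
    """Return the artist name, or None if the tweet doesn't look like a Pandora share."""
    m = TRACK_RE.search(tweet)
    return m.group(2) if m else None

print(artist_from_tweet('''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0'''))  # @foofighters
print(artist_from_tweet("Just had lunch"))  # None
```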
Your regex is close, but you can use `" by` and `on` as the delimiters instead. You also need a capturing group (parentheses) around the part you want to extract.
You can use a regex like this:
" by (.+?) on
The idea is to capture the text between `" by` and `on` with a simple non-greedy group, (.+?).
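To see why the non-greedy `+?` matters, compare it with the greedy `+` on a tweet that contains a second " on" after the artist. A quick sketch; the tweet text is made up for illustration:

```python
import re

# Hypothetical tweet with an extra " on" after the artist name
tweet = '''I'm listening to "Everlong" by @foofighters on #Pandora, now on repeat'''

greedy = re.search(r'" by (.+) on', tweet)   # .+ takes as much as possible, then backtracks to the LAST " on"
lazy = re.search(r'" by (.+?) on', tweet)    # .+? stops at the FIRST " on"

print(greedy.group(1))  # @foofighters on #Pandora, now
print(lazy.group(1))    # @foofighters
```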
Against the test string in the code below, group 1 matches `Luther Vandross` (characters 43-58). In Python:
import re

p = re.compile(r'" by (.+?) on')  # the ur'' prefix is not valid in Python 3
test_str = "I'm listening to \"I Can Make It Better\" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.\n"
m = p.search(test_str)
if m:
    print(m.group(1))  # Luther Vandross
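Since the question used `re.findall`: when a pattern contains exactly one capturing group, `findall` returns only the text of that group rather than the whole match, so this pattern yields the artist directly. A short sketch; the two-line `text` is made up by joining tweets from this thread:

```python
import re

text = (
    "I'm listening to \"I Can Make It Better\" by Luther Vandross on Pandora #pandora\n"
    "I'm listening to \"Everlong\" by @foofighters on #Pandora\n"
)

# One capturing group -> findall returns a list of group 1 values, not full matches
print(re.findall(r'" by (.+?) on', text))  # ['Luther Vandross', '@foofighters']
```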