Regex to match the expression after a quote in Python

I am trying to write a Python program that extracts the artist name from a Pandora tweet. For example, if I have this tweet:

I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.

I would like to extract only the artist name, Luther Vandross. I don't know much about regex, so I tried the following code:

print(re.findall(r'".+?" by [\w+]+', text))

But the result was "I Can Make It Better" by Luther, without the last name.

Do you have any idea on how I can design a regex in python to get it?



5 answers


>>> s = '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'''

>>> import re
>>> m = re.search('to "?(.*?)"? by (.*?) on #?Pandora', s)
>>> m
<_sre.SRE_Match object; span=(14, 69), match='to "I Can Make It Better" by Luther Vandross on P>
>>> m.groups()
('I Can Make It Better', 'Luther Vandross')

      

Additional test cases:



>>> tests = [
    '''I'm listening to "Don't Turn Out The Lights (D.T.O.T.L.)" by NKOTBSB on #Pandora''',
    '''I'm listening to G.O.D. Remix by Canton Jones on #Pandora''',
    '''I'm listening to "It Been Awhile" by @staindmusic on Pandora #pandora http://pdora.co/R1OdxE''',
    '''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0''',
    '''I'm listening to "El Preso (2000)" by Fruko Y Sus Tesos on #Pandora http://pdora.co/1GtOHC1''',
    '''I'm listening to "Cat Daddy" by Rej3ctz on #Pandora http://pdora.co/1eALNpc''',
    '''I'm listening to "Space Age Pimpin'" by 8 Ball & MJG on Pandora #pandora http://pdora.co/1h8swun'''
]
>>> expr = re.compile('to "?(.*?)"? by (.*?) on #?Pandora')
>>> for s in tests:
        print(expr.search(s).groups())

("Don't Turn Out The Lights (D.T.O.T.L.)", 'NKOTBSB')
('G.O.D. Remix', 'Canton Jones')
('It Been Awhile', '@staindmusic')
('Everlong', '@foofighters')
('El Preso (2000)', 'Fruko Y Sus Tesos')
('Cat Daddy', 'Rej3ctz')
("Space Age Pimpin'", '8 Ball & MJG')
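If a tweet does not follow the Pandora format, expr.search returns None and calling .groups() on it raises an AttributeError. A minimal sketch of a guard, assuming you would rather get None back than an exception (the names PANDORA_RE and extract_track are made up for this example):

```python
import re

# Same pattern as in the answer above, compiled once.
PANDORA_RE = re.compile(r'to "?(.*?)"? by (.*?) on #?Pandora')

def extract_track(tweet):
    """Return (title, artist), or None if the tweet does not fit the pattern."""
    m = PANDORA_RE.search(tweet)
    return m.groups() if m else None

print(extract_track('''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0'''))
print(extract_track('Just a regular tweet'))
```

The first call prints the (title, artist) tuple and the second prints None, so the caller can skip unrelated tweets with a simple check.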

      



Your regex is close, but you can change the delimiters to '" by ' and ' on'. You also need a capturing group (parentheses) so that only the name is returned.

You can use regex like this:

" by (.+?) on

      




The idea behind this regex is to grab the content between '" by ' and ' on' using a simple non-greedy match.

Match info

MATCH 1
1.  [43-58] `Luther Vandross`

      

Code:

import re
p = re.compile(r'" by (.+?) on')
test_str = "I'm listening to \"I Can Make It Better\" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.\n"

m = re.search(p, test_str)
print(m.group(1))  # Luther Vandross

      



You need to use a capture group.

print(re.findall(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})', text))

      

I used a repetition quantifier because the name may consist of just a first name, a first and last name, or a first, middle, and last name.
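One thing to keep in mind: this pattern assumes every word of the name is a capital letter followed by lowercase letters. A quick sketch of what that means for artist names like the ones in the test cases from the other answer (the tweets are shortened here, and name_re and samples are names chosen just for this example):

```python
import re

name_re = re.compile(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})')

samples = [
    '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora''',
    '''I'm listening to "Everlong" by @foofighters on #Pandora''',
    '''I'm listening to "Cat Daddy" by Rej3ctz on #Pandora''',
]

results = [name_re.findall(s) for s in samples]
for r in results:
    print(r)
# ['Luther Vandross']
# []        (the name starts with '@', so nothing matches)
# ['Rej']   (the match stops at the digit)
```

So it works for plain "Firstname Lastname" artists, but handles, digits, and symbols such as & need a looser pattern, for example one delimited by ' on'.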



print(re.findall(r'".+?" by ((?:[A-Z][a-z]+ )+)', text))

You can try this. See the demo:

https://regex101.com/r/vH0iN5/5
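Note that each word in this group is matched together with the space that follows it, so the captured name ends with a trailing space. A small sketch of stripping it afterwards (the tweet is the one from the question without the URL; raw and names are just names for this example):

```python
import re

text = '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora'''

raw = re.findall(r'".+?" by ((?:[A-Z][a-z]+ )+)', text)
names = [name.strip() for name in raw]  # drop the trailing space

print(raw)    # ['Luther Vandross ']
print(names)  # ['Luther Vandross']
```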



You can use this lookaround-based regex with re.search:

import re
s = 'I\'m listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'
print(re.search(r'(?<=by ).+?(?= on)', s).group())
# prints: Luther Vandross
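A caveat with the lookbehind, shown as a rough sketch: 'by ' can also appear inside a song title, and then the match starts too early. The tweet below is made up for illustration (it is not from the question). Including the closing quote in the lookbehind is one possible fix, assuming the title is always quoted:

```python
import re

# Hypothetical tweet whose title contains "by ".
tweet = '''I'm listening to "Stand by Me" by Ben E. King on #Pandora'''

loose = re.search(r'(?<=by ).+?(?= on)', tweet).group()
strict = re.search(r'(?<=" by ).+?(?= on)', tweet).group()

print(loose)   # Me" by Ben E. King
print(strict)  # Ben E. King
```

Python requires lookbehinds to have a fixed width, which is why the quote cannot be made optional inside (?<=...); for unquoted titles, a capture group such as the one in the first answer is easier to work with.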

      
