How to get the value of a key in a string when it is followed by a specific key=value pair

My code looks like this:

import re

string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
pattern = r'title=(.*?) color=red'
print(re.compile(pattern).search(string).group(0))

      

and I got

"title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"

      

But I want only the value of the "title" that is immediately followed by "color=red".



4 answers


Do you need what immediately precedes color=red? Then use

.*title=(.*?) color=red

      

Demo: https://regex101.com/r/sR4kN2/1

The leading .* greedily consumes everything up to the last title= that can still be followed by color=red, so the group captures only the title nearest to the last color=red in the string.
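To see what the leading .* changes, here is a small sketch run on the string from the question (the variable names lazy and greedy are only illustrative). Note that the full question string contains two titles followed by color=red, and the greedy prefix settles on the last one:

```python
import re

text = ("title=abcd color=green title=efgh color=blue title=xyxyx color=yellow "
        "title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red")

# Pattern from the question: the match starts at the first "title=",
# so the group swallows everything up to the first " color=red".
lazy = re.search(r'title=(.*?) color=red', text)

# With a greedy prefix, ".*" backtracks to the last "title=" from which
# the rest of the pattern can still match.
greedy = re.search(r'.*title=(.*?) color=red', text)

print(lazy.group(1))    # abcd color=green ... title=whatIwaht
print(greedy.group(1))  # xxxyyy
```

If the input ends at the first color=red, the same greedy pattern returns whatIwaht instead.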


Alternatively, if you know there is a character that never appears in the title, you can simplify by using a negated character class. For example, if you know = won't show up:



title=([^=]*?) color=red

      

Or, if you know no space will appear:

title=([^\s]*?) color=red
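As a rough check, both negated-class patterns can be run with re.findall on the string from the question. This is only a sketch; it assumes titles contain neither = nor whitespace, and the variable names are illustrative:

```python
import re

text = ("title=abcd color=green title=efgh color=blue title=xyxyx color=yellow "
        "title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red")

# The group cannot cross an "=", so it can never run into the next pair.
no_equals = re.findall(r'title=([^=]*?) color=red', text)

# The group cannot cross whitespace, so it stops at the end of the title.
no_space = re.findall(r'title=([^\s]*?) color=red', text)

print(no_equals)  # ['whatIwaht', 'xxxyyy']
print(no_space)   # ['whatIwaht', 'xxxyyy']
```

Because the group can no longer span several pairs, findall returns every title directly followed by color=red, not just the last one.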

      


Third option: use a bit of code to find all red titles (assuming the input always alternates title, color):

for title, color in re.findall(r'title=(.*?) color=(.*?)(?: |$)', string):
    if color == 'red':
        print(title)

      



If you want the last match of a sub-pattern before a specific pattern, the solution is to use a greedy skipper, i.e. a leading .* in front of it. For example:

>>> import re
>>> pattern = '.*title="([^"]*)".*color="#123"'
>>> text = 'title="123" color="#456" title="789" color="#123"'
>>> print(re.match(pattern, text).group(1))
789

      

The first .* is greedy and skips as much as possible (thus skipping the first title), then backs up to the title that is followed by the desired color.

As a simpler example, consider that



a(.*)b(.*)c

      

processed on

a1111b2222b3333c

      

will match 1111b2222 in the first group and 3333 in the second.
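The simpler example can be checked directly in Python. The lazy variant is added here only for contrast and is not part of the original example:

```python
import re

sample = 'a1111b2222b3333c'

# Greedy first group: it backs up only as far as the last "b".
greedy = re.match(r'a(.*)b(.*)c', sample)
print(greedy.group(1))  # 1111b2222
print(greedy.group(2))  # 3333

# Lazy first group, for contrast: it stops at the first "b".
lazy = re.match(r'a(.*?)b(.*)c', sample)
print(lazy.group(1))    # 1111
print(lazy.group(2))    # 2222b3333
```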



Why don't you skip the regular expressions and use some splitting functions instead:

search_title = False
found = None
string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
parts = string.split()
for part in parts:
    key, value = part.split('=', 1)
    if search_title:
        if key == 'title':
            found = value
        search_title = False
    if key == 'color' and value == 'red':
        search_title = True
print(found)

      

This prints the title that directly follows a color=red pair (the last such title, if there are several):

xxxy

      

Regexes are great, but they can sometimes cause headaches.
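If what is wanted is the title immediately before color=red, as the question asks, the same splitting idea can be adapted. The following is only a sketch: it assumes values contain no whitespace, and the names last_title and red_titles are made up for illustration:

```python
text = ("title=abcd color=green title=efgh color=blue title=xyxyx color=yellow "
        "title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red")

red_titles = []
last_title = None  # title seen in the directly preceding pair, if any

for part in text.split():
    key, _, value = part.partition('=')
    if key == 'title':
        last_title = value
    elif key == 'color' and value == 'red' and last_title is not None:
        # The previous pair was a title, so it is directly followed by color=red.
        red_titles.append(last_title)
        last_title = None
    else:
        # Any other pair breaks the "title directly followed by color" link.
        last_title = None

print(red_titles)  # ['whatIwaht', 'xxxyyy']
```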



Try using the re module:

>>> import re
>>> string = 'title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red'
>>> re.search('(.*title=?)(.*) color=red', string).group(2)
'whatIwaht'

>>> re.search('(.*title=?)(.*) color=yellow', string).group(2)
'xyxyx'

      
