How to get the value of a key in a string when it is followed by another specific key=value pair
My code looks like this:
import re
string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
pattern = r'title=(.*?) color=red'
print(re.compile(pattern).search(string).group(0))
and I got
"title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
But I want to get only the content of the "title" that is immediately followed by "color=red".
Do you need what immediately precedes color=red? Then use
.*title=(.*?) color=red
Demo: https://regex101.com/r/sR4kN2/1
The leading .* greedily matches everything that comes before the final title=... color=red pair, so only the title you want is captured (in group 1).
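As a quick check of the greedy-prefix pattern against the string from the question, here is a minimal Python 3 sketch. The variable names are illustrative. Note that the title is read from group 1, and that with two color=red pairs in the input the greedy prefix selects the last one.

```python
import re

# Sample string taken from the question.
string = ("title=abcd color=green title=efgh color=blue "
          "title=xyxyx color=yellow title=whatIwaht color=red "
          "title=xxxy red=anything title=xxxyyy color=red")

# The leading .* is greedy, so the engine backtracks from the end of the
# string and settles on the LAST "title=... color=red" pair.
match = re.search(r'.*title=(.*?) color=red', string)

# group(0) would be the whole match; group(1) is just the captured title.
last_red_title = match.group(1) if match else None
print(last_red_title)  # xxxyyy
```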
Alternatively, if you know there is a character that does not appear in the title, you can simplify by using a negated character class. For example, if you know = won't show up:
title=([^=]*?) color=red
Or, if you know no space will appear:
title=([^\s]*?) color=red
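To illustrate the two negated-character-class variants, the sketch below runs both through re.findall on the question's string, which returns every title directly followed by color=red rather than only the last one. It assumes titles contain neither = nor whitespace; \S is used as the usual shorthand for [^\s].

```python
import re

# Sample string taken from the question.
string = ("title=abcd color=green title=efgh color=blue "
          "title=xyxyx color=yellow title=whatIwaht color=red "
          "title=xxxy red=anything title=xxxyyy color=red")

# Variant 1: the title may not contain "=".
no_equals = re.findall(r'title=([^=]*?) color=red', string)

# Variant 2: the title may not contain whitespace (\S is shorthand for [^\s]).
no_space = re.findall(r'title=(\S*?) color=red', string)

print(no_equals)  # ['whatIwaht', 'xxxyyy']
print(no_space)   # ['whatIwaht', 'xxxyyy']
```

Because the captured group can no longer run across a key=value boundary, no greedy prefix is needed and all matches are found in one pass.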
Third option: use a bit of code to find all red titles (assuming the input always alternates title, color):
for title, color in re.findall(r'title=(.*?) color=(.*?)(?: |$)', string):
    if color == 'red':
        print(title)
If you want to get the last sub-regexp match before a specific regexp, the solution is to use a greedy skipper. For example:
>>> import re
>>> pattern = '.*title="([^"]*)".*color="#123"'
>>> text = 'title="123" color="#456" title="789" color="#123"'
>>> print(re.match(pattern, text).group(1))
789
The first .* is greedy and skips as much as possible (thus skipping the first title), then backs up to the one that lets the desired color match.
As a simpler example, consider that a(.*)b(.*)c processed on a1111b2222b3333c will match 1111b2222 in the first group and 3333 in the second.
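The group boundaries in this example can be confirmed directly. The sketch below also adds a lazy first group for contrast, which is not part of the original answer, to show how greediness decides which b ends up as the separator.

```python
import re

text = 'a1111b2222b3333c'

# Greedy first group: .* grabs as much as it can, so the LAST "b" separates the groups.
greedy = re.match(r'a(.*)b(.*)c', text).groups()

# Lazy first group (added for contrast): .*? stops at the FIRST "b".
lazy = re.match(r'a(.*?)b(.*)c', text).groups()

print(greedy)  # ('1111b2222', '3333')
print(lazy)    # ('1111', '2222b3333')
```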
Why don't you skip the regular expressions and use some splitting functions instead:
search_title = False
found = None
string = "title=abcd color=green title=efgh color=blue title=xyxyx color=yellow title=whatIwaht color=red title=xxxy red=anything title=xxxyyy color=red"
parts = string.split()
for part in parts:
    key, value = part.split('=', 1)
    if search_title:
        if key == 'title':
            found = value
            search_title = False
    if key == 'color' and value == 'red':
        search_title = True
print(found)
This prints:
xxxy
Regexes are great, but they can sometimes cause headaches.
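The loop above records the title that comes after a color=red pair. If what is wanted is instead the title immediately before color=red, as the question seems to ask, a split-based variant might look like the following sketch. The helper name titles_before and its parameters are hypothetical, and it assumes every whitespace-separated token has the form key=value.

```python
def titles_before(text, key='color', value='red'):
    """Return every title whose next token is exactly key=value."""
    # Split into [key, value] pairs; assumes each token contains "=".
    pairs = [token.split('=', 1) for token in text.split()]
    found = []
    # Walk over adjacent pairs: (current token, following token).
    for (k1, v1), (k2, v2) in zip(pairs, pairs[1:]):
        if k1 == 'title' and k2 == key and v2 == value:
            found.append(v1)
    return found


# Sample string taken from the question.
string = ("title=abcd color=green title=efgh color=blue "
          "title=xyxyx color=yellow title=whatIwaht color=red "
          "title=xxxy red=anything title=xxxyyy color=red")

print(titles_before(string))  # ['whatIwaht', 'xxxyyy']
```

Pairing each token with its successor through zip avoids the state flag and makes the "immediately followed by" condition explicit.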