Catch alternate strings from OR operator using Regex in Python?
I would like to capture a specific part of a line when the regex has alternative branches. How can I safely refer to the captured part regardless of which branch matched? I was thinking about doing
m=re.match("(A(?P<name>.+)B|C(?P<name>.+)D)", text)
match=m.group("name")
but this fails because the group name is redefined. Accessing the group by index with m.group would be problematic, since the regex comes from config files and I cannot guarantee the nesting or the index of the group that ends up matching.
EDIT: In my setup the texts to match and the regexes come from separate sources. I would like to achieve
import re
for text in ["ABBC", "DEEEF", "GHHI"]:
    for regex in ["(A(.+)C|D(.+)F)", "G(.+)I"]:
        m = re.match(regex, text)
        if m:
            print(m.group(1))  # should actually match the middle characters, but doesn't work generally
            break
The number of possible regexes may grow in the future, so it should be a general solution.
A possible solution is to use lookahead assertions. If you replace the regex (A(.+)C|D(.+)F) with
^(?=A.+C$|D.+F$)[A-Z](.+)[A-Z]
then group(1) will contain the middle characters, whichever alternative matched.
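It says: if you are at the beginning of the string (^) and one of the alternatives inside the lookahead (?=...) succeeds, match [A-Z](.+)[A-Z]. The lookahead consumes no characters and contains no capture groups, so the only capture group in the pattern always has index 1.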
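As a rough sketch, here is how this could plug into the loop from the question. The helper name middle and the list name regexes are made up for illustration; the second pattern is left exactly as in the question, and only the first one is swapped for the lookahead version.

```python
import re

# First pattern rewritten with a lookahead; second pattern unchanged.
# (Names below are hypothetical, chosen only for this example.)
regexes = [r"^(?=A.+C$|D.+F$)[A-Z](.+)[A-Z]", r"G(.+)I"]

def middle(text):
    """Return group(1) of the first regex that matches, or None."""
    for regex in regexes:
        m = re.match(regex, text)
        if m:
            return m.group(1)
    return None

for text in ["ABBC", "DEEEF", "GHHI"]:
    print(text, "->", middle(text))
# ABBC -> BB
# DEEEF -> EEE
# GHHI -> HH
```

Note that a mixed string such as "ABBF" is rejected, because the lookahead still requires the A...C or D...F pairing even though the consuming part only says [A-Z].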