Catch alternate strings from OR operator using Regex in Python?

I would like to capture a specific part of a line when the regex has alternative branches. How can I safely refer to the same logical group in each branch? I was thinking about doing

m=re.match("(A(?P<name>.+)B|C(?P<name>.+)D)", text)
match=m.group("name")

      

but there is a conflict with name redefinition. Using m.group by index would be problematic, since the regex comes from config files and I cannot guarantee the nesting or index value that will result in a match.
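
For reference, the name conflict can be reproduced with a short sketch. The pattern is the one from the snippet above; as far as I can tell, Python's re module rejects it at compile time, before any text is matched. The variable `compiled` is only there for illustration.

```python
import re

# Same pattern as in the question: the name "name" is used in both branches.
pattern = r"(A(?P<name>.+)B|C(?P<name>.+)D)"

try:
    re.compile(pattern)
    compiled = True
except re.error as exc:
    # Group names must be unique within one pattern, even across branches.
    compiled = False
    print("re.error:", exc)
```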

EDIT: In my setup the texts to match and the regexes come from separate sources. I would like to achieve something like

import re

for text in ["ABBC", "DEEEF", "GHHI"]:
    for regex in ["(A(.+)C|D(.+)F)", "G(.+)I"]:
        m=re.match(regex, text)
        if m:
            print(m.group(1)) # should actually match the middle characters, but doesn't work generally
            break

      

The number of possible regexes may grow in the future, so the solution should be general.



2 answers


A possible solution is to use lookahead assertions. If you replace the regex (A(.+)C|D(.+)F) with

^(?=A.+C$|D.+F$)[A-Z](.+)[A-Z]

then group(1) will contain the middle characters for either branch.



It reads: if you are at the beginning of the line (^) and one of the alternatives inside the lookahead (?=...) succeeds, match the string with [A-Z](.+)[A-Z].
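
A minimal sketch of the lookahead approach, run against the texts from the question. The helper name `middle` is made up for illustration. Note that this relies on each delimiter being a single uppercase letter, and that the `$` inside the lookahead requires the closing letter to sit at the end of the text.

```python
import re

# Pattern from the answer: the lookahead checks the whole line first,
# then a single capturing group picks out the middle characters.
LOOKAHEAD = re.compile(r"^(?=A.+C$|D.+F$)[A-Z](.+)[A-Z]")


def middle(text):
    """Return the middle characters, or None if neither branch applies."""
    m = LOOKAHEAD.match(text)
    return m.group(1) if m else None


results = {text: middle(text) for text in ["ABBC", "DEEEF", "GHHI"]}
print(results)  # {'ABBC': 'BB', 'DEEEF': 'EEE', 'GHHI': None}
```

Because only one capturing group is left, m.group(1) means the same thing no matter which branch matched. "GHHI" is not covered by this pattern, so it still needs its own regex.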



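Your example can be made to work by changing

            print(m.group(1))

to

            print([g for g in m.groups() if g is not None][-1])

(just taking the last group that took part in the match; in the first regex the outer parentheses are group 1, so the first non-empty group would be the whole alternation).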
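
Here is a sketch of the loop from the question with the group selection pulled into a helper. The name `last_matched_group` is hypothetical, and the approach assumes that in every configured regex the group of interest is the last one that participates in the match. In "(A(.+)C|D(.+)F)" the outer parentheses are group 1, so picking the first non-None group would return the whole alternation rather than the middle characters.

```python
import re


def last_matched_group(m):
    """Return the last capturing group that took part in the match."""
    matched = [g for g in m.groups() if g is not None]
    return matched[-1] if matched else None


texts = ["ABBC", "DEEEF", "GHHI"]
regexes = ["(A(.+)C|D(.+)F)", "G(.+)I"]

found = []
for text in texts:
    for regex in regexes:
        m = re.match(regex, text)
        if m:
            found.append(last_matched_group(m))
            break

print(found)  # ['BB', 'EEE', 'HH']
```

Groups in a branch that did not match are reported as None by m.groups(), which is what the filter relies on. If a config regex ever has several capturing groups in the same branch, this convention would need to be tightened.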
