Find all substrings with at least one group

I am trying to find all substrings in a string that satisfy a condition.

Let's say we have a line:

s = 'some text 1a 2a 3 xx sometext 1b yyy some text 2b.'

I need to apply the search pattern {(one (group of words), two (another group of words), three (another group of words)), word}. Each of the first three positions is optional, but at least one of them must be present; if so, I also need the word that follows them. The output should be:

2a  1a  3 xx
1b  yyy
2b 

I wrote this expression:

import re

find_it = re.compile(r"((?P<one>\b1a\s|\b1b\s)|" +
                    r"(?P<two>\b2a\s|\b2b\s)|" +
                    r"(?P<three>\b3\s|\b3b\s))+" +
                    r"(?P<word>\w+)?")

Each group actually contains many different words (not just 1a, 1b), and I cannot merge them into one group. The value for a group should be None if that group is empty. Obviously, the result is wrong:

find_it.findall(s)
> 2a  1a  2a   3 xx
> 1b  1b    yyy
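The wrong output comes from two general regex behaviours. When a pattern has more than one capturing group, re.findall returns one tuple per match with a slot for every group, and a group inside a repeated (...)+ keeps only the text of its last repetition. A minimal illustration, using a made-up pattern and string rather than the ones above:

```python
import re

# A capturing group repeated with + keeps only its last repetition.
print(re.findall(r'(a|b)+', 'ab'))            # ['b']

# With several groups, findall returns one tuple per match; a group that
# did not take part in the match shows up as '' rather than None.
print(re.findall(r'(?P<x>a)?(?P<y>b)', 'b'))  # [('', 'b')]

# finditer + groupdict() keeps the group names and reports a
# non-participating group as None.
print([m.groupdict() for m in re.finditer(r'(?P<x>a)?(?P<y>b)', 'b')])
# [{'x': None, 'y': 'b'}]
```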

I am grateful for your help!

1 answer


You can use the following regex:

>>> import re
>>> reg = re.compile(r'((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b.']

Here I have just shortened your regex using a character class and the ? quantifier. Its token part consists of two alternatives:

[12][ab]|3b?

[12][ab] matches 1a, 1b, 2a and 2b, and 3b? matches both 3b and 3.
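A quick check that the shortened alternation accepts exactly the literals of the original regex. Since the asker says each real group holds many different words, the sketch also shows one way to build such an alternation from a word list with re.escape; the helper name alternation and the word lists are hypothetical, not part of the answer.

```python
import re

tokens = ['1a', '1b', '2a', '2b', '3', '3b']  # the literals from the question's regex
short = re.compile(r'[12][ab]|3b?')

# Every original token is matched in full by the shortened alternation...
assert all(short.fullmatch(t) for t in tokens)
# ...and similar-looking non-tokens are rejected.
assert not any(short.fullmatch(t) for t in ['1c', '3a', '4b', '12'])


def alternation(words):
    """Hypothetical helper: build 'w1|w2|...' from literal words.

    Longer words come first so that, e.g., '3b' is tried before '3'.
    """
    return '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))


print(alternation(['3', '3b']))  # 3b|3
```

Sorting longest-first matters because regex alternation takes the first branch that matches, not the longest one.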



And if you don't want the trailing dot after 2b, you can use the following regex with a positive lookahead. It is also more robust than the previous one, because making \s optional in the first group is not a good idea: a token can then run straight into the following word.

>>> reg = re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']
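The difference between the two regexes can be checked on small made-up inputs. The lookahead (?=\.|$) requires a dot or the end of the string after 2b without consuming it. Requiring \s between tokens stops a token from gluing onto an ordinary word such as 3dogs:

```python
import re

# The two regexes from this answer, written as raw strings.
optional_space = re.compile(r'((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')
with_lookahead = re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')

# The lookahead checks for '.' or end of string without consuming it.
print(optional_space.findall('some text 2b.'))  # ['2b.']
print(with_lookahead.findall('some text 2b.'))  # ['2b']

# With \s? a token can run straight into an ordinary word.
print(optional_space.findall('I have 3dogs'))   # ['3dogs']
print(with_lookahead.findall('I have 3dogs'))   # []
```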

Also, if your numbers and letters are just examples, you can use the more general [0-9][a-z]? instead:

>>> reg = re.compile(r'((?:[0-9][a-z]?\s)+\w+|(?:[0-9][a-z]?)+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']

On a string that also contains, say, 5h 9 7y example, that run is matched as well.
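The regexes above return whole substrings. One possible way to get the per-group breakdown the question asks for (one, two, three and word, with None for an empty group) is a second pass that looks up each token in its group's word list. This is not from the original answer. The GROUPS vocabularies and the split_groups helper are hypothetical stand-ins, and the sketch assumes tokens are separated by whitespace:

```python
import re

s = 'some text 1a 2a 3 xx sometext 1b yyy some text 2b.'

# Hypothetical vocabularies; the real groups would contain many more words.
GROUPS = {
    'one': {'1a', '1b'},
    'two': {'2a', '2b'},
    'three': {'3', '3b'},
}

# The lookahead regex from this answer, as a raw string.
reg = re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')


def split_groups(substring):
    """Assign each whitespace-separated token to its group; None if absent."""
    result = {name: None for name in GROUPS}
    result['word'] = None
    for token in substring.split():
        for name, words in GROUPS.items():
            if token in words:
                result[name] = token
                break
        else:
            # Not in any group: treat it as the trailing word.
            result['word'] = token
    return result


for match in reg.findall(s):
    print(split_groups(match))
# {'one': '1a', 'two': '2a', 'three': '3', 'word': 'xx'}
# {'one': '1b', 'two': None, 'three': None, 'word': 'yyy'}
# {'one': None, 'two': '2b', 'three': None, 'word': None}
```

Keeping the group lookup outside the regex avoids the "last repetition wins" problem of repeated capturing groups.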
