Python 3 regex - find the start and end index of all matches in a string

This was my original approach:

import re

string = '1' * 15
result = re.finditer(r'(?=11111)', string)  # overlapped=True doesn't work for me (Python 3.5)
for i in result:
    print(i.start(), i.end())

      

It finds all the matches, but does not report the right end index. Output:

1 <_sre.SRE_Match object; span=(0, 0), match=''>
2 <_sre.SRE_Match object; span=(1, 1), match=''>
3 <_sre.SRE_Match object; span=(2, 2), match=''>
4 <_sre.SRE_Match object; span=(3, 3), match=''>
(and so on..)

      

My question is: how do I find all the matches and also get their start and end indices?



2 answers


The problem is that a lookahead is a zero-width assertion: it consumes no text (i.e. it adds nothing to the match result). It only matches a position in the string. As a result, all of your matches start and end at the same place in the string.
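A quick way to see this, as a minimal sketch using only the standard re module and the string from the question: every match of the bare lookahead is an empty string, and its start and end coincide.

```python
import re

s = '1' * 15

# A bare lookahead succeeds at a position but consumes nothing,
# so every match is the empty string and start() == end().
matches = list(re.finditer(r'(?=11111)', s))
spans = [m.span() for m in matches]

print(spans[:3])                               # [(0, 0), (1, 1), (2, 2)]
print(all(m.group(0) == '' for m in matches))  # True
```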

You need to enclose the lookahead pattern in a capturing group (i.e. (?=(11111))) and access the start and end of group 1 (with i.start(1) and i.end(1)):

import re
s = '1'*15     
result = re.finditer(r'(?=(11111))', s)

for i in result:
    print(i.start(1), i.end(1))

      



See the Python demo; its result:

0 5
1 6
2 7
3 8
4 9
5 10
6 11
7 12
8 13
9 14
10 15
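If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: overlapping_spans is a hypothetical name, not part of the re module, and it assumes the pattern you pass in has no inline global flags such as (?i), since it gets embedded inside a larger pattern. It also reports at most one match per start position.

```python
import re

def overlapping_spans(pattern, text):
    """Return (start, end) for every match of pattern, overlapping ones included.

    Hypothetical helper: wraps the pattern in a lookahead containing a
    capturing group, then reads the span of that group (group 1).
    """
    wrapped = re.compile('(?=(' + pattern + '))')
    return [m.span(1) for m in wrapped.finditer(text)]

print(overlapping_spans('11111', '1' * 15)[:3])  # [(0, 5), (1, 6), (2, 7)]
print(overlapping_spans('aba', 'ababa'))         # [(0, 3), (2, 5)]
```

The outer group is opened first, so it stays group 1 even if the pattern has groups of its own.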

      



You can compare your code with this implementation to see where the differences might be.

import re

match = re.finditer(r'111', 'test111 end111 and another 111')
for i in match:
    print(i.start(), i.end())

      



If that doesn't work, kindly share some sample data.
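Note that a plain pattern like this only returns non-overlapping matches, because each match consumes its text. A small sketch on the string from the question, using only the standard re module, shows the difference from the lookahead-with-group form:

```python
import re

s = '1' * 15

# Plain pattern: each match consumes five characters, so matches cannot overlap.
plain = [m.span() for m in re.finditer(r'11111', s)]

# Lookahead with a capturing group: one match per possible start position.
overlapping = [m.span(1) for m in re.finditer(r'(?=(11111))', s)]

print(plain)             # [(0, 5), (5, 10), (10, 15)]
print(len(overlapping))  # 11
```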
