Python 3 regex - find the start and end indices of all overlapping matches in a string
This was my original approach:
import re

string = '1' * 15
result = re.finditer(r'(?=11111)', string)  # overlapped = True
# Doesn't work for me
for i in result:  # python 3.5
    print(i.start(), i.end())
It finds every match position, but the end index is wrong: each match has the same start and end. Output:
1 <_sre.SRE_Match object; span=(0, 0), match=''>
2 <_sre.SRE_Match object; span=(1, 1), match=''>
3 <_sre.SRE_Match object; span=(2, 2), match=''>
4 <_sre.SRE_Match object; span=(3, 3), match=''>
(and so on..)
My question is: How do I find all the overlapping matches and also get the start and end index of each one?
The problem is that a lookahead is a zero-width assertion: it consumes no text (i.e. it adds nothing to the match result). It only marks a position in the string. As a result, every one of your matches starts and ends at the same index.
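To see the zero-width behaviour directly, here is a small sketch comparing the spans from the bare lookahead with the spans from the plain pattern. The variable names lookahead_spans and consuming_spans are only illustrative, they are not from the question.

```python
import re

s = '1' * 15

# A bare lookahead matches at each position but consumes nothing,
# so every match is empty and start == end.
lookahead_spans = [m.span() for m in re.finditer(r'(?=11111)', s)]

# Without the lookahead the matches consume text, so they cannot overlap.
consuming_spans = [m.span() for m in re.finditer(r'11111', s)]

print(lookahead_spans[:3])  # [(0, 0), (1, 1), (2, 2)]
print(consuming_spans)      # [(0, 5), (5, 10), (10, 15)]
```

So the lookahead gives you the overlapping positions, and the plain pattern gives you real spans, but neither gives both on its own.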
You need to enclose the lookahead pattern in a capturing group (i.e. (?=(11111))) and access the start and end of group 1 (with i.start(1) and i.end(1)):
import re
s = '1'*15
result = re.finditer(r'(?=(11111))', s)
for i in result:
    print(i.start(1), i.end(1))
See the Python demo; the (start, end) pairs it finds are:
(0, 5) (1, 6) (2, 7) (3, 8) (4, 9) (5, 10) (6, 11) (7, 12) (8, 13) (9, 14) (10, 15)
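If you need this for more than one pattern, the same idea can be wrapped in a small helper. This is only a sketch: overlapping_spans is a hypothetical name, not something from the re module, and it assumes the pattern you pass in does not begin with inline flags such as (?i), since those would end up in the middle of the wrapped expression.

```python
import re

def overlapping_spans(pattern, text):
    """Return (start, end) pairs for every overlapping match of pattern."""
    # Wrap the pattern in a lookahead with a capturing group; the outer
    # group opens first, so it is always group 1.
    wrapped = re.compile('(?=(' + pattern + '))')
    return [m.span(1) for m in wrapped.finditer(text)]

spans = overlapping_spans('11111', '1' * 15)
print(spans[0], spans[-1], len(spans))  # (0, 5) (10, 15) 11
```

m.span(1) is just shorthand for (m.start(1), m.end(1)), so this returns the same pairs as the loop above.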