Regex to match a non-digit and/or end of string
How do I match either the end of the line or a single non-digit character in a Python 2.6 regex search?
I want to find ten-digit numbers that are preceded by a non-digit and followed by either a non-digit or the end of the string. These are 10-digit ISBNs, and "X" is valid as the last digit.
The following attempts don't work:
is10 = re.compile(r'\D(\d{9}[\d|X|x])[$|\D]')
is10 = re.compile(r'\D(\d{9}[\d|X|x])[\$|\D]')
is10 = re.compile(r'\D(\d{9}[\d|X|x])[\Z|\D]')
The problem is in the last set, [$|\D], which is meant to match a non-digit or the end of the string.
I am testing with:
line = "abcd0123456789"
m = is10.search(line)
print m.group(1)
line = "abcd0123456789efg"
m = is10.search(line)
print m.group(1)
You need to group the alternatives with parentheses, not square brackets:
r'\D(\d{9}[\dXx])($|\D)'
| is a different construct than []. It denotes an alternative between two patterns, whereas [] matches one of the contained characters. Therefore, | should only be used inside [] if you want to match the literal character |. Portions of patterns are grouped using parentheses, so parentheses should be used to limit the scope of the alternation marked by |.
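To make the difference concrete, here is a minimal sketch comparing the two constructs on a shortened pattern. The pattern and the sample strings are made up for illustration and are not from the question. Inside a character class, $ loses its anchor meaning, so the class always needs one more character to consume.

```python
import re

# Inside [], the characters $ and | are literals, so this class means
# "one character that is $, |, or a non-digit". It cannot match end of string.
char_class = re.compile(r'\d[$|\D]')

# Inside parentheses, | is alternation and $ is the end-of-string anchor.
alternation = re.compile(r'\d($|\D)')

class_at_end = char_class.search("abc7")      # no character follows the 7
class_literal = char_class.search("abc7|")    # the literal | is consumed
alt_at_end = alternation.search("abc7")       # $ succeeds after the 7
alt_non_digit = alternation.search("abc7x")   # \D consumes the x

print(class_at_end)
print(class_literal.group(0))
print(alt_at_end.group(0))
print(alt_non_digit.group(0))
```

The first search returns None, which is why the question's first test line fails with the bracketed version.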
The parentheses create an extra capturing group. If you want to avoid that, you can use a non-capturing group, (?: ), instead:
r'\D(\d{9}[\dXx])(?:$|\D)'
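As a quick check, the pattern above can be run against the two test lines from the question. This is a sketch: the helper name find_isbn10 and the last two sample strings are assumptions added for illustration. It uses print() with a single argument, which behaves the same in Python 2.6 and Python 3.

```python
import re

# Pattern from the answer: a non-digit, the ten-character ISBN (captured),
# then either the end of the string or a non-digit.
is10 = re.compile(r'\D(\d{9}[\dXx])(?:$|\D)')

def find_isbn10(line):
    """Hypothetical helper: return the captured candidate, or None."""
    m = is10.search(line)
    return m.group(1) if m else None

print(find_isbn10("abcd0123456789"))      # digits at end of string
print(find_isbn10("abcd0123456789efg"))   # digits followed by a non-digit
print(find_isbn10("abcd012345678Xefg"))   # X as the final character
print(find_isbn10("abcd01234567890"))     # eleven digits in a row
```

Checking for None before calling group(1) avoids an AttributeError on lines that contain no match, such as the eleven-digit case.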