How do I fix this regex in Python?
I want to process some string data, which is printed like this:
'node0, node1 0.04, node8 11.11, node14 72.21\n'
'node1, node46 1247.25, node6 20.59, node13 64.94\n'
I want to find all the floats here. This is the code I am using:
for node in nodes:
    pattern = re.compile('(?<!node)\d+.\d+')
    distance = pattern.findall(node)
However, the result looks like this:
['0.04', '11.11', '4 72']
while I want it to be
['0.04', '11.11', '72.21']
Any suggestions on fixing this regex?
In regular expressions, the . character is interpreted as a wildcard and matches any character except a newline. So your search pattern actually allows a digit or a set of digits, followed by any character, followed by another digit or set of digits. That is why '4 72' is matched: the space between "node14" and "72.21" is accepted as the "any character". To stop this interpretation of the dot character, escape it with a backslash: \.
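A small sketch of the difference, using the first sample line from the question. The variable names line, unescaped and escaped are only for illustration:

```python
import re

line = 'node0, node1 0.04, node8 11.11, node14 72.21\n'

# Unescaped dot: matches any character except a newline,
# so the space in "4 72" is accepted.
unescaped = re.findall(r'(?<!node)\d+.\d+', line)

# Escaped dot: matches only a literal ".".
escaped = re.findall(r'(?<!node)\d+\.\d+', line)

print(unescaped)  # ['0.04', '11.11', '4 72']
print(escaped)    # ['0.04', '11.11', '72.21']
```

The raw-string prefix r'...' is used here so that the backslashes reach the regex engine unchanged.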
(As an aside: you don't need to compile the regex pattern inside your loop. This will actually slow down your code.)
import re

# Compile once, outside the loop; the raw string keeps the backslashes intact.
pattern = re.compile(r'(?<!node)\d+\.\d+')
for node in nodes:
    distance = pattern.findall(node)
    print(distance)
output:
['0.04', '11.11', '72.21']
['1247.25', '20.59', '64.94']
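If you need numbers rather than strings, here is a self-contained sketch that may help. It assumes nodes is a list of lines like the two shown in the question, and the name distances is just illustrative:

```python
import re

# Sample lines copied from the question (assumed shape of `nodes`).
nodes = [
    'node0, node1 0.04, node8 11.11, node14 72.21\n',
    'node1, node46 1247.25, node6 20.59, node13 64.94\n',
]

pattern = re.compile(r'(?<!node)\d+\.\d+')

# findall returns strings; float() converts each match to a number.
distances = [[float(m) for m in pattern.findall(line)] for line in nodes]
print(distances)
```

Note that this pattern only picks up values that contain a decimal point; a distance written as a bare integer would be skipped.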