Matching string pairs
I have the following sample line:
R10666: 273141 C1 + 273141 C2 + 273141 C3 + 273141 C4 + 273141 C5 - 273141 C6
I want to get:
[('273141','C1'), ..., ('- 273141', 'C6')]
Numbers can be floating-point numbers in exponential notation, e.g. - 2.5e-7
...
The current regex looks like this:
re.findall(r'([+-]? \d+(\.\d*)?|\.\d+([eE][+-]?\d+)?)( [a-zA-Z0-9_]+)', split)
But it does not produce the correct output. What is wrong with it?
This is an example output:
(' 273141', '', '', ' C1')
or nothing matches.
-1
2 answers
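import re
s = "R10666: 273141 C1 + 273141 C2 + 273141 C3 + 273141 C4 + 273141 C5 - 273141 C6"
# Group 1: optional sign, optional whitespace, number with optional exponent.
# Group 2: whitespace followed by the identifier.
print(re.findall(r'(?i)([-+]?\s*\d*\.?\d+(?:[eE][-+]?\d+)?)(\s+\w+)', s))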
Output (ideone demo):
[(' 273141', ' C1'), ('+ 273141', ' C2'), ('+ 273141', ' C3'), ('+ 273141', ' C4'), ('+ 273141', ' C5'), ('- 273141', ' C6')]
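The tuples above still carry the whitespace matched by \s* and \s+. As a minimal sketch, assuming the goal is the stripped form shown in the question, the result can be cleaned up after matching. The shortened sample string and the names pattern and pairs are illustrative only, and one term was changed to use exponential notation to exercise that part of the regex.

```python
import re

# Shortened, illustrative sample line (not the asker's full input).
s = "R10666: 273141 C1 + 273141 C2 - 2.5e-7 C3"

# Same regex as in the answer above, minus the (?i) flag,
# which is redundant because [eE] already covers both cases.
pattern = r'([-+]?\s*\d*\.?\d+(?:[eE][-+]?\d+)?)(\s+\w+)'

# Strip the surrounding whitespace from both captured groups.
pairs = [(num.strip(), name.strip()) for num, name in re.findall(pattern, s)]
print(pairs)
```

Stripping after the match keeps the regex unchanged; the alternative is to move the whitespace outside the capture groups.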
+1
findall puts every capture group in the results. In your case, the empty strings come from the optional decimal and exponent groups when they do not take part in the match, so use non-capturing groups instead:
([+-]? \d+(?:\.\d*)?|\.\d+(?:[eE][+-]?\d+)?) ([a-zA-Z0-9_]+)
I also moved the space out of the second capture group, so you won't get the leading space there.
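To illustrate the effect, here is a minimal sketch comparing capturing and non-capturing groups with re.findall. It is not the exact regex above: as an assumption about the intended grouping, the two number forms are wrapped in (?:...) so that the exponent applies to both, and the sign with its optional space is made a single optional unit. The sample string and variable names are made up for the example.

```python
import re

# Shortened, illustrative sample line with one exponential term.
line = "R10666: 273141 C1 - 2.5e-7 C2"

# Inner groups are capturing: findall returns one entry per group,
# and optional groups that did not take part in the match become ''.
capturing = r'((?:[+-] ?)?(?:\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?) ([a-zA-Z0-9_]+)'

# Same pattern with the inner groups made non-capturing.
non_capturing = r'((?:[+-] ?)?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?) ([a-zA-Z0-9_]+)'

with_groups = re.findall(capturing, line)
without_groups = re.findall(non_capturing, line)

print(with_groups)     # 4-tuples, padded with '' for unused groups
print(without_groups)  # 2-tuples: (number, name)
```

With only two capturing groups left, each result is a (number, name) pair, which is the shape the question asks for.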
+2