Matching string pairs

I have the following sample line:

 R10666: 273141 C1 + 273141 C2 + 273141 C3 + 273141 C4 + 273141 C5 - 273141 C6

      

I want to get:

[('273141','C1'), ..., ('- 273141', 'C6')]

      

Numbers can be floating-point numbers with exponential notation, e.g. - 2.5e-7

...

The current regex looks like this:

re.findall(r'([+-]? \d+(\.\d*)?|\.\d+([eE][+-]?\d+)?)( [a-zA-Z0-9_]+)', split)

      

But it does not give the correct output. What is wrong with it?

This is an example output:

(' 273141', '', '', ' C1')

      

or nothing matches.

-1




2 answers


I adapted the "Regex Floating Point Number" regex for you and shortened it a bit (note that there is no alternation, which means less backtracking, and that the (?i) option makes the match case-insensitive, turning [A-Za-z] into [a-z]):

import re
s = "R10666: 273141 C1 + 273141 C2 + 273141 C3 + 273141 C4 + 273141 C5 - 273141 C6"
print(re.findall(r'(?i)([-+]?\s*\d*\.?\d+(?:[eE][-+]?\d+)?)(\s+\w+)', s))

      



Output (ideone demo):

[(' 273141', ' C1'), ('+ 273141', ' C2'), ('+ 273141', ' C3'), ('+ 273141', ' C4'), ('+ 273141', ' C5'), ('- 273141', ' C6')]
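The tuples above still carry the whitespace that the pattern captured, while the question asks for values like ('273141', 'C1'). A minimal sketch of one way to tidy them up follows; the helper names parse_terms and to_floats are hypothetical and not part of the original answer, and the float conversion is only an assumption about what the pairs are used for.

```python
import re

s = "R10666: 273141 C1 + 273141 C2 + 273141 C3 + 273141 C4 + 273141 C5 - 273141 C6"

# Same pattern as in the answer above.
PAIR = re.compile(r'(?i)([-+]?\s*\d*\.?\d+(?:[eE][-+]?\d+)?)(\s+\w+)')

def parse_terms(line):
    """Return (number, name) pairs with surrounding whitespace stripped."""
    return [(num.strip(), name.strip()) for num, name in PAIR.findall(line)]

def to_floats(pairs):
    """Turn strings such as '- 273141' into signed floats.

    Hypothetical helper: float() rejects whitespace between the sign and
    the digits, so all whitespace is removed first.
    """
    return [(float(''.join(num.split())), name) for num, name in pairs]

pairs = parse_terms(s)
print(pairs)             # first pair is ('273141', 'C1'), last is ('- 273141', 'C6')
print(to_floats(pairs))  # last pair becomes (-273141.0, 'C6')
```

The sign is kept in the string form because the question asks for it there; converting to float is optional.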

      

+1




findall will put all captured groups in the results. In your case, the empty strings come from the optional groups (the decimal part and the exponent) that did not match; so use non-capturing groups instead:

([+-]? \d+(?:\.\d*)?|\.\d+(?:[eE][+-]?\d+)?) ([a-zA-Z0-9_]+)

      

I also moved the space out of the second capture group, so you won't get that leading space.
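To make the difference concrete, here is a small sketch comparing the question's pattern with the non-capturing version on a shortened sample line (the line and the variable names are made up for illustration). It also includes a third pattern, which is an assumption and not part of this answer: because | has the lowest precedence, the exponent in the patterns above belongs only to the \.\d+ alternative, so the sketch wraps the alternation in (?:...) to let the exponent apply to both forms.

```python
import re

# Shortened, made-up sample that includes an exponent term.
line = " 273141 C1 + 273141 C2 - 2.5e-7 C3"

# Pattern from the question: four capturing groups, so each result is a 4-tuple
# and groups that did not take part in the match show up as ''.
capturing = r'([+-]? \d+(\.\d*)?|\.\d+([eE][+-]?\d+)?)( [a-zA-Z0-9_]+)'
first_capturing = re.findall(capturing, line)[0]
print(first_capturing)   # (' 273141', '', '', ' C1')

# Pattern from this answer: only two capturing groups are left.
non_capturing = r'([+-]? \d+(?:\.\d*)?|\.\d+(?:[eE][+-]?\d+)?) ([a-zA-Z0-9_]+)'
two_groups = re.findall(non_capturing, line)
print(two_groups)        # the exponent term comes back as ('.5e-7', 'C3')

# Assumed variant (not from the answer): group the alternation so the
# optional exponent follows either number form.
grouped = r'([+-]? (?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?) ([a-zA-Z0-9_]+)'
with_exponent = re.findall(grouped, line)
print(with_exponent)     # the exponent term comes back as ('- 2.5e-7', 'C3')
```

All three patterns still require a literal space before the digits, as in the original regex.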



regex101 demo

ideone demo

+2

