Matching string pairs

Question

I have the following sample line:

 R10666: 273141 C1 + 273141 C2 + 273141 C3 + 273141 C4 + 273141 C5 - 273141 C6

I want to receive:

[('273141','C1'), ..., ('- 273141', 'C6')]

Numbers can be floating-point numbers in exponential notation, e.g. - 2.5e-7

...

My current regex looks like this:

re.findall(r'([+-]? \d+(\.\d*)?|\.\d+([eE][+-]?\d+)?)( [a-zA-Z0-9_]+)', split)

But it does not give the correct output. What is wrong with it?

This is an example output:

(' 273141', '', '', ' C1')

or nothing matches.

-1

python python-2.7 regex

quesaionasis May 01 '15 at 22:37

2 answers

findall will put all capture groups in the results. In your case, the empty strings come from the optional decimal and exponent groups, which are captured even when they match nothing; so use non-capturing groups instead:

([+-]? \d+(?:\.\d*)?|\.\d+(?:[eE][+-]?\d+)?) ([a-zA-Z0-9_]+)

I also moved the space out of the second capture group, so you won't get the leading space there.
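To make the difference concrete, here is a minimal sketch that runs both patterns from this thread on a shortened version of the sample line. The variable names (line, capturing, non_capturing) are chosen here for illustration and are not from the original post.

```python
import re

# Shortened version of the sample line from the question.
line = "R10666: 273141 C1 + 273141 C2 - 273141 C6"

# Pattern from the question: four capturing groups, so findall returns
# 4-tuples, with '' for every optional group that did not participate.
capturing = r'([+-]? \d+(\.\d*)?|\.\d+([eE][+-]?\d+)?)( [a-zA-Z0-9_]+)'

# Pattern from this answer: (?:...) still groups, but is left out of the result.
non_capturing = r'([+-]? \d+(?:\.\d*)?|\.\d+(?:[eE][+-]?\d+)?) ([a-zA-Z0-9_]+)'

with_empty = re.findall(capturing, line)
pairs = re.findall(non_capturing, line)

print(with_empty[0])  # (' 273141', '', '', ' C1')
print(pairs)          # [(' 273141', 'C1'), ('+ 273141', 'C2'), ('- 273141', 'C6')]
```

Note that the first element of each pair still contains the space that the pattern requires before the digits; only the second group loses its leading space.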

regex101 demo

ideone demo

+2

Jerry May 01 '15 at 22:44

Wiktor Stribiżew · Accepted Answer · 2015-05-01T22:50:18+0000

I adapted the "Regex Floating Point Number" regex for you and shortened it a bit (note that there is no alternation, which means less backtracking, and that the (?i) option makes the match case-insensitive, so [A-Za-z] can be turned into [a-z]):

import re
s = "R10666: 273141 C1 + 273141 C2 + 273141 C3 + 273141 C4 + 273141 C5 - 273141 C6"
print re.findall(r'(?i)([-+]?\s*\d*\.?\d+(?:[eE][-+]?\d+)?)(\s+\w+)', s)

Output of the ideone demo:

[(' 273141', ' C1'), ('+ 273141', ' C2'), ('+ 273141', ' C3'), ('+ 273141', ' C4'), ('+ 273141', ' C5'), ('- 273141', ' C6')]
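The pairs above still carry whitespace inside the number and in front of the name. If clean values are wanted, one possible follow-up is sketched below. It is only an illustration under a few assumptions: the whitespace is moved outside the second group, the coefficient is converted to float (the question asked for strings), and parse_terms is a hypothetical helper name that does not appear in the original answer.

```python
import re

# Pattern from the answer above, with the separating whitespace moved
# outside the second group so the name is captured without it.
TERM = re.compile(r'(?i)([-+]?\s*\d*\.?\d+(?:e[-+]?\d+)?)\s+(\w+)')

def parse_terms(line):
    """Return (coefficient, name) pairs; parse_terms is a hypothetical helper."""
    terms = []
    for number, name in TERM.findall(line):
        # Remove any whitespace between the sign and the digits,
        # e.g. '- 2.5e-7' becomes '-2.5e-7', before converting.
        terms.append((float(''.join(number.split())), name))
    return terms

print(parse_terms("R10666: 273141 C1 - 2.5e-7 C2"))
```

The leading "R10666:" label is skipped because the digits in it are followed by a colon rather than by whitespace and a name.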
