How to extract part of a string

I have this line:

-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)

      

but actually I have a lot of lines like this:

a*p**(-1.0) + b*p**(c)

      

where a, b and c are doubles. I would like to extract a, b and c from each such line. How can I do this using Python?

+3




6 answers


import re
s = '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'
pattern = r'-?\d+\.\d*'  

a,_,b,c = re.findall(pattern,s)
print(a, b, c)

      

Output

-1007.88670550662 67293.8347365694 -0.416543501823503

      



s is your test string and pattern is the regular expression pattern. We are looking for floats, and once findall() has found them we unpack them into a, b and c; the second match (the fixed exponent -1.0) is discarded into _.

Note: this method only works if your string is in the format that you specified. Otherwise you can adjust the pattern however you need.

Edit: as several people mentioned in the comments, if your positive numbers may be written with a leading +, you can use this pattern instead: r'[-+]?\d+\.\d*'
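
Since the question mentions many lines, here is a minimal sketch of the same findall() idea wrapped in a function that also converts the matches to float. The names FLOAT and extract_abc are made up for illustration, and the sketch assumes every line contains exactly four numbers with a decimal point, as in the question.

```python
import re

# Signed pattern from the edit above: optional sign, digits, a dot, optional digits.
FLOAT = re.compile(r'[-+]?\d+\.\d*')

def extract_abc(line):
    """Return (a, b, c) as floats from a line shaped like a*p**(-1.0) + b*p**(c)."""
    found = FLOAT.findall(line)
    if len(found) != 4:
        # Hypothetical guard: refuse lines that do not have the expected shape.
        raise ValueError("expected 4 numbers, got %d in %r" % (len(found), line))
    # The second number is the fixed exponent -1.0, so it is skipped.
    a, _, b, c = (float(x) for x in found)
    return a, b, c

lines = [
    '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)',
    '+2.5*p**(-1.0) + 3.0*p**(1.5)',
]
for line in lines:
    print(extract_abc(line))
```

Raising on an unexpected count is one possible choice; skipping or logging such lines would work just as well.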

+3




Using the regular expression

(-?\d+\.?\d*)\*p\*\*\(-1\.0\)\s*\+\s*(-?\d+\.?\d*)\*p\*\*\((-?\d+\.?\d*)\)

      

we can extract all three values:



import re

pat = r'(-?\d+\.?\d*)\*p\*\*\(-1\.0\)\s*\+\s*(-?\d+\.?\d*)\*p\*\*\((-?\d+\.?\d*)\)'

regex = re.compile(pat)

print(regex.findall('-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'))

      

This prints [('-1007.88670550662', '67293.8347365694', '-0.416543501823503')]
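
A possible variation, sketched under the assumption that each line should match the format completely: named groups plus fullmatch() make it easy to reject malformed lines and to get floats back. NUM, LINE and parse are illustrative names, not part of the answer above.

```python
import re

NUM = r'-?\d+\.?\d*'  # same number sub-pattern as in the expression above

# Named groups a, b and c; the first exponent is required to be exactly -1.0.
LINE = re.compile(
    rf'(?P<a>{NUM})\*p\*\*\(-1\.0\)\s*\+\s*'
    rf'(?P<b>{NUM})\*p\*\*\((?P<c>{NUM})\)'
)

def parse(line):
    """Return {'a': ..., 'b': ..., 'c': ...} as floats, or None if the line does not match."""
    m = LINE.fullmatch(line.strip())
    if m is None:
        return None
    return {name: float(text) for name, text in m.groupdict().items()}

print(parse('-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'))
print(parse('something else'))
```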

+1




If your formats are consistent and you don't want to dig deeper into regex (look at regex101 for that, btw), you can just brute-force it with plain string methods.

Here goes:

>>> s= "-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)"
>>> a, buf, c = s.split("*p**")
>>> b = buf.split()[-1]
>>> a,b,c
('-1007.88670550662', '67293.8347365694', '(-0.416543501823503)')
>>> [float(x.strip("()")) for x in (a,b,c)]
[-1007.88670550662, 67293.8347365694, -0.416543501823503]
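
The same steps can be wrapped in a small function for use on many lines. This is only a sketch: split_abc is a made-up name, and it assumes exactly two '*p**' terms with whitespace around the '+' between them.

```python
def split_abc(line):
    """Split-based extraction of (a, b, c) as floats."""
    parts = line.strip().split("*p**")
    if len(parts) != 3:
        # Hypothetical guard for lines that are not in the expected format.
        raise ValueError("unexpected format: %r" % line)
    a, middle, c = parts
    # middle looks like '(-1.0) + 67293.8347365694'; b is its last token.
    b = middle.split()[-1]
    return tuple(float(x.strip("()")) for x in (a, b, c))

text = """-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)
2.5*p**(-1.0) + 3.0*p**(1.5)"""

for row in text.splitlines():
    print(split_abc(row))
```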

      

+1




Of course the re module can be used to do this, although as noted in some of the comments on the other answers, the corner cases can be interesting: decimal points, plus and minus signs, etc. It can get even more interesting; for example, can your numbers be imaginary?

In any case, if your string is always a valid Python expression, you can use Python's built-in tools to process it, in particular the NodeVisitor class from the ast module. Using it for your example is pretty simple:

import ast

x = "-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)"

def getnums(s):
    result = []
    class GetNums(ast.NodeVisitor):
        def visit_Constant(self, node):
            # ast.Constant replaces ast.Num, which is deprecated since Python 3.8
            if isinstance(node.value, (int, float, complex)):
                result.append(node.value)
        def visit_UnaryOp(self, node):
            # A negative literal is parsed as unary minus applied to a positive constant
            if (isinstance(node.op, ast.USub) and
                isinstance(node.operand, ast.Constant)):
                result.append(-node.operand.value)
            else:
                self.generic_visit(node)
    GetNums().visit(ast.parse(s))
    return result

print(getnums(x))

      

This will return a list with all the numbers in your expression:

[-1007.88670550662, -1.0, 67293.8347365694, -0.416543501823503]

      

The visit_UnaryOp method is only required for Python 3.x, where a negative literal is parsed as a unary minus applied to a positive number.
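
If the expression always has the shape a*p**(x) + b*p**(c), the tree structure itself can be used to pick out the coefficients and exponents instead of collecting every number. The following is a sketch under that assumption; terms is a made-up name, and ast.literal_eval is used here because it accepts an AST node and evaluates signed numeric literals.

```python
import ast

def terms(expr):
    """Return [(coefficient, exponent), ...] for a sum of two coef*p**(exp) terms."""
    tree = ast.parse(expr, mode="eval").body
    if not (isinstance(tree, ast.BinOp) and isinstance(tree.op, ast.Add)):
        raise ValueError("expected a sum of two terms")
    pairs = []
    for term in (tree.left, tree.right):
        # Each term should be: coefficient * (p ** exponent)
        if not (isinstance(term, ast.BinOp) and isinstance(term.op, ast.Mult)
                and isinstance(term.right, ast.BinOp)
                and isinstance(term.right.op, ast.Pow)):
            raise ValueError("unexpected term shape")
        coefficient = ast.literal_eval(term.left)
        exponent = ast.literal_eval(term.right.right)
        pairs.append((coefficient, exponent))
    return pairs

x = "-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)"
(a, _), (b, c) = terms(x)
print(a, b, c)
```

Compared with collecting all numbers, this ties each value to its position in the expression, at the cost of rejecting anything that is not exactly a sum of two such terms.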

+1




You can use something like:

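import re
subject = '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'
# Runs of digits, minus signs and dots; the second match is the fixed exponent -1.0
a, _, b, c = re.findall(r"[\d\-.]+", subject)
print(a, b, c)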

      


0




Although I prefer MooingRawr's answer as it is simple, I would expand it a bit to cover more situations.

A floating point number can be converted to a string in an amazing variety of formats:

  • Exponential format (for example 2.0e+07)
  • Without a leading digit (for example .5, which is 0.5)
  • No trailing digit (for example 5., which is 5)
  • Positive numbers with a plus sign (for example +5, which is 5)
  • Numbers without decimal part (whole numbers) (for example, 0 or 5)
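
As a quick check, Python's built-in float() accepts every one of the forms listed above, so whatever a pattern matches can be converted directly. The sample strings below are just the examples from the list.

```python
# Each string is one of the formats from the list above.
samples = ["2.0e+07", ".5", "5.", "+5", "0", "5"]

values = [float(s) for s in samples]
print(values)  # [20000000.0, 0.5, 5.0, 5.0, 0.0, 5.0]
```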

Script

import re

test_values = [
    '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)',
    '-2.000e+07*p**(-1.0) + 1.23e+07*p**(-5e+07)',
    '+2.*p**(-1.0) + -1.*p**(5)',
    '0*p**(-1.0) + .123*p**(7.89)'
]

pattern = r'([-+]?\.?\d+\.?\d*(?:[eE][-+]?\d+)?)'

for value in test_values:
    print("Test with '%s':" % value)
    matches = re.findall(pattern, value)
    del matches[1]
    print(matches, end='\n\n')

      

Output:

Test with '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)':
['-1007.88670550662', '67293.8347365694', '-0.416543501823503']

Test with '-2.000e+07*p**(-1.0) + 1.23e+07*p**(-5e+07)':
['-2.000e+07', '1.23e+07', '-5e+07']

Test with '+2.*p**(-1.0) + -1.*p**(5)':
['+2.', '-1.', '5']

Test with '0*p**(-1.0) + .123*p**(7.89)':
['0', '.123', '7.89']

      

0

