How to extract part of a string
I have this line:
-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)
but actually I have a lot of lines like this:
a*p**(-1.0) + b*p**(c)
where a, b and c are doubles. I would like to extract a, b and c from each such line. How can I do this using Python?
import re
s = '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'
pattern = r'-?\d+\.\d*'
a,_,b,c = re.findall(pattern,s)
print(a, b, c)
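Output (Python 2; Python 3 prints the three values separated by spaces, without the parentheses and quotes):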
('-1007.88670550662', '67293.8347365694', '-0.416543501823503')
- s is your test line
- pattern is the regular expression pattern; we are looking for floats, and once findall() has found them we unpack them into a, b and c
Note: this method only works if your string is in the format that you specified. Otherwise, you can adjust the pattern to suit your input.
Edit: as several people mentioned in the comments, if your positive numbers may be written with a leading +, you can use this pattern instead: r'[-+]?\d+\.\d*'
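To apply this to many lines, the idea above could be wrapped in a small function. This is only a sketch: the names extract_abc, FLOAT_PATTERN and lines are made up for illustration, and it assumes every line contains exactly four decimal numbers, the second being the fixed exponent -1.0.

```python
import re

# Signed variant of the pattern: optional sign, digits, a dot, optional digits.
FLOAT_PATTERN = re.compile(r'[-+]?\d+\.\d*')

def extract_abc(line):
    """Return (a, b, c) as floats from a line shaped like a*p**(-1.0) + b*p**(c)."""
    # Four numbers are expected; the second one is the fixed exponent -1.0.
    a, _, b, c = FLOAT_PATTERN.findall(line)
    return float(a), float(b), float(c)

# Hypothetical sample input, one expression per entry.
lines = [
    '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)',
    '+2.5*p**(-1.0) + 3.0*p**(0.5)',
]
for line in lines:
    print(extract_abc(line))
```

If a line does not contain exactly four matches, the tuple unpacking raises ValueError, which is probably what you want for malformed input.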
Using the regular expression
(-?\d+\.?\d*)\*p\*\*\(-1\.0\)\s*\+\s*(-?\d+\.?\d*)\*p\*\*\((-?\d+\.?\d*)\)
we can do it:
import re
pat = r'(-?\d+\.?\d*)\*p\*\*\(-1\.0\)\s*\+\s*(-?\d+\.?\d*)\*p\*\*\((-?\d+\.?\d*)\)'
regex = re.compile(pat)
print(regex.findall('-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'))
will print [('-1007.88670550662', '67293.8347365694', '-0.416543501823503')]
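A possible variation, sketched here with named groups so the result is easier to read. The names NUM, LINE_RE and parse_line are not part of the answer above; the sketch assumes the whole line must match and returns None otherwise.

```python
import re

# Same number sub-pattern as above: optional minus, digits, optional dot, optional digits.
NUM = r'-?\d+\.?\d*'

# Whole-line pattern with one named group per coefficient.
LINE_RE = re.compile(
    rf'(?P<a>{NUM})\*p\*\*\(-1\.0\)\s*\+\s*(?P<b>{NUM})\*p\*\*\((?P<c>{NUM})\)'
)

def parse_line(line):
    """Return {'a': ..., 'b': ..., 'c': ...} as floats, or None if the line does not match."""
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        return None
    return {name: float(value) for name, value in m.groupdict().items()}

print(parse_line('-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'))
print(parse_line('not an expression'))
```

Using fullmatch instead of findall means a line with unexpected extra text is rejected rather than partially parsed.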
If your formats are consistent and you don't want to dig deeper into regex (look at regex101 for that, btw), you can just punch through it.
Here goes:
>>> s= "-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)"
>>> a, buf, c = s.split("*p**")
>>> b = buf.split()[-1]
>>> a,b,c
('-1007.88670550662', '67293.8347365694', '(-0.416543501823503)')
>>> [float(x.strip("()")) for x in (a,b,c)]
[-1007.88670550662, 67293.8347365694, -0.416543501823503]
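The same steps could be collected into a function for use on many lines. This is a sketch only; split_abc is a made-up name, and it assumes "*p**" occurs exactly twice per line and that b is the last whitespace-separated token before the second occurrence.

```python
def split_abc(line):
    """Split 'a*p**(-1.0) + b*p**(c)' on '*p**' and return [a, b, c] as floats."""
    # Exactly two '*p**' separators are expected, giving three pieces.
    a, buf, c = line.split("*p**")
    # buf looks like '(-1.0) + 67293.83...'; b is its last token.
    b = buf.split()[-1]
    return [float(x.strip("()")) for x in (a, b, c)]

print(split_abc("-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)"))
print(split_abc("2.0*p**(-1.0) + 3.5*p**(4)"))
```

One caveat: if the second term is written as "- 5.0*p**(c)", with a space between the sign and the number, the sign is lost, because only the last token is kept.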
Of course the re module can be used to do this, although, as noted in some of the comments on the other answers, the corner cases can be interesting: decimal points, plus and minus signs, etc. It can get even more interesting; for example, can your number be imaginary?
In any case, if your string is always a valid Python expression, you can use Python's built-in tools to process it. Here is a good general explanation of the ast class NodeVisitor. Using it for your example is pretty simple:
import ast
x = "-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)"
def getnums(s):
    result = []
    class GetNums(ast.NodeVisitor):
        def visit_Num(self, node):
            result.append(node.n)
        def visit_UnaryOp(self, node):
            if (isinstance(node.op, ast.USub) and
                    isinstance(node.operand, ast.Num)):
                result.append(-node.operand.n)
            else:
                ast.NodeVisitor.generic_visit(self, node)
    GetNums().visit(ast.parse(s))
    return result
print(getnums(x))
This will return a list with all the numbers in your expression:
[-1007.88670550662, -1.0, 67293.8347365694, -0.416543501823503]
The visit_UnaryOp method is only required for Python 3.x, where a negative literal such as -1.0 is parsed as a unary minus applied to a positive number.
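On newer Python versions, ast.Num and visit_Num have been deprecated since 3.8 in favour of ast.Constant (and, as far as I know, they are slated for removal in recent releases). A sketch of the same idea using ast.Constant follows; the names get_numbers, NumberCollector and is_number are chosen here for illustration and are not from the code above.

```python
import ast

def is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float, complex)) and not isinstance(value, bool)

def get_numbers(expr):
    """Collect every numeric literal in a Python expression, keeping the sign."""
    numbers = []

    class NumberCollector(ast.NodeVisitor):
        def visit_Constant(self, node):
            if is_number(node.value):
                numbers.append(node.value)

        def visit_UnaryOp(self, node):
            operand = node.operand
            # Fold a unary minus applied directly to a numeric literal.
            if (isinstance(node.op, ast.USub)
                    and isinstance(operand, ast.Constant)
                    and is_number(operand.value)):
                numbers.append(-operand.value)
            else:
                self.generic_visit(node)

    NumberCollector().visit(ast.parse(expr, mode="eval"))
    return numbers

x = "-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)"
print(get_numbers(x))
```

As before, the result contains all four numbers, including the fixed exponent -1.0, so a, b and c are the elements at indices 0, 2 and 3.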
You can use something like:
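import re
subject = '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'
# the second match is the fixed exponent -1.0, so it is discarded
a,_,b,c = re.findall(r"[\d\-.]+", subject)
print(a,b,c)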
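Note that this character class is quite loose: it matches any run of digits, minus signs and dots, not only well-formed numbers. A small sketch of what that means in practice (the helper floats_only is made up for illustration):

```python
import re

loose = r"[\d\-.]+"

# A lone minus sign surrounded by spaces is matched as its own token.
print(re.findall(loose, "1.5 - 2"))
# A run such as '1.2.3-4' is matched as a single token.
print(re.findall(loose, "1.2.3-4"))

def floats_only(text):
    """Keep only the tokens that float() accepts."""
    values = []
    for token in re.findall(loose, text):
        try:
            values.append(float(token))
        except ValueError:
            # Tokens such as '-' or '1.2.3-4' are skipped.
            pass
    return values

print(floats_only("1.5 - 2"))
```

For strings in exactly the format from the question this does not matter, but it is worth knowing if the input is less regular.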
Although I prefer MooingRawr's answer as it is simple, I would expand it a bit to cover more situations.
A floating point number can be converted to a string in an amazing variety of formats:
- Exponential format (for example 2.0e+07)
- Without a leading digit (for example .5, which is 0.5)
- No trailing digit (for example 5., which is 5)
- Positive numbers with a plus sign (for example +5, which is 5)
- Numbers without decimal part (whole numbers) (for example, 0 or 5)
Script
import re
test_values = [
    '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)',
    '-2.000e+07*p**(-1.0) + 1.23e+07*p**(-5e+07)',
    '+2.*p**(-1.0) + -1.*p**(5)',
    '0*p**(-1.0) + .123*p**(7.89)'
]
pattern = r'([-+]?\.?\d+\.?\d*(?:[eE][-+]?\d+)?)'
for value in test_values:
    print("Test with '%s':" % value)
    matches = re.findall(pattern, value)
    del matches[1]
    print(matches, end='\n\n')
Output:
Test with '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)':
['-1007.88670550662', '67293.8347365694', '-0.416543501823503']
Test with '-2.000e+07*p**(-1.0) + 1.23e+07*p**(-5e+07)':
['-2.000e+07', '1.23e+07', '-5e+07']
Test with '+2.*p**(-1.0) + -1.*p**(5)':
['+2.', '-1.', '5']
Test with '0*p**(-1.0) + .123*p**(7.89)':
['0', '.123', '7.89']
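The matches are still strings. Since float() accepts all of the formats listed above, converting them is straightforward. A minimal sketch, assuming the second match is always the fixed exponent -1.0; the function name coefficients is chosen here for illustration.

```python
import re

# Same pattern as in the script, without the outer capturing group.
PATTERN = re.compile(r'[-+]?\.?\d+\.?\d*(?:[eE][-+]?\d+)?')

def coefficients(line):
    """Return [a, b, c] as floats from a line shaped like a*p**(-1.0) + b*p**(c)."""
    matches = PATTERN.findall(line)
    # Drop the fixed exponent -1.0, which is always the second match.
    del matches[1]
    return [float(m) for m in matches]

print(coefficients('-2.000e+07*p**(-1.0) + 1.23e+07*p**(-5e+07)'))
print(coefficients('+2.*p**(-1.0) + -1.*p**(5)'))
print(coefficients('0*p**(-1.0) + .123*p**(7.89)'))
```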