Parsing and evaluating arbitrary equations with Python
I would like to separate data from code in a Python project. The data consists of pandas DataFrames and equations linking the columns of each DataFrame.
I have currently hard-coded the equations in my script, so I cannot modify them dynamically (for example, by loading a new DataFrame together with its corresponding equations). Any suggestions on how to create equivalent functions?
For example:
# Given a pandas DataFrame:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])
# and a simple arbitrary expression such as:
equation1 = "a + 2*a/b"
# how to dynamically create a function equivalent to:
def f(df):
    return df['a'] + 2*df['a']/df['b']  # hard-coded equation 1
You can use DataFrame.eval:
equation1 ="a + 2*a/b"
print (df.eval(equation1))
0 2.00
1 5.60
2 8.75
dtype: float64
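To get an actual function equivalent to the hard-coded f(df) from the question, one option is to wrap DataFrame.eval in a small factory. This is only a minimal sketch; the name make_equation is hypothetical and not part of pandas.

```python
import pandas as pd


def make_equation(expression):
    """Return a function that evaluates `expression` against a DataFrame's columns.

    `make_equation` is a hypothetical helper name, not a pandas API.
    """
    def f(df):
        # DataFrame.eval resolves bare names such as `a` and `b` to columns of df
        return df.eval(expression)
    return f


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

f = make_equation("a + 2*a/b")
result = f(df)

# The hard-coded version from the question, for comparison
expected = df["a"] + 2 * df["a"] / df["b"]
print((result - expected).abs().max())
```

The returned function can be applied to any DataFrame that has the columns named in the expression, so the equation string itself can be stored outside the script.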
You can assign the result back to the DataFrame with assign:
df.assign(eq1=df.eval(equation1))
a b c eq1
0 1 2 3 2.00
1 4 5 6 5.60
2 7 8 9 8.75
You can make it even more dynamic by using a dictionary of equations:
deq = dict(
EQ1='a + 2 * a / b',
EQ2='c ** 2 / (a + b)'
)
df.assign(**{k: df.eval(v) for k, v in deq.items()})
a b c EQ1 EQ2
0 1 2 3 2.00 3.0
1 4 5 6 5.60 4.0
2 7 8 9 8.75 5.4
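Since the goal is to keep the equations outside the code, the dictionary could itself be loaded from a data file. The sketch below assumes the equations are stored as JSON; the JSON text is inlined here as a stand-in for a file's contents, and the variable name equations_json is only illustrative.

```python
import json

import pandas as pd

# Assumption: this string stands in for the contents of a JSON file kept
# alongside the data (it could be read with open(...).read() instead).
equations_json = '{"EQ1": "a + 2 * a / b", "EQ2": "c ** 2 / (a + b)"}'
deq = json.loads(equations_json)

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

# Evaluate every equation against the original columns and attach the results
out = df.assign(**{name: df.eval(expr) for name, expr in deq.items()})
print(out)
```

Note that with this approach every expression is evaluated against the original df, so one equation cannot refer to the result of another.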
Another solution, similar to @piRSquared's, that allows you to evaluate multiple equations in one step:
In [442]: equations = """
...: EQ1 = a + 2 * a / b
...: EQ2 =c ** 2 / (a + b)
...: """
In [443]: df.eval(equations, inplace=False)
Out[443]:
a b c EQ1 EQ2
0 1 2 3 2.00 3.0
1 4 5 6 5.60 4.0
2 7 8 9 8.75 5.4
NOTE: multi-line expressions are preferable, because each line can use variables calculated on previous lines.
Demo:
In [444]: equations = """
...: EQ1 = a + 2 * a / b
...: EQ2 = EQ1**2
...: """
In [445]: df.eval(equations, inplace=False)
Out[445]:
a b c EQ1 EQ2
0 1 2 3 2.00 4.0000
1 4 5 6 5.60 31.3600
2 7 8 9 8.75 76.5625
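If the equations are kept as name/expression pairs rather than as one block of text, the multi-line string can be assembled from them. This is a rough sketch assuming an ordered dict (the names equations and script are illustrative); order matters because EQ2 refers to EQ1.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

# Ordered name -> expression pairs; later entries may reference earlier ones
equations = {
    "EQ1": "a + 2 * a / b",
    "EQ2": "EQ1**2",
}

# Build one assignment per line, e.g. "EQ1 = a + 2 * a / b"
script = "\n".join(f"{name} = {expr}" for name, expr in equations.items())

# inplace=False returns a new DataFrame and leaves df untouched
out = df.eval(script, inplace=False)
print(out)
```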