Pandas DataFrame eval with space in column names
I was looking at the pandas DataFrame eval method (docs), which I find to be nice syntactic sugar and which can also help improve performance.
This is an example from the docs:
from numpy.random import randn
import pandas as pd
df = pd.DataFrame(randn(10, 2), columns=list('ab'))
df.eval('a + b')
How can I use eval when there is a space in the column names? Example:
df = pd.DataFrame(randn(10, 2), columns=["Col 1", "Col 2"])
I've tried this:
df.eval('"Col 1" + "Col 2"')
but this gives an error:
TypeError: data type "Col 1" not understood
pd.eval('df["Col 1"] + df["Col 2"]')
This keeps the eval argument as a string, but it is less clean than the example with no spaces in the column names.
Example:
print(df)
Col 1 Col 2
0 -0.206838 -1.007173
1 -0.762453 1.178220
2 -0.431943 -0.804775
3 0.830659 -0.244472
4 0.111637 0.943254
5 0.206615 0.436250
6 -0.568307 -0.680140
7 -0.127645 -0.098351
8 0.185413 -1.224999
9 0.767931 1.512654
print(pd.eval('df["Col 1"] + df["Col 2"]'))
0 -1.214011
1 0.415768
2 -1.236718
3 0.586188
4 1.054891
5 0.642865
6 -1.248447
7 -0.225995
8 -1.039586
9 2.280585
dtype: float64
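For what it's worth, newer pandas releases (0.25 and later, as far as I know) accept backtick-quoted column names in DataFrame.eval, which reads closer to the original example from the docs. A minimal sketch, assuming such a version is installed; the variable names are only for illustration:

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: two columns whose names contain a space.
df = pd.DataFrame(np.random.randn(10, 2), columns=["Col 1", "Col 2"])

# Backticks quote a column name that is not a valid Python identifier.
# Assumption: pandas >= 0.25; older versions will reject this syntax.
backtick_sum = df.eval("`Col 1` + `Col 2`")

# Plain pandas arithmetic, used here only as a reference for comparison.
reference_sum = df["Col 1"] + df["Col 2"]
print(np.allclose(backtick_sum, reference_sum))
```

On older versions, the pd.eval form above is still the fallback.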
EDIT
After some research, it looks like the above method works in either Python 2.7 or 3.6 if you are using the python engine:
pd.eval('df["Col 1"] + df["Col 2"]', engine='python')
However, this does not give you the performance benefits that the numexpr engine can provide. In Python 2.7, this method works:
pd.eval('df["Col 1"] + df["Col 2"]', engine='numexpr')
but in Python 3.6 you get the error ValueError: unknown type str160.
I assume this is because pandas passes a unicode string to numexpr in 3.6, but a byte string in 2.7. I am guessing that this is related to this issue and maybe this too.
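If the string subscripts are what trips up numexpr, one workaround that might be worth trying is to rename the columns to valid Python identifiers first, so the expression contains no string literals at all. This is only a sketch: the underscore naming is my own choice for illustration, and the engine is left at its default (pandas picks numexpr when it is installed and falls back to python otherwise).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=["Col 1", "Col 2"])

# Replace spaces with underscores so each column name is a valid identifier.
# rename returns a new DataFrame; the original df is left untouched.
renamed = df.rename(columns=lambda name: name.replace(" ", "_"))

# No quoting needed now, so this looks like the 'a + b' example from the docs.
total = renamed.eval("Col_1 + Col_2")

# Sanity check against plain pandas arithmetic on the original frame.
print(np.allclose(total, df["Col 1"] + df["Col 2"]))
```

This costs a rename (a relabel, not a full data copy as far as I know), so it only pays off on frames large enough for eval to matter.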