Pandas DataFrame eval with space in column names

Question

I was looking at the pandas DataFrame eval method (docs), which I find to be nice syntactic sugar and which can also help improve performance.

This is an example from the docs:

from numpy.random import randn
import pandas as pd

df = pd.DataFrame(randn(10, 2), columns=list('ab'))
df.eval('a + b')

How can I use eval when there is a space in the column names? Example:

df = pd.DataFrame(randn(10, 2), columns=["Col 1", "Col 2"])

I've tried this:

df.eval('"Col 1" + "Col 2"')

but this gives an error:

TypeError: data type "Col 1" not understood

python eval pandas dataframe

FLab Jul 27 '17 at 12:39

3 answers

Thundzz · Answer 1 · 2017-07-27T12:45:37+0000

You can do this using:

df.eval(df["Col 1"] + df["Col 2"])

But this defeats the purpose of eval: the sum is computed by ordinary pandas arithmetic before eval is ever called.

Alternatively, you can rename your columns to make them compatible with the eval syntax:

df.columns = df.columns.map(lambda x: x.replace(' ', '_'))
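The renaming approach above can be sketched end to end as follows. This is a minimal illustration, not the answerer's exact code: it uses DataFrame.rename on a copy instead of assigning to df.columns, and the names rng, renamed and total are chosen here only for the example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=["Col 1", "Col 2"])

# Replace spaces with underscores so every column name is a valid identifier.
# rename returns a new DataFrame and leaves df unchanged.
renamed = df.rename(columns=lambda name: name.replace(" ", "_"))

# The renamed columns can now be referenced directly inside the expression.
total = renamed.eval("Col_1 + Col_2")
print(total.head())
```

Working on a renamed copy keeps the original column names available for display or export, at the cost of one extra DataFrame object.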

bunji · Answer 2 · 2017-07-27T12:51:17+0000

pd.eval('df["Col 1"] + df["Col 2"]')

This keeps the eval argument as a string, but it is less clean than the example with no spaces in the column names.

Example:

print(df)

      Col 1     Col 2
0 -0.206838 -1.007173
1 -0.762453  1.178220
2 -0.431943 -0.804775
3  0.830659 -0.244472
4  0.111637  0.943254
5  0.206615  0.436250
6 -0.568307 -0.680140
7 -0.127645 -0.098351
8  0.185413 -1.224999
9  0.767931  1.512654

print(pd.eval('df["Col 1"] + df["Col 2"]'))

0   -1.214011
1    0.415768
2   -1.236718
3    0.586188
4    1.054891
5    0.642865
6   -1.248447
7   -0.225995
8   -1.039586
9    2.280585
dtype: float64

EDIT

After some research, it looks like the above method works in both Python 2.7 and 3.6 if you use the python engine:

pd.eval('df["Col 1"] + df["Col 2"]', engine='python')

However, this does not give you the performance benefits that the numexpr engine can provide. In Python 2.7, this method works:

pd.eval('df["Col 1"] + df["Col 2"]', engine='numexpr')

but in Python 3.6 you get the error ValueError: unknown type str160.

I assume this is because pandas passes a unicode string to numexpr in 3.6, but a byte string in 2.7. I am guessing that this is related to this issue, and maybe this one too.
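A note for readers on newer pandas: none of the answers here use it, but since pandas 0.25 DataFrame.eval and DataFrame.query accept backtick-quoted column names, which avoids both the renaming and the df["..."] subscripts. The sketch below assumes pandas 0.25 or later; the names rng and total are illustrative only.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=["Col 1", "Col 2"])

# Backticks quote column names that are not valid Python identifiers.
# Requires pandas >= 0.25; older versions reject this syntax.
total = df.eval("`Col 1` + `Col 2`")
print(total.head())
```

Because the whole expression stays a string, eval can still hand it to whichever engine is available.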

yts61 · Answer 3 · 2019-05-16T01:17:09+0000

Thanks, @Thundzz.

    df.columns = df.columns.map(lambda x: x.replace(' ', '_'))

This snippet works well!
