Filtering pandas dataframe with table function
I am trying to filter a pandas df dataframe containing columns A
and B
, using a tabulated function: I want to store all values where the value df.B
exceeds the limit that depends on df.A
, i.e. df.B>limit[i]
where i computed from df.A
as i=floor(df.A)
.
Unfortunately, I am not that experienced in Python. But it is clear to me that the comparison is very inefficient and I found a function df.query
that seems useful for doing this, but I was not successful.
Excerpt of the data frame df:
datetime A B
2014-05-31 03:30:00 2201 18.2
2014-05-31 03:40:00 2208 18.7
2014-05-31 03:50:00 2205 20.6
2014-05-31 04:00:00 2202 19.9
2014-05-31 04:10:00 22 18.2
2014-05-31 04:20:00 2204 18.2
2014-05-31 04:30:00 2198 18.1
2014-05-31 04:40:00 2204 19.1
2014-05-31 04:50:00 2202 20.3
2014-05-31 05:00:00 2205 20.8
...
limit = [ 0.0, 10.0, 40.0, 100.0, 240.0, 300.0, 480.0, 800.0, 1000.0, 1400.0, 1600.0, 1800.0,
1900.0, 1900.0, 1900.0, 1900.0, 1900.0, 1900.0, 1900.0, 1900.0, 1900.0 ]
I tried the following:
In [54]: df.query ( "df['A'] > limit[floor(df['B'])]")
...
NotImplementedError: 'Call' nodes are not implemented
In [55]: df.query ( "df['A'] > limit[floor(2.3)]")
...
NotImplementedError: 'Call' nodes are not implemented
In [56]: df.query ( "df['A'] > limit[12]" )
Out[56]:
datetime A B
2014-05-31 03:30:00 2201 18.2
2014-05-31 03:40:00 2208 18.7
2014-05-31 03:50:00 2205 20.6
...
(it works)
I am using Python 2.7.6.2, Numpy 1.8.0 and pandas 0.13.0
Can anyone explain to me why I can't use a function floor
inside this expression, respectively, how to do this filtering?
Thank.
source to share
I think you are trying to do something like this?
lim = np.array(limit)
df['C'] = lim[ df.B.astype(int).values ]
A B C
0 2201 18.2 1900
1 2208 18.7 1900
2 2205 20.6 1900
I just use whole truncation instead of floor to keep the pitch. And I think that no matter what you are trying to do, you need to use a numpy array, not a python list.
source to share