How to get the last N rows RELATIVE to another row in pandas (vectorized solution)?
I asked this as part of another question, but I think I was trying to ask too many things at once. So, for simplicity:
I have a dataframe where a key is pressed on each trial. I want to add a column that shows the keys pressed in the last N rows (up to and including the current one). So if my data looks like this:
trial sid key_pressed RT
1 S04 x 0.502242
2 S04 m 0.348620
3 S04 m 0.312491
4 S04 x 0.342541
5 S04 n 0.419384
6 S04 n 0.348211
7 S04 z 0.376369
afterwards it should look like this (computed separately for each sid):
trial sid key_pressed RT last_3
1 S04 x 0.502242 NaN
2 S04 m 0.348620 NaN
3 S04 m 0.312491 [x, m, m]
4 S04 x 0.342541 [m, m, x]
5 S04 n 0.419384 [m, x, n]
6 S04 n 0.348211 [x, n, n]
7 S04 z 0.376369 [n, n, z]
Is there a vectorized solution for this? I cannot figure out how to select rows relative to the current one. (I'm new to pandas and still getting used to thinking this way.)
UPDATE: Based on the advice in the answers below, I did this:
df['shifted'] = pd.concat([df.groupby('sid')['key_pressed'].shift(2) + df.groupby('sid')['key_pressed'].shift(1) + df.groupby('sid')['key_pressed'].shift(0)])
which creates a string such as mxm for each row. That works better for me.
One way would be to use shift to move the column down by n rows and then concatenate the entries (they are strings, so we can use +):
df['last_3'] = df.key_pressed.shift(1) + ', ' + df.key_pressed.shift(2) + ', ' + df.key_pressed.shift(3)
This creates strings from the previous three entries, separated by a comma and a space (not lists). I would avoid storing lists in DataFrames where possible, because things can get a little confusing.
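As a rough sketch of how the same shift-and-add idea could be generalized to any N and kept within each sid: the helper name last_n_keys and its sep parameter are made up for illustration, and the frame below just re-types the sample data from the question. Unlike the one-liner above, it includes the current row and puts the oldest key first, to match the layout the question asks for.

```python
from functools import reduce

import pandas as pd

# Sample data re-typed from the question (RT omitted, it is not needed here).
df = pd.DataFrame({
    "trial": [1, 2, 3, 4, 5, 6, 7],
    "sid": ["S04"] * 7,
    "key_pressed": list("xmmxnnz"),
})


def last_n_keys(frame, n, sep=", "):
    """Hypothetical helper: join each key with the n - 1 keys before it, per sid."""
    keys = frame.groupby("sid")["key_pressed"]
    # Oldest key first: shift(n - 1), ..., shift(1), then the unshifted column.
    shifted = [keys.shift(i) for i in range(n - 1, 0, -1)] + [frame["key_pressed"]]
    # Adding a string to NaN gives NaN, so rows without a full window stay NaN.
    return reduce(lambda left, right: left + sep + right, shifted)


df["last_3"] = last_n_keys(df, 3)
print(df)
```

Because the shifts are done through groupby, the first N - 1 rows of every sid come out as NaN instead of picking up keys from the previous participant.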
What do you want to do with these lists? Storing lists inside Series / DataFrames is usually not very convenient. Either way, this will get you close. You still have to handle the NaNs, and then you're done.
In [6]: pd.concat([df.key_pressed.shift(i) for i in [0, 1, 2]], axis=1).apply(tuple, axis=1).map(list)
Out[6]:
0 [x, nan, nan]
1 [m, x, nan]
2 [m, m, x]
3 [x, m, m]
4 [n, x, m]
5 [n, n, x]
6 [z, n, n]
dtype: object
Note that we need to convert to a tuple and then to a list to stop pandas from automatically taking our list and turning it back into a Series. Try this and you will see why it doesn't work:
pd.concat([df.key_pressed.shift(i) for i in [0, 1, 2]], axis=1).apply(list, axis=1)
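For the remaining NaN handling, here is one possible sketch, assuming you want NaN wherever the window is incomplete and the oldest key first, as in the question's expected output. The names n, window and complete are just for illustration, and the last step is a plain Python loop, not a vectorized operation.

```python
import numpy as np
import pandas as pd

# Sample data re-typed from the question (RT omitted, it is not needed here).
df = pd.DataFrame({
    "trial": [1, 2, 3, 4, 5, 6, 7],
    "sid": ["S04"] * 7,
    "key_pressed": list("xmmxnnz"),
})

n = 3
keys = df.groupby("sid")["key_pressed"]

# One column per lag, oldest key on the left, shifted within each sid.
window = pd.concat(
    [keys.shift(i) for i in range(n - 1, 0, -1)] + [df["key_pressed"]],
    axis=1,
)

# A window is complete only if none of its n slots is NaN.
complete = window.notna().all(axis=1)

# Keep the list for complete windows, put a single NaN everywhere else.
df["last_3"] = pd.Series(
    [row if ok else np.nan for row, ok in zip(window.to_numpy().tolist(), complete)],
    index=df.index,
)
print(df)
```

Going through to_numpy().tolist() sidesteps the apply(list, axis=1) problem described above, since the lists are built outside of apply.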
This solution avoids explicit looping, but I'm not sure it really counts as "vectorized": once you start using apply(), I think you lose the performance benefits that vectorization provides:
key_table = pd.concat(
    [df.key_pressed.shift(2), df.key_pressed.shift(1), df.key_pressed],
    axis=1
)
df['last_3'] = key_table.apply(
    lambda row: ', '.join(str(k) for k in row),
    axis=1
)
Output:
trial sid key_pressed RT last_3
0 1 S04 x 0.502242 nan, nan, x
1 2 S04 m 0.348620 nan, x, m
2 3 S04 m 0.312491 x, m, m
3 4 S04 x 0.342541 m, m, x
4 5 S04 n 0.419384 m, x, n
5 6 S04 n 0.348211 x, n, n
6 7 S04 z 0.376369 n, n, z
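If the apply() step is the concern, the same strings can probably be built with Series.str.cat instead. This is only a sketch: na_rep="nan" is chosen here to reproduce the output above, and it ignores sid just like the code above does. The frame re-types the sample data from the question.

```python
import pandas as pd

# Sample data re-typed from the question (RT omitted, it is not needed here).
df = pd.DataFrame({
    "trial": [1, 2, 3, 4, 5, 6, 7],
    "sid": ["S04"] * 7,
    "key_pressed": list("xmmxnnz"),
})

keys = df["key_pressed"]

# str.cat joins the shifted columns element-wise; na_rep replaces missing
# values with the text "nan" so incomplete windows are still joined.
df["last_3"] = keys.shift(2).str.cat(
    [keys.shift(1), keys],
    sep=", ",
    na_rep="nan",
)
print(df)
```

Leaving out na_rep makes any row with a missing value come out as NaN, which may be closer to what the question asks for.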