How to get the last N rows relative to another row in pandas (vectorized solution)?

I asked this as part of another question, but I think I was trying to ask too many things at once. So, for simplicity:

I have a dataframe where a key is pressed on each trial. I want to add a column that shows the keys pressed in the last N rows (the current row and the N-1 rows before it). So if my data looks like this:

trial sid  key_pressed        RT  
1     S04            x  0.502242        
2     S04            m  0.348620      
3     S04            m  0.312491       
4     S04            x  0.342541      
5     S04            n  0.419384       
6     S04            n  0.348211      
7     S04            z  0.376369   

      

then afterwards it should look like this (computed separately for each sid):

trial sid  key_pressed        RT           last_3
1     S04            x  0.502242        NaN
2     S04            m  0.348620        NaN
3     S04            m  0.312491        [x, m, m]
4     S04            x  0.342541        [m, m, x]
5     S04            n  0.419384        [m, x, n]
6     S04            n  0.348211        [x, n, n]
7     S04            z  0.376369        [n, n, z]

      

Is there a vectorized solution for this? I cannot figure out how to select rows relative to the current one. (I am new to pandas and still getting used to thinking this way.)

UPDATE. Based on the advice in the answers below, I did this:

g = df.groupby('sid')['key_pressed']
df['shifted'] = g.shift(2) + g.shift(1) + g.shift(0)

      

which creates a string such as mxm, for example. That is better.

+3



4 answers


One way would be to use shift to move the relevant column down by n rows and then concatenate the entries (they are strings, so we can use +):

# Use bracket assignment: attribute assignment (df.last_3 = ...) does not create a new column
df['last_3'] = df.key_pressed.shift(1) + ', ' + df.key_pressed.shift(2) + ', ' + df.key_pressed.shift(3)

      



This creates strings from the previous three entries, separated by a comma and a space (not lists). I would avoid storing lists in DataFrames if possible because things can get a little confusing.
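The same shift-and-concatenate idea can be generalized to any N and applied per sid. The following is only a sketch: the helper name last_n_string is made up for illustration, and it assumes the window should include the current row, as in the output the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    "trial": [1, 2, 3, 4, 5, 6, 7],
    "sid": ["S04"] * 7,
    "key_pressed": list("xmmxnnz"),
})

def last_n_string(frame, n, sep=", "):
    """Hypothetical helper: join the current key and the n-1 keys before it,
    oldest first, without crossing sid boundaries."""
    grouped = frame.groupby("sid")["key_pressed"]
    # shift(n-1) is the oldest key in the window; the unshifted column is the newest
    parts = [grouped.shift(i) for i in range(n - 1, 0, -1)] + [frame["key_pressed"]]
    result = parts[0]
    for part in parts[1:]:
        # A missing value in any part makes the whole row missing
        result = result + sep + part
    return result

df["last_3"] = last_n_string(df, 3)
print(df)
```

Rows that do not yet have N entries stay missing, because a missing value propagates through the string concatenation.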

+2



What do you want to do with these lists? Storing lists inside Series / DataFrames is usually not very convenient. Either way, this will get you close. You still have to handle the NaNs, and then you're done.

In [6]: pd.concat([df.key_pressed.shift(i) for i in [0, 1, 2]], axis=1).apply(tuple, axis=1).map(list)
Out[6]: 
0    [x, nan, nan]
1      [m, x, nan]
2        [m, m, x]
3        [x, m, m]
4        [n, x, m]
5        [n, n, x]
6        [z, n, n]
dtype: object

      



Note that we need to convert to a tuple and then to a list to stop pandas from automatically taking our list and turning it back into a Series. Try this and you will see why it doesn't work:

pd.concat([df.key_pressed.shift(i) for i in [0, 1, 2]], axis=1).apply(list, axis=1)
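One possible way to handle the NaNs is to keep a list only for rows whose window is complete and leave the other rows as NaN. This is a sketch under two assumptions that are not part of the answer above: the window is ordered oldest first (as in the question's desired output), and the sample keys are typed in by hand.

```python
import numpy as np
import pandas as pd

keys = pd.Series(list("xmmxnnz"), name="key_pressed")

# One column per offset: oldest key first, current key last
window = pd.concat([keys.shift(2), keys.shift(1), keys], axis=1)

# True only where all three keys in the window are present
complete = window.notna().all(axis=1)

# Convert each row to a plain Python list, then blank out incomplete windows
rows = window.to_numpy().tolist()
last_3 = pd.Series(
    [row if ok else np.nan for row, ok in zip(rows, complete)],
    index=keys.index,
)
print(last_3)
```

Building the lists from the NumPy array sidesteps apply entirely, so the tuple-then-list workaround is not needed here.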

      

+1



Oh - maybe this is the best solution. You can "shift" the data by a certain amount:

df['shifted'] = df.groupby('sid')['key_pressed'].shift(2)

      

Then I could create lists from this shifted data.
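To illustrate why the groupby matters, here is a small comparison of a plain shift and a grouped shift. The second subject, S05, and its keys are invented for the example; the original data only shows S04.

```python
import pandas as pd

df = pd.DataFrame({
    "sid": ["S04", "S04", "S04", "S05", "S05", "S05"],  # S05 is made up
    "key_pressed": ["x", "m", "m", "n", "z", "x"],
})

# Plain shift: the first rows of S05 receive keys pressed by S04
plain = df["key_pressed"].shift(2)

# Grouped shift: each sid starts over with missing values
grouped = df.groupby("sid")["key_pressed"].shift(2)

print(pd.DataFrame({"sid": df["sid"], "plain": plain, "grouped": grouped}))
```

With the grouped version, the first two rows of every sid are missing, so no window mixes keys from different subjects.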

0



This solution avoids looping, but I'm not sure if it really counts as "vectorized", because once you start using apply(), I think you lose the performance benefits that vectorization provides:

key_table = pd.concat(
    [df.key_pressed.shift(2), df.key_pressed.shift(1), df.key_pressed], 
    axis=1
)
df['last_3'] = key_table.apply(
    lambda row: ', '.join(str(k) for k in row),
    axis=1
)

      

Output:

   trial  sid key_pressed        RT       last_3
0      1  S04           x  0.502242  nan, nan, x
1      2  S04           m  0.348620    nan, x, m
2      3  S04           m  0.312491      x, m, m
3      4  S04           x  0.342541      m, m, x
4      5  S04           n  0.419384      m, x, n
5      6  S04           n  0.348211      x, n, n
6      7  S04           z  0.376369      n, n, z
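If the row-wise apply() is a concern, the same strings can probably be produced with Series.str.cat, which joins several Series element by element. This is a sketch of an alternative rather than the method used above; the keys are typed in by hand, and na_rep="nan" is chosen only to reproduce the output shown.

```python
import pandas as pd

keys = pd.Series(list("xmmxnnz"), name="key_pressed")

# Join the three shifted columns element by element, oldest key first.
# na_rep="nan" writes missing keys as the text "nan", matching the output above.
last_3 = keys.shift(2).str.cat([keys.shift(1), keys], sep=", ", na_rep="nan")

# Without na_rep, any row with a missing key stays missing instead
strict = keys.shift(2).str.cat([keys.shift(1), keys], sep=", ")

print(last_3)
print(strict)
```

Whether this is actually faster than apply() depends on the data size and the pandas version, so it is worth timing on real data.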

      

0

