How do I perform conditional operations on data?

Given this df:

import pandas as pd

A = pd.DataFrame([[1, 5, 2, 1, 2], [2, 4, 4, 1, 2], [3, 3, 1, 1, 2], [4, 2, 2, 3, 0],
                  [5, 1, 4, 3, -4], [1, 5, 2, 3, -20], [2, 4, 4, 2, 0], [3, 3, 1, 2, -1],
                  [4, 2, 2, 2, 0], [5, 1, 4, 2, -2]],
                 columns=['a', 'b', 'c', 'd', 'e'],
                 index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

      

how do I create a column 'f' that holds the last value in column 'e' before the value in column 'd' changes, and retains that value until the next change in 'd'? The output would be:

    a  b  c  d    e    f
1   1  5  2  1    2  nan
2   2  4  4  1    2  nan
3   3  3  1  1    2  nan
4   4  2  2  3    0    2
5   5  1  4  3   -4    2
6   1  5  2  3  -20    2
7   2  4  4  2    0  -20
8   3  3  1  2   -1  -20
9   4  2  2  2    0  -20
10  5  1  4  2   -2  -20

      

Edit: @Noobie presented a solution below that, when applied to real data, breaks whenever a value in column 'd' is less than the previous value (see the demonstration after that answer).


2 answers


I think we should offer better support for dealing with contiguous groups, but until then, you can use the compare-cumsum-groupby pattern:

# Label each contiguous run of equal 'd' values: flag changes, cumulative-sum
g = (A["d"] != A["d"].shift()).cumsum()
# Last 'e' of each run, shifted down one run, broadcast back to the rows
A["f"] = A["e"].groupby(g).last().shift().loc[g].values

      

which gives me

In [41]: A
Out[41]: 
    a  b  c  d   e     f
1   1  5  2  1   2   NaN
2   2  4  4  1   2   NaN
3   3  3  1  1   2   NaN
4   4  2  2  2   0   2.0
5   5  1  4  2  -4   2.0
6   1  5  2  2 -20   2.0
7   2  4  4  3   0 -20.0
8   3  3  1  3  -1 -20.0
9   4  2  2  3   0 -20.0
10  5  1  4  3  -2 -20.0

      




This works because g is a label corresponding to each contiguous group of values in 'd'. Note that with this example data, g happens to coincide with column 'd', but that is not required. Once we have g, we can use it to group column 'e':

In [55]: A["e"].groupby(g).last()
Out[55]: 
d
1     2
2   -20
3    -2
Name: e, dtype: int64

      

and then

In [57]: A["e"].groupby(g).last().shift()
Out[57]: 
d
1     NaN
2     2.0
3   -20.0
Name: e, dtype: float64

In [58]: A["e"].groupby(g).last().shift().loc[g]
Out[58]: 
d
1     NaN
1     NaN
1     NaN
2     2.0
2     2.0
2     2.0
3   -20.0
3   -20.0
3   -20.0
3   -20.0
Name: e, dtype: float64
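
One detail worth noting: the .loc[g] result above is indexed by the group labels (1, 1, 1, 2, ...), not by A's row index, which is why the one-liner ends in .values to force positional assignment. If you prefer to stay index-aware, the same assembly can be sketched with map (last_before is just an illustrative name):

# Shifted per-group last values, indexed by group label
last_before = A["e"].groupby(g).last().shift()
# Map each row's group label to its value; the result is indexed like A
A["f"] = g.map(last_before)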

      



Easy, my friend. Unleash the POWER OF PANDAS!



A.sort_values(by='d', inplace=True)    # reorder rows by 'd'
A['lag'] = A.e.shift(1)                # previous row's 'e' (after sorting)
A['output'] = A.groupby('d').lag.transform(lambda x: x.iloc[0])  # first lag per 'd' group

A
Out[57]: 
    a  b  c  d   e   lag  output
1   1  5  2  1   2   NaN     NaN
2   2  4  4  1   2   2.0     NaN
3   3  3  1  1   2   2.0     NaN
4   4  2  2  2   0   2.0     2.0
5   5  1  4  2  -4   0.0     2.0
6   1  5  2  2 -20  -4.0     2.0
7   2  4  4  3   0 -20.0   -20.0
8   3  3  1  3  -1   0.0   -20.0
9   4  2  2  3   0  -1.0   -20.0
10  5  1  4  3  -2   0.0   -20.0
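
As the question's edit points out, this sort-based approach assumes 'd' never decreases. A quick sketch of the failure mode, rerunning the same three lines on a fresh copy of the question's df as given above, where 'd' goes 1, 1, 1, 3, 3, 3, 2, 2, 2, 2:

# 'd' drops from 3 back to 2 at row 7 in the original order
A.sort_values(by='d', inplace=True)   # sorting loses the original row order
A['lag'] = A.e.shift(1)               # "previous" row is now the sorted neighbour
A['output'] = A.groupby('d').lag.transform(lambda x: x.iloc[0])
# The d == 2 rows land directly after the d == 1 block, so they pick up
# lag 2.0 instead of the expected -20.0 (the last 'e' of the d == 3 run).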

      
