Populate the column using the previous value in the column and some calculations in pandas
My dataframe looks like this:
scale cons hold supply add.supply s_res z_res
48 -5 NaN NaN NaN NaN NaN NaN
49 -4 NaN NaN NaN NaN NaN NaN
50 -3 NaN NaN NaN NaN NaN NaN
51 -2 NaN NaN NaN NaN NaN NaN
52 -1 NaN NaN NaN NaN NaN NaN
53 0 0 300 0 NaN 100 200
54 1 20 NaN 0 NaN 200 322
55 2 30 NaN 70 NaN 100 100
56 3 25 NaN 0 NaN 400 110
57 4 15 NaN 0 NaN 100 300
58 5 10 NaN 0 NaN 100 180
59 6 40 NaN 0 NaN 100 100
...
I need to do the following:
Starting at a value where to scale = 1
fill the column with hold
values ββcalculated as follows:
I take the previous value in the column hold
and subtract from it the corresponding value of the current cell from the column cons
and add the corresponding value from the column supply
.
(For the cell in the column hold
that matches scale = 1
, it will be (300 - 20) + 0 = 280
, for the next cell (280 - 30) + 70) = 320
, for the next cell (320 - 25) + 0) = 295
, etc.)
If the value in the column is hold
less than the corresponding value in the column s_res
, then in the next cell I must add the difference between the corresponding values ββof the next cell in the columns s_res
and z_res
.
For example, the value in the column hold
is 295
where scale = 3
. This value is less than the value in the column s_res = 400
. Then you have the following meanings: (295 - 15) + 0 + (300 - 100) = 480
. And write this difference between s_res
and z_res
in the column add.supply
.
I need each new calculated value in a column hold
to check if this value is less than the value in the column s_res
.
The result should look like this:
scale cons hold supply add.supply s_res z_res
48 -5 NaN NaN NaN NaN NaN NaN
49 -4 NaN NaN NaN NaN NaN NaN
50 -3 NaN NaN NaN NaN NaN NaN
51 -2 NaN NaN NaN NaN NaN NaN
52 -1 NaN NaN NaN NaN NaN NaN
53 0 0 300 0 NaN 100 200
54 1 20 280 0 NaN 200 322
55 2 30 320 70 NaN 100 100
56 3 25 295 0 NaN 400 110
57 4 15 480 0 200 100 300
58 5 10 470 0 NaN 100 180
59 6 40 430 0 NaN 100 100
...
I would be grateful for any advice.
UPD I tried to apply the code
df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()
to a larger data frame and I am having problems.
My new framework
scale cons hold supply add.supply s_res z_res
0 0 0 300 0 NaN 100 200
1 1 20 NaN 0 NaN 200 322
2 2 30 NaN 70 NaN 100 100
3 3 25 NaN 0 NaN 400 110
4 4 15 NaN 0 NaN 100 300
5 5 10 NaN 0 NaN 100 180
6 6 40 NaN 0 NaN 100 100
7 7 60 NaN 0 NaN 300 400
8 8 50 NaN 0 NaN 245 300
9 9 70 NaN 0 NaN 300 600
10 10 50 NaN 0 NaN 143 228
...
The result should be as follows:
scale cons hold supply add.supply s_res z_res
0 0 0 300 0 NaN 100 200
1 1 20 280 0 NaN 200 322
2 2 30 320 70 NaN 100 100
3 3 25 295 0 NaN 400 110
4 4 15 480 0 200 100 300
5 5 10 470 0 NaN 100 180
6 6 40 430 0 NaN 100 100
7 7 60 370 0 NaN 300 400
8 8 50 320 0 NaN 245 300
9 9 70 250 0 NaN 300 600
10 10 50 285 0 85 143 228
...
But the result of executing the code was not what it should be:
scale cons hold supply add.supply s_res z_res
0 0 0 300 0 NaN 100 200
1 1 20 280 0 NaN 200 322
2 2 30 320 70 NaN 100 100
3 3 25 295 0 NaN 400 110
4 4 15 480 0 200 100 300
5 5 10 470 0 NaN 100 180
6 6 40 430 0 NaN 100 100
7 7 60 370 0 NaN 300 400
8 8 50 375 0 55 245 300
9 9 70 605 0 300 300 600
10 10 50 640 0 85 143 228
...
The error appears after hold = 370
, but I don't understand why.
source to share
Instead of doing this line by line, you can use a combination of cumsum()
and np.where
to do it across the entire DataFrame:
df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()
Think about the transformations you want to do in two steps. You have an initial stage where you add and subtract from the initial value df.hold
. Then you change this new retention value in some cases, according to some conditions.
cumsum()
takes a series or DataFrame and creates a new version where each row is the cumulative sum of the previous rows and the current row. You can do this for df.cons
and df.supply
to get the cumulative amounts to be deducted and added to df.hold
. Now you are calculating the first stage df.hold
.
You can use np.where
to find out when it df.hold
meets the conditions that interest you. Where possible, you can set df['add.supply']
accordingly. Then you can add this new column to df.hold
. Note that we are using fillna(0)
to make sure each row matters, and cumsum()
again to keep the added conditionals over time.
UPDATE
The original code above didn't work after adding one value add.supply
, because the future values ββof the first stage df.hold
hadn't included it yet. There might be a way to do this non-iteratively, and definitely a better and cleaner way than what I've done below, but this will at least get the job done:
df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()
hold = df.hold.tolist()
s_res = df.s_res.tolist()
add = (df.z_res - df.s_res).shift(-1).tolist()
newh = [hold[0]]
totala = 0
for h, s, a in zip(hold, s_res, add):
newh.append(h + totala)
if newh[-1] < s:
totala += a
df['hold'] = pd.Series(newh[1:])
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
source to share