Populate the column using the previous value in the column and some calculations in pandas

Question

Populate the column using the previous value in the column and some calculations in pandas

My dataframe looks like this:

   scale  cons    hold    supply   add.supply     s_res      z_res
48  -5     NaN    NaN      NaN       NaN           NaN        NaN   
49  -4     NaN    NaN      NaN       NaN           NaN        NaN   
50  -3     NaN    NaN      NaN       NaN           NaN        NaN   
51  -2     NaN    NaN      NaN       NaN           NaN        NaN   
52  -1     NaN    NaN      NaN       NaN           NaN        NaN   
53   0      0     300       0        NaN           100        200   
54   1     20     NaN       0        NaN           200        322   
55   2     30     NaN      70        NaN           100        100   
56   3     25     NaN       0        NaN           400        110   
57   4     15     NaN       0        NaN           100        300   
58   5     10     NaN       0        NaN           100        180   
59   6     40     NaN       0        NaN           100        100   
...

I need to do the following:

Starting at a value where to scale = 1

fill the column with hold

values calculated as follows:

I take the previous value in the column hold

and subtract from it the corresponding value of the current cell from the column cons

and add the corresponding value from the column supply

.

(For the cell in the column hold

that matches scale = 1

, it will be (300 - 20) + 0 = 280

, for the next cell (280 - 30) + 70) = 320

, for the next cell (320 - 25) + 0) = 295

, etc.)

If the value in the column is hold

less than the corresponding value in the column s_res

, then in the next cell I must add the difference between the corresponding values of the next cell in the columns s_res

and z_res

.

For example, the value in the column hold

is 295

where scale = 3

. This value is less than the value in the column s_res = 400

. Then you have the following meanings: (295 - 15) + 0 + (300 - 100) = 480

. And write this difference between s_res

and z_res

in the column add.supply

.

I need each new calculated value in a column hold

to check if this value is less than the value in the column s_res

.

The result should look like this:

   scale  cons    hold    supply   add.supply     s_res      z_res
48  -5     NaN    NaN      NaN       NaN           NaN        NaN   
49  -4     NaN    NaN      NaN       NaN           NaN        NaN   
50  -3     NaN    NaN      NaN       NaN           NaN        NaN   
51  -2     NaN    NaN      NaN       NaN           NaN        NaN   
52  -1     NaN    NaN      NaN       NaN           NaN        NaN   
53   0      0     300       0        NaN           100        200   
54   1     20     280       0        NaN           200        322   
55   2     30     320      70        NaN           100        100   
56   3     25     295       0        NaN           400        110   
57   4     15     480       0        200           100        300   
58   5     10     470       0        NaN           100        180   
59   6     40     430       0        NaN           100        100   
...

I would be grateful for any advice.

UPD I tried to apply the code

df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()

to a larger data frame and I am having problems.

My new framework

   scale   cons   hold  supply  add.supply   s_res   z_res
 0   0       0    300     0        NaN        100     200
 1   1      20    NaN     0        NaN        200     322
 2   2      30    NaN    70        NaN        100     100
 3   3      25    NaN     0        NaN        400     110
 4   4      15    NaN     0        NaN        100     300
 5   5      10    NaN     0        NaN        100     180
 6   6      40    NaN     0        NaN        100     100
 7   7      60    NaN     0        NaN        300     400
 8   8      50    NaN     0        NaN        245     300
 9   9      70    NaN     0        NaN        300     600
10  10      50    NaN     0        NaN        143     228
...

The result should be as follows:

   scale   cons   hold  supply  add.supply   s_res   z_res
 0   0       0    300     0        NaN        100     200
 1   1      20    280     0        NaN        200     322
 2   2      30    320    70        NaN        100     100
 3   3      25    295     0        NaN        400     110
 4   4      15    480     0        200        100     300
 5   5      10    470     0        NaN        100     180
 6   6      40    430     0        NaN        100     100
 7   7      60    370     0        NaN        300     400
 8   8      50    320     0        NaN        245     300
 9   9      70    250     0        NaN        300     600
10  10      50    285     0         85        143     228
...

But the result of executing the code was not what it should be:

   scale   cons   hold  supply  add.supply   s_res   z_res
 0   0       0    300     0        NaN        100     200
 1   1      20    280     0        NaN        200     322
 2   2      30    320    70        NaN        100     100
 3   3      25    295     0        NaN        400     110
 4   4      15    480     0        200        100     300
 5   5      10    470     0        NaN        100     180
 6   6      40    430     0        NaN        100     100
 7   7      60    370     0        NaN        300     400
 8   8      50    375     0         55        245     300
 9   9      70    605     0        300        300     600
10  10      50    640     0         85        143     228
...

The error appears after hold = 370

, but I don't understand why.

+3

python pandas

Yana dolyuk 18 jul. 17 at 15:42

source to share

1 answer

ASGM · Accepted Answer · 2017-07-18T16:23:41+0000

Instead of doing this line by line, you can use a combination of cumsum()

and np.where

to do it across the entire DataFrame:

df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()

Think about the transformations you want to do in two steps. You have an initial stage where you add and subtract from the initial value df.hold

. Then you change this new retention value in some cases, according to some conditions.

cumsum()

takes a series or DataFrame and creates a new version where each row is the cumulative sum of the previous rows and the current row. You can do this for df.cons

and df.supply

to get the cumulative amounts to be deducted and added to df.hold

. Now you are calculating the first stage df.hold

.

You can use np.where

to find out when it df.hold

meets the conditions that interest you. Where possible, you can set df['add.supply']

accordingly. Then you can add this new column to df.hold

. Note that we are using fillna(0)

to make sure each row matters, and cumsum()

again to keep the added conditionals over time.

UPDATE

The original code above didn't work after adding one value add.supply

, because the future values of the first stage df.hold

hadn't included it yet. There might be a way to do this non-iteratively, and definitely a better and cleaner way than what I've done below, but this will at least get the job done:

df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()

hold = df.hold.tolist()
s_res = df.s_res.tolist()
add = (df.z_res - df.s_res).shift(-1).tolist()

newh = [hold[0]]
totala = 0
for h, s, a in zip(hold, s_res, add):
    newh.append(h + totala)
    if newh[-1] < s:
        totala += a

df['hold'] = pd.Series(newh[1:])
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)

Populate the column using the previous value in the column and some calculations in pandas

More articles: