Pandas, using generated values, iterating over rows in grouped data

Question


I'm new to pandas and to programming in general, but so far I've always been able to find answers via Google. Sorry for the hard-to-describe question; hopefully someone can suggest a clearer title.

I am trying to group rows, run a function on each group, write the results to a column, and then use that column's values to seed the next group.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.random(9),columns=['A'])
df['B'] = [1,1,1,2,2,3,3,3,3]
df['C'] = np.nan
df['D'] = np.nan
df.loc[0:2,'C'] = 500

This gives me:

    A           B   C       D
0   0.825828    1   500.0   NaN
1   0.218618    1   500.0   NaN
2   0.902476    1   500.0   NaN
3   0.452525    2   NaN     NaN
4   0.513505    2   NaN     NaN
5   0.089975    3   NaN     NaN
6   0.282479    3   NaN     NaN
7   0.774286    3   NaN     NaN
8   0.408501    3   NaN     NaN

The 500 in column C is the starting condition. I want to group the data by column B and apply the following function to the first group:

def function1(row):
    return row['A']*row['C']/6

which gives me:

    A           B   C       D
0   0.825828    1   500.0   68.818971
1   0.218618    1   500.0   18.218145
2   0.902476    1   500.0   75.206313
3   0.452525    2   NaN     NaN
4   0.513505    2   NaN     NaN
5   0.089975    3   NaN     NaN
6   0.282479    3   NaN     NaN
7   0.774286    3   NaN     NaN
8   0.408501    3   NaN     NaN
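A minimal sketch of this first step, assuming the A values from the table above (rounded to six decimals) instead of random ones: select group 1 with a boolean mask and use `DataFrame.apply` with `axis=1`, so `function1` receives one row at a time.

```python
import numpy as np
import pandas as pd

def function1(row):
    return row['A'] * row['C'] / 6

# A values copied from the table above (rounded), so the output is reproducible
df = pd.DataFrame({'A': [0.825828, 0.218618, 0.902476, 0.452525, 0.513505,
                         0.089975, 0.282479, 0.774286, 0.408501]})
df['B'] = [1, 1, 1, 2, 2, 3, 3, 3, 3]
df['C'] = np.nan
df['D'] = np.nan
df.loc[0:2, 'C'] = 500

# Apply function1 row by row, but only to the rows of group 1
first = df['B'] == 1
df.loc[first, 'D'] = df.loc[first].apply(function1, axis=1)
print(df)
```

Rows outside group 1 keep NaN in D because the assignment is restricted by the same mask.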

Then I want to sum the three D values of group 1, add that sum to the last C value of group 1, and use the result as the C value for every row of group 2:

    A           B   C           D
0   0.825828    1   500.000000  68.818971
1   0.218618    1   500.000000  18.218145
2   0.902476    1   500.000000  75.206313
3   0.452525    2   662.243429  NaN
4   0.513505    2   662.243429  NaN
5   0.089975    3   NaN         NaN
6   0.282479    3   NaN         NaN
7   0.774286    3   NaN         NaN
8   0.408501    3   NaN         NaN
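The carry-over rule fits in a small helper. The name `next_group_c` is hypothetical, and the sketch assumes groups are processed in ascending order of B: the new C is the previous group's last C plus the sum of that group's D values (500 + 68.818971 + 18.218145 + 75.206313 = 662.243429).

```python
import pandas as pd

def next_group_c(df: pd.DataFrame, prev_group) -> float:
    """Hypothetical helper: last C of prev_group plus the sum of its D values."""
    prev = df[df['B'] == prev_group]
    return prev['C'].iloc[-1] + prev['D'].sum()

# Group 1 exactly as in the table above; group 2 rows still empty
df = pd.DataFrame({
    'B': [1, 1, 1, 2, 2],
    'C': [500.0, 500.0, 500.0, float('nan'), float('nan')],
    'D': [68.818971, 18.218145, 75.206313, float('nan'), float('nan')],
})
df.loc[df['B'] == 2, 'C'] = next_group_c(df, 1)
print(df)
```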

Then I apply function1 to group 2, and repeat for each remaining group until I end up with:

    A           B   C           D
0   0.825828    1   500.000000  68.818971
1   0.218618    1   500.000000  18.218145
2   0.902476    1   500.000000  75.206313
3   0.452525    2   662.243429  49.946896
4   0.513505    2   662.243429  56.677505
5   0.089975    3   768.867830  11.529874
6   0.282479    3   768.867830  36.198113
7   0.774286    3   768.867830  99.220591
8   0.408501    3   768.867830  52.347246
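One way to produce the whole table is a plain loop over the groups. This is a sketch, assuming the groups should be processed in ascending B order (iterating a `groupby` yields groups in sorted key order by default) and reusing the table's A values in place of random numbers:

```python
import numpy as np
import pandas as pd

def function1(row):
    return row['A'] * row['C'] / 6

# A values copied from the final table (rounded), in place of random numbers
df = pd.DataFrame({'A': [0.825828, 0.218618, 0.902476, 0.452525, 0.513505,
                         0.089975, 0.282479, 0.774286, 0.408501]})
df['B'] = [1, 1, 1, 2, 2, 3, 3, 3, 3]
df['C'] = np.nan
df['D'] = np.nan

c = 500.0  # starting condition for the first group
# groupby sorts the keys by default, so groups are visited in ascending B order
for _, group in df.groupby('B'):
    idx = group.index
    df.loc[idx, 'C'] = c
    df.loc[idx, 'D'] = df.loc[idx].apply(function1, axis=1)
    # seed the next group: this group's C plus the sum of its D values
    c += df.loc[idx, 'D'].sum()

print(df)
```

The loop is sequential by nature, because each group's C depends on the D values of the group before it; only the group index is taken from `group`, and all reads and writes go through `df` itself.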

The real data will have hundreds of rows. I've tried various groupby combinations, but I'm completely stumped.
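For hundreds of rows, the row loop can be avoided, under the assumption that `function1` stays exactly A·C/6. Because C is constant within a group, the sum of D over group k is C_k · S_k / 6, where S_k is the sum of A in that group, so the carry-over becomes C_{k+1} = C_k · (1 + S_k / 6). That is a cumulative product over groups. A sketch (this shortcut is an alternative derived from that formula; it does not apply if the real function is not linear in C):

```python
import pandas as pd

df = pd.DataFrame({'A': [0.825828, 0.218618, 0.902476, 0.452525, 0.513505,
                         0.089975, 0.282479, 0.774286, 0.408501]})
df['B'] = [1, 1, 1, 2, 2, 3, 3, 3, 3]

start = 500.0
# S_k: sum of A in each group (groupby sorts the keys, so groups are in B order)
group_sum_a = df.groupby('B')['A'].sum()
# C_{k+1} = C_k * (1 + S_k / 6), so C_k = start * product of all earlier factors
factor = 1 + group_sum_a / 6
group_c = start * factor.cumprod().shift(1, fill_value=1.0)

# Broadcast each group's C back to its rows, then compute D in one step
df['C'] = df['B'].map(group_c)
df['D'] = df['A'] * df['C'] / 6
print(df)
```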

Thanks!


python pandas

BruceWee Apr 21 17 at 11:02


2 answers

You can use numpy.unique() to locate the groups. In your code, it might look something like this:

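import math
import numpy as np

# indices[i] is the first row position of group unique[i], counts[i] its size
# (this assumes the rows are already sorted by B, as in the question)
unique, indices, counts = np.unique(df['B'], return_index=True, return_counts=True)
c_col, d_col = df.columns.get_loc('C'), df.columns.get_loc('D')

for i in range(len(indices)):
    start, stop = indices[i], indices[i] + counts[i]
    if i > 0 and math.isnan(df.iat[start, c_col]):
        # seed this group: previous group's last C plus the sum of its D values
        prev_d = df.iloc[indices[i - 1]:start, d_col].sum()
        df.iloc[start:stop, c_col] = df.iat[start - 1, c_col] + prev_d
    for j in range(start, stop):
        # then call your function on each row and store the result in D
        df.iat[j, d_col] = function1(df.iloc[j])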


Nyps Apr 21 17 at 11:34 am


zipa · Accepted Answer · 2017-04-21T11:13:42+0000

Here's the solution:

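df['D'] = df['A'] * df['C']/6
start = df['C'].iloc[0]  # the starting condition (500)

for i in df['B'].unique()[1:]:
    # each group's C is the start value plus every D computed so far
    df.loc[df['B']==i, 'C'] = start + df['D'].sum()
    df.loc[df['B']==i, 'D'] = df['A'] * df['C']/6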
