Pandas: using generated values while iterating over rows in grouped data
I'm new to Pandas and programming in general, but so far I could always find the answer to any problem via Google. Sorry for the hard-to-describe question; hopefully someone can suggest a clearer way to put it.
I am trying to group data together, execute a function on each group, update a column, and then use the data from that column in the next group.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.random(9),columns=['A'])
df['B'] = [1,1,1,2,2,3,3,3,3]
df['C'] = np.nan
df['D'] = np.nan
df.loc[0:2,'C'] = 500
Giving me
A B C D
0 0.825828 1 500.0 NaN
1 0.218618 1 500.0 NaN
2 0.902476 1 500.0 NaN
3 0.452525 2 NaN NaN
4 0.513505 2 NaN NaN
5 0.089975 3 NaN NaN
6 0.282479 3 NaN NaN
7 0.774286 3 NaN NaN
8 0.408501 3 NaN NaN
500 in column C is the starting condition. I want to group data by column B and execute the following function in the first group
def function1(row):
    return row['A']*row['C']/6
giving me
A B C D
0 0.825828 1 500.0 68.818971
1 0.218618 1 500.0 18.218145
2 0.902476 1 500.0 75.206313
3 0.452525 2 NaN NaN
4 0.513505 2 NaN NaN
5 0.089975 3 NaN NaN
6 0.282479 3 NaN NaN
7 0.774286 3 NaN NaN
8 0.408501 3 NaN NaN
Then I want to sum the first three values in D, add that sum to the last value in C, and use the result as the value of C for group 2
A B C D
0 0.825828 1 500.000000 68.818971
1 0.218618 1 500.000000 18.218145
2 0.902476 1 500.000000 75.206313
3 0.452525 2 662.243429 NaN
4 0.513505 2 662.243429 NaN
5 0.089975 3 NaN NaN
6 0.282479 3 NaN NaN
7 0.774286 3 NaN NaN
8 0.408501 3 NaN NaN
Then I execute function1 on group 2 and repeat until every group is done
A B C D
0 0.825828 1 500.000000 68.818971
1 0.218618 1 500.000000 18.218145
2 0.902476 1 500.000000 75.206313
3 0.452525 2 662.243429 49.946896
4 0.513505 2 662.243429 56.677505
5 0.089975 3 768.867830 11.529874
6 0.282479 3 768.867830 36.198113
7 0.774286 3 768.867830 99.220591
8 0.408501 3 768.867830 52.347246
The real DataFrame will consist of hundreds of rows. I've tried different groupby and apply combinations, but I'm completely stumped.
Thanks.
You can use numpy.unique() to find where each group starts and how many rows it has. In your code, it might look something like this:
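import numpy as np
# Assumes df is sorted by B and has the default 0..n-1 index.
# indices[i] is the first row of group i, counts[i] its number of rows.
unique, indices, counts = np.unique(df['B'], return_index=True, return_counts=True)
for i in range(len(indices)):
    first, last = indices[i], indices[i] + counts[i] - 1
    if i > 0:
        # new C = previous group's C plus the sum of the previous group's D
        prev_first, prev_last = indices[i-1], first - 1
        df.loc[first:last, 'C'] = df.loc[prev_last, 'C'] + df.loc[prev_first:prev_last, 'D'].sum()
    # then call your function on every row of the group
    df.loc[first:last, 'D'] = df.loc[first:last].apply(function1, axis=1)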
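As an alternative to looping, the recurrence in the question can probably be collapsed into a few vectorized steps. Since D = A*C/6 and the next group's C is the current C plus the sum of the current D, each group multiplies C by (1 + sum(A)/6). The sketch below relies on that observation. The helper name fill_c_and_d and its start and divisor parameters are made up for illustration, and it assumes every group shares a single C value and that each value of B forms one contiguous block of rows. The A values are copied from the question so the result can be compared with the expected table.

```python
import pandas as pd

def fill_c_and_d(df, start=500.0, divisor=6.0):
    """Hypothetical helper: fill C and D without a Python loop over groups."""
    out = df.copy()
    # Growth factor contributed by each group: C_next = C * (1 + sum(A) / divisor)
    factor = 1.0 + out.groupby('B', sort=False)['A'].sum() / divisor
    # C of a group is the start value times the factors of all earlier groups
    c_per_group = start * factor.cumprod().shift(1, fill_value=1.0)
    out['C'] = out['B'].map(c_per_group)
    # Same formula as function1, applied to whole columns at once
    out['D'] = out['A'] * out['C'] / divisor
    return out

# Sample data taken from the question (A is rounded to six decimals there)
df = pd.DataFrame({
    'A': [0.825828, 0.218618, 0.902476, 0.452525, 0.513505,
          0.089975, 0.282479, 0.774286, 0.408501],
    'B': [1, 1, 1, 2, 2, 3, 3, 3, 3],
})
result = fill_c_and_d(df)
print(result)
```

Because A is rounded in the question, the printed values should agree with the expected table only to about three decimal places. If function1 is ever changed to something that is not a plain multiple of C, this shortcut no longer applies and an explicit loop over the groups is the safer choice.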