Update NULL-filled rows for a column based on matching values of other columns in other rows
Suppose I have a dataframe as shown below:
df1=
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 NaN
2 A2 B0 C0 NaN
3 A3 B2 C2 NaN
4 A4 B2 C2 2,3
5 A5 B3 C3 NaN
6 A6 B3 C3 NaN
I want the result to be
df1=
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
3 A3 B2 C2 2,3
4 A4 B2 C2 2,3
5 A5 B3 C3 NaN
6 A6 B3 C3 NaN
I want to fill in missing coordinates from rows that share the same street and city. In the example above, the row at index 0 with (B0, C0) has coordinates (1,1), so the rows at indices 1 and 2 should also get (1,1), since they have the same street and city (B0, C0). What is the best way to achieve this?
Also, how can I update the data in the same way if we are given a list of dataframes, i.e.
df_list = [df1,df2,..]
Is it a good idea to first build a dataframe of unique rows from all the dataframes and then use it to look up and update each dataframe?
If there is only one non-NaN value in every group, use sort_values with ffill (the equivalent of the older Series.fillna with method='ffill'):
df = df.sort_values(['street','city','coordinates'])
df['coordinates'] = df['coordinates'].ffill()
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
4 A4 B2 C2 2,3
3 A3 B2 C2 2,3
5 A5 B2 C2 2,3
5 A6 B2 C2 2,3
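One caveat with sorting and a plain forward fill: it does not stop at group boundaries, so a group with no known coordinates, such as (B3, C3) in the question, would inherit the value of the group sorted just before it. A minimal sketch, assuming the question's data rebuilt by hand (the names `leaky` and `safe` are only illustrative):

```python
import numpy as np
import pandas as pd

# Data from the question: group (B3, C3) has no known coordinates at all.
df1 = pd.DataFrame({
    "name": ["A0", "A1", "A2", "A3", "A4", "A5", "A6"],
    "street": ["B0", "B0", "B0", "B2", "B2", "B3", "B3"],
    "city": ["C0", "C0", "C0", "C2", "C2", "C3", "C3"],
    "coordinates": ["1,1", np.nan, np.nan, np.nan, "2,3", np.nan, np.nan],
})

# Sort so the known value comes first in each group, then forward fill.
# The fill ignores group boundaries, so (B3, C3) picks up the (B2, C2) value.
leaky = df1.sort_values(["street", "city", "coordinates"])
leaky["coordinates"] = leaky["coordinates"].ffill()

# Filling inside each group keeps (B3, C3) missing.
safe = df1.copy()
safe["coordinates"] = (
    safe.groupby(["street", "city"])["coordinates"]
        .transform(lambda x: x.ffill().bfill())
)

print(leaky)
print(safe)
```

If every group is guaranteed to contain a value, both variants give the same result; the group-wise version is the safer choice when that is not known.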
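Solution with GroupBy.transform and dropna:
# take the single non-NaN value of each group; transform broadcasts the scalar to every row
# (assumes every group has at least one non-NaN value)
df['coordinates'] = (df.groupby(['street','city'])['coordinates']
                       .transform(lambda x: x.dropna().iloc[0]))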
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
3 A3 B2 C2 2,3
4 A4 B2 C2 2,3
5 A5 B2 C2 2,3
5 A6 B2 C2 2,3
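As a possible alternative to a lambda, pandas also accepts the name of a built-in aggregation in transform. The sketch below swaps in `transform("first")`, which takes the first non-null value of each group and repeats it on every row of that group. It uses the question's data rebuilt by hand, and the name `filled` is only illustrative:

```python
import numpy as np
import pandas as pd

# Data from the question, rebuilt by hand for a self-contained example.
df1 = pd.DataFrame({
    "name": ["A0", "A1", "A2", "A3", "A4", "A5", "A6"],
    "street": ["B0", "B0", "B0", "B2", "B2", "B3", "B3"],
    "city": ["C0", "C0", "C0", "C2", "C2", "C3", "C3"],
    "coordinates": ["1,1", np.nan, np.nan, np.nan, "2,3", np.nan, np.nan],
})

# "first" skips missing values, so each row receives the first known
# coordinates of its (street, city) group.
filled = df1.copy()
filled["coordinates"] = (
    filled.groupby(["street", "city"])["coordinates"].transform("first")
)

print(filled)
```

A group with no known value at all, here (B3, C3), stays missing instead of raising an error.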
Or forward fill and then back fill within each group:
df['coordinates'] = (df.groupby(['street','city'])['coordinates']
                       .transform(lambda x: x.ffill().bfill()))
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
3 A3 B2 C2 2,3
4 A4 B2 C2 2,3
5 A5 B2 C2 2,3
5 A6 B2 C2 2,3
The second solution (ffill followed by bfill) also works when a group has several non-NaN values: the forward fill first propagates each value down within its group, leaving any leading NaN untouched, and the back fill then replaces those leading NaN values:
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 NaN
2 A2 B0 C0 NaN
3 A3 B2 C2 NaN
4 A4 B2 C2 2,3
5 A5 B2 C2 4,7
5 A6 B2 C2 NaN
df['coordinates'] = (df.groupby(['street','city'])['coordinates']
                       .transform(lambda x: x.ffill().bfill()))
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
3 A3 B2 C2 2,3
4 A4 B2 C2 2,3
5 A5 B2 C2 4,7
5 A6 B2 C2 4,7
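For the second part of the question (a list of dataframes), one option is the approach the question itself suggests: build a single lookup of unique (street, city) rows from all frames, then use it to fill each frame. This is only a sketch under assumptions: `fill_from_shared_lookup` and the temporary `_known` column are hypothetical names, `df2` is made-up sample data, and when several frames disagree about a key, the first value encountered wins.

```python
import numpy as np
import pandas as pd

def fill_from_shared_lookup(df_list, keys=("street", "city"), col="coordinates"):
    """Hypothetical helper: fill NaN in `col` from values known in any frame."""
    keys = list(keys)
    # Rows with a known value across all frames, one per key combination.
    known = pd.concat(df_list, ignore_index=True).dropna(subset=[col])
    lookup = (known.drop_duplicates(subset=keys)[keys + [col]]
                   .rename(columns={col: "_known"}))
    filled = []
    for df in df_list:
        out = df.copy()
        # A left merge keeps the row order of `out`; the lookup keys are unique,
        # so the merged frame has exactly one row per row of `out`.
        merged = out[keys].merge(lookup, on=keys, how="left")
        candidates = pd.Series(merged["_known"].to_numpy(), index=out.index)
        # Only missing entries are replaced; existing values are kept.
        out[col] = out[col].fillna(candidates)
        filled.append(out)
    return filled

df1 = pd.DataFrame({
    "name": ["A0", "A1", "A2", "A3", "A4", "A5", "A6"],
    "street": ["B0", "B0", "B0", "B2", "B2", "B3", "B3"],
    "city": ["C0", "C0", "C0", "C2", "C2", "C3", "C3"],
    "coordinates": ["1,1", np.nan, np.nan, np.nan, "2,3", np.nan, np.nan],
})
# Made-up second frame: it knows (B3, C3) but is missing (B0, C0).
df2 = pd.DataFrame({
    "name": ["A7", "A8"],
    "street": ["B3", "B0"],
    "city": ["C3", "C0"],
    "coordinates": ["5,5", np.nan],
})

r1, r2 = fill_from_shared_lookup([df1, df2])
print(r1)
print(r2)
```

With a shared lookup, values can flow between frames: here df1's (B3, C3) rows are filled from df2, and df2's (B0, C0) row from df1. The input frames themselves are left unchanged.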