Update NULL-filled rows for a column based on matching values of other columns in other rows

Question

Suppose I have a dataframe as shown below:

df1=
  name street city coordinates
0   A0     B0   C0         1,1
1   A1     B0   C0         NaN
2   A2     B0   C0         NaN
3   A3     B2   C2         NaN
4   A4     B2   C2         2,3
5   A5     B3   C3         NaN
6   A6     B3   C3         NaN

I want the result to be

df1=
  name street city coordinates
0   A0     B0   C0         1,1
1   A1     B0   C0         1,1
2   A2     B0   C0         1,1
3   A3     B2   C2         2,3
4   A4     B2   C2         2,3
5   A5     B3   C3         NaN
6   A6     B3   C3         NaN

I want to fill in missing coordinates from other rows that have the same street and city. In the example above, the row at index 0 has (B0, C0) with coordinates 1,1, so the rows at indices 1 and 2 should also get 1,1 because they share the same street and city (B0, C0). What is the best way to achieve this?

Also, how can I update the data in the same way if I am given a list of dataframes, i.e. df_list = [df1, df2, ...]?

Is it a good idea to first build a dataframe of unique rows from all the dataframes and then use that dataframe to look up and update each one?

python python-3.x pandas

nbbk Jul 27. '17 at 9:41

1 answer

jezrael · Accepted Answer · 2017-07-27T10:03:26+0000

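If each group can have only one non-NaN value, use sort_values with ffill (Series.fillna with method='ffill'). Note that in the outputs below, rows A5 and A6 belong to the (B2, C2) group, unlike in the question's data: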

df = df.sort_values(['street','city','coordinates'])
df['coordinates'] = df['coordinates'].ffill()
print (df)
  name street city coordinates
0   A0     B0   C0         1,1
1   A1     B0   C0         1,1
2   A2     B0   C0         1,1
4   A4     B2   C2         2,3
3   A3     B2   C2         2,3
5   A5     B2   C2         2,3
5   A6     B2   C2         2,3
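
One caveat worth checking: the sort-and-ffill approach fills across group boundaries, so a group with no known value picks up the last value of the previous group. The sketch below rebuilds the question's data (where (B3, C3) has no coordinates at all) to illustrate this; the variable names `filled` and `leaked` are only illustrative.

```python
import numpy as np
import pandas as pd

# The question's data: the (B3, C3) group has no known coordinates.
df = pd.DataFrame({
    "name": ["A0", "A1", "A2", "A3", "A4", "A5", "A6"],
    "street": ["B0", "B0", "B0", "B2", "B2", "B3", "B3"],
    "city": ["C0", "C0", "C0", "C2", "C2", "C3", "C3"],
    "coordinates": ["1,1", np.nan, np.nan, np.nan, "2,3", np.nan, np.nan],
})

# Sort so that the known value comes first in each group, then forward fill.
filled = df.sort_values(["street", "city", "coordinates"])
filled["coordinates"] = filled["coordinates"].ffill()

# The fill is not group-aware: (B3, C3) inherits the value from (B2, C2).
leaked = filled.loc[filled["street"] == "B3", "coordinates"].tolist()
print(leaked)
```

If some groups may be entirely NaN, as in the question, a group-aware fill is the safer choice.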

Solution with GroupBy.transform with dropna:

# Parentheses are needed so the chained call can span two lines
df['coordinates'] = (df.groupby(['street','city'])['coordinates']
                       .transform(lambda x: x.dropna()))
print (df)
  name street city coordinates
0   A0     B0   C0         1,1
1   A1     B0   C0         1,1
2   A2     B0   C0         1,1
3   A3     B2   C2         2,3
4   A4     B2   C2         2,3
5   A5     B2   C2         2,3
5   A6     B2   C2         2,3

Or ffill with bfill:

# Parentheses are needed so the chained call can span two lines
df['coordinates'] = (df.groupby(['street','city'])['coordinates']
                       .transform(lambda x: x.ffill().bfill()))
print (df)
  name street city coordinates
0   A0     B0   C0         1,1
1   A1     B0   C0         1,1
2   A2     B0   C0         1,1
3   A3     B2   C2         2,3
4   A4     B2   C2         2,3
5   A5     B2   C2         2,3
5   A6     B2   C2         2,3
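
As a self-contained check against the question's own data, here is a minimal sketch of the grouped ffill/bfill. Because the fill runs inside each (street, city) group, the (B3, C3) rows, which have no known value, should stay NaN, matching the desired output in the question. The name `fixed` is only illustrative.

```python
import numpy as np
import pandas as pd

# The question's data, rebuilt so the example runs on its own.
df = pd.DataFrame({
    "name": ["A0", "A1", "A2", "A3", "A4", "A5", "A6"],
    "street": ["B0", "B0", "B0", "B2", "B2", "B3", "B3"],
    "city": ["C0", "C0", "C0", "C2", "C2", "C3", "C3"],
    "coordinates": ["1,1", np.nan, np.nan, np.nan, "2,3", np.nan, np.nan],
})

fixed = df.copy()
# Forward fill, then back fill, separately within each (street, city) group.
fixed["coordinates"] = (
    fixed.groupby(["street", "city"])["coordinates"]
         .transform(lambda s: s.ffill().bfill())
)
print(fixed)
```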

The ffill/bfill solution also works when a group has multiple values. First, the forward fill propagates values within each group (leading values are not replaced and remain NaN), and then the back fill replaces those leading NaN values:

print (df)
  name street city coordinates
0   A0     B0   C0         1,1
1   A1     B0   C0         NaN
2   A2     B0   C0         NaN
3   A3     B2   C2         NaN
4   A4     B2   C2         2,3
5   A5     B2   C2         4,7
5   A6     B2   C2         NaN

# Parentheses are needed so the chained call can span two lines
df['coordinates'] = (df.groupby(['street','city'])['coordinates']
                       .transform(lambda x: x.ffill().bfill()))
print (df)
  name street city coordinates
0   A0     B0   C0         1,1
1   A1     B0   C0         1,1
2   A2     B0   C0         1,1
3   A3     B2   C2         2,3
4   A4     B2   C2         2,3
5   A5     B2   C2         4,7
5   A6     B2   C2         4,7
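
The answer above does not cover the df_list part of the question. One possible sketch, following the idea in the question of building a table of unique (street, city) rows first, is shown below. The helper name `fill_from_shared_lookup`, the temporary `_known` column, and the second dataframe `df2` are all hypothetical, and the sketch assumes that when a pair has several known values the first one found should win, which differs from the ffill/bfill behaviour above.

```python
import numpy as np
import pandas as pd

def fill_from_shared_lookup(df_list, keys=("street", "city"), target="coordinates"):
    """Fill NaN in `target` using values known for the same keys in any dataframe."""
    keys = list(keys)
    combined = pd.concat(df_list, ignore_index=True)
    # One row per key combination, keeping the first known value (an assumption).
    lookup = (
        combined.dropna(subset=[target])
                .drop_duplicates(subset=keys)[keys + [target]]
                .rename(columns={target: "_known"})
    )
    result = []
    for df in df_list:
        # A left merge keeps every row of df, in order; unknown keys give NaN.
        merged = df.merge(lookup, on=keys, how="left")
        out = df.copy()
        out[target] = merged[target].fillna(merged["_known"]).to_numpy()
        result.append(out)
    return result

df1 = pd.DataFrame({
    "name": ["A0", "A1", "A2", "A3", "A4", "A5", "A6"],
    "street": ["B0", "B0", "B0", "B2", "B2", "B3", "B3"],
    "city": ["C0", "C0", "C0", "C2", "C2", "C3", "C3"],
    "coordinates": ["1,1", np.nan, np.nan, np.nan, "2,3", np.nan, np.nan],
})
# Hypothetical second dataframe: it knows (B3, C3) but not (B0, C0).
df2 = pd.DataFrame({
    "name": ["X0", "X1"],
    "street": ["B3", "B0"],
    "city": ["C3", "C0"],
    "coordinates": ["5,5", np.nan],
})

new_df1, new_df2 = fill_from_shared_lookup([df1, df2])
print(new_df1)
print(new_df2)
```

Here the value known only in df2 fills the (B3, C3) rows of df1, and the value known only in df1 fills the (B0, C0) row of df2. The input dataframes are left unchanged because each one is copied before being updated.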
