Padding NaN in DataFrame based on column values

Question


I have data that resembles the following simplified example:

Col1    Col2    Col3
a       A       10.1
b       A       NaN
d       B       NaN
e       B       12.3    
f       B       NaN
g       C       14.1
h       C       NaN
i       C       NaN

... for many thousands of lines. I need to fill based on a value in Col2 using something similar to the ffill method. The result I'm looking for is the following:

Col1    Col2    Col3
a       A       10.1
b       A       10.1
d       B       NaN
e       B       12.3    
f       B       12.3
g       C       14.1
h       C       14.1
i       C       14.1

However, a plain ffill ignores the value in Col2 and fills across group boundaries. Any ideas?

+3

python pandas nan dataframe

DrTRD 15 jul. 15 at 19:58


4 answers

One answer I found is the following:

df['Col3'] = df.groupby('Col2').transform('fillna', method='ffill')['Col3']

Any thoughts?

+1

DrTRD 15 jul. 15 at 20:13


Is this what you are looking for?

import pandas as pd
import numpy as np


# Fill missing Col3 values with 10.1, but only in rows where Col2 is 'A'
df['Col3'] = np.where(df['Col2'] == 'A', df['Col3'].fillna(10.1), df['Col3'])

Replace the group label and fill value accordingly, of course.
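If a constant per group is really what you want, the same idea might be generalized with a lookup table instead of one np.where call per group. This is only a sketch: the fill_values dictionary is a hypothetical mapping made up for illustration, not something taken from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": list("AABBBCCC"),
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# Hypothetical per-group constants (an assumption, chosen for illustration).
fill_values = {"A": 10.1, "B": 12.3, "C": 14.1}

# Look up each row's constant from Col2 and use it only where Col3 is NaN.
df["Col3"] = df["Col3"].fillna(df["Col2"].map(fill_values))
print(df)
```

Note that this is not a forward fill: row d (the first B row) also gets a value here, whereas the desired output in the question leaves it as NaN.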

0

Leb 15 jul. 15 at 20:08


You can take a slice of the DataFrame for each unique value in Col2, forward-fill each slice, and then combine the results.

>>> pd.concat((df.loc[df.Col2 == letter, :].ffill() for letter in df.Col2.unique()))

  Col1 Col2  Col3
0    a    A  10.1
1    b    A  10.1
2    d    B   NaN
3    e    B  12.3
4    f    B  12.3
5    g    C  14.1
6    h    C  14.1
7    i    C  14.1

EDIT: It seems that the method provided by @EdChum is the fastest so far.

%timeit pd.concat((df.loc[df.Col2 == letter, :].ffill() for letter in df.Col2.unique()))
100 loops, best of 3: 3.57 ms per loop

%timeit df.groupby('Col2').transform('fillna',method='ffill')['Col3']
100 loops, best of 3: 4.59 ms per loop

%timeit df.groupby('Col2')['Col3'].transform(lambda x: x.ffill())
1000 loops, best of 3: 746 µs per loop
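To repeat the comparison outside IPython, something like the following could be used. It is a rough sketch on synthetic data: the frame size, the random seed, and the helper names via_concat and via_transform are all assumptions made for illustration, and the timings will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame (assumed size and NaN ratio, not the asker's real data).
rng = np.random.default_rng(0)
n = 10_000
big = pd.DataFrame({
    "Col2": rng.choice(list("ABC"), size=n),
    "Col3": np.where(rng.random(n) < 0.5, np.nan, rng.random(n)),
})

def via_concat():
    # Slice per group, forward-fill each slice, then stitch the slices together.
    parts = (big.loc[big.Col2 == letter, "Col3"].ffill()
             for letter in big.Col2.unique())
    # The groups are interleaved here, so restore the original row order.
    return pd.concat(parts).sort_index()

def via_transform():
    # Group once and forward-fill within each group.
    return big.groupby("Col2")["Col3"].transform(lambda s: s.ffill())

# Check that both approaches agree before comparing their speed.
same = np.allclose(via_concat().to_numpy(), via_transform().to_numpy(),
                   equal_nan=True)
print("results match:", same)

for func in (via_concat, via_transform):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```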

0

Alexander 15 jul. 15 at 20:10


EdChum · Accepted Answer · 2015-07-15T20:16:21+0000

If I understand correctly, you can group by 'Col2', then call transform on 'Col3' and call ffill:

In [35]:

df['Col3'] = df.groupby('Col2')['Col3'].transform(lambda x: x.ffill())
df
Out[35]:
  Col1 Col2  Col3
0    a    A  10.1
1    b    A  10.1
2    d    B   NaN
3    e    B  12.3
4    f    B  12.3
5    g    C  14.1
6    h    C  14.1
7    i    C  14.1
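For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample frame from the question (the construction of df is my assumption about how the data is loaded). It also tries calling ffill directly on the grouped column, which should give the same values without the lambda.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": list("AABBBCCC"),
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# Forward-fill Col3 separately within each Col2 group.
filled = df.groupby("Col2")["Col3"].transform(lambda s: s.ffill())

# The grouped column also exposes ffill directly.
filled_direct = df.groupby("Col2")["Col3"].ffill()

df["Col3"] = filled
print(df)
```

Row d stays NaN because there is no earlier value inside group B to carry forward.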
