How to fill in missing rows based on a column in pandas?

I have this DataFrame in pandas:

import pandas as pd

df = pd.DataFrame({
        "n": ["a", "b", "c", "a", "b", "x"],
        "t": [0, 0, 0, 1, 1, 1],
        "v": [10,20,30,40,50,60]
    })

      

How can I fill it with missing rows so that every value of column t has the same entries in column n? That is, each value of t must contain entries for a, b, c, x, written as NaN if they are missing:

   n  t   v
   a  0  10
   b  0  20
   c  0  30
   x  NaN NaN
   a  1  40
   b  1  50
   c  NaN NaN
   x  1  60

      



4 answers


From what I understand, you want every value of "n" to be present in each of the subgroups grouped by "t". I also assume that the "n" values are not duplicated within those subgroups.

Given that these assumptions are correct, pd.pivot_table seems like a good option for this use case. Here, the values under "n" become the column names, "t" becomes the grouped index, and the DF is populated with the values under "v". Then stack the DF, keeping the NaN entries, and set the corresponding cells in "t" to NaN with the .loc accessor.



import numpy as np
import pandas as pd
df1 = pd.pivot_table(df, "v", "t", "n", "first").stack(dropna=False).reset_index(name="v")
df1.loc[df1['v'].isnull(), "t"] = np.nan
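The answer relies on the assumption that no ("t", "n") pair occurs twice, because the "first" aggregation would otherwise silently keep only one value per pair. A minimal sketch of how that assumption could be checked up front, using the question's data (the variable names dupes and has_duplicates are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, 60],
})

# True for every row whose (t, n) pair already appeared earlier in the frame.
dupes = df.duplicated(subset=["t", "n"])

# If this is True, pivoting with "first" would drop some values.
has_duplicates = bool(dupes.any())
print(has_duplicates)  # False for the question's data
```

If the check reports duplicates, it is probably worth deciding how to aggregate them before pivoting.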

      



plan

  • get the unique values of column 'n'; we will use these for reindex

  • apply f to each group of column 't'; reindexing with idx ensures that all items of idx are represented for each unique 't'

  • set the index to 'n' so that we can reindex on it



idx = df.n.unique()
f = lambda x: x.reindex(idx)
df.set_index('n').groupby('t', group_keys=False).apply(f).reset_index()

   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0  60.0
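To see what the reindex step does inside each group, it may help to run it on a single group by hand. This is only an illustrative sketch on the question's data; group0 and reindexed are names made up for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, 60],
})

idx = df["n"].unique()  # every label that must appear in each group

# Take only the t == 0 group and index it by n, as the groupby would.
group0 = df[df["t"] == 0].set_index("n")

# reindex keeps the rows for a, b, c and adds an all-NaN row for x.
reindexed = group0.reindex(idx)
print(reindexed)
```

The added row has NaN in both t and v, which is why the combined result shows NaN in t for the labels a group was missing.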

      



If df contains no NaN beforehand, you can create a MultiIndex and then reindex; the NaN values in column t are then set from column v:

cols = ["n", "t"]
df1 = df.set_index(cols)
mux = pd.MultiIndex.from_product(df1.index.levels, names=cols)
df1 = df1.reindex(mux).sort_index(level=[1,0]).reset_index()
df1['t'] = df1['t'].mask(df1['v'].isnull())
print (df1)
   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0  60.0
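The same idea can be sketched with the product built directly from the unique values, ordered so that t varies slowest and no sort is needed afterwards. This is a variation on the code above, not the answer's exact method; full_index and filled are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, 60],
})

# Every (t, n) combination, in order of first appearance.
full_index = pd.MultiIndex.from_product(
    [df["t"].unique(), df["n"].unique()], names=["t", "n"]
)

filled = (
    df.set_index(["t", "n"])
      .reindex(full_index)  # combinations absent from df get NaN in v
      .reset_index()
)

# Blank out t in the rows that the reindex created.
filled["t"] = filled["t"].where(filled["v"].notna())
filled = filled[["n", "t", "v"]]
print(filled)
```

As in the answer, this only works if v has no NaN of its own, since the mask cannot tell an original NaN from an added row.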

      

Another solution is to add the NaN rows with unstack and stack:

cols = ["n", "t"]
df1 = df.set_index(cols)['v'].unstack().stack(dropna=False)
df1 = df1.sort_index(level=[1,0]).reset_index(name='v')
df1['t'] = df1['t'].mask(df1['v'].isnull())
print (df1)
   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0  60.0

      

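But if some values are already NaN, you need groupby with loc (or reindex in pandas 1.0 and later, where loc raises KeyError for missing labels) on the unique values of column n: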

df = pd.DataFrame({"n": ["a", "b", "c", "a", "b", "x"], 
                       "t": [0, 0, 0, 1, 1, 1], 
                       "v": [10,20,30,40,50,np.nan]})
print (df)
   n  t     v
0  a  0  10.0
1  b  0  20.0
2  c  0  30.0
3  a  1  40.0
4  b  1  50.0
5  x  1   NaN

# reindex rather than loc: loc raises KeyError for missing labels in pandas >= 1.0
df1 = (df.set_index('n')
         .groupby('t', group_keys=False)
         .apply(lambda x: x.reindex(df.n.unique()))
         .reset_index())

print (df1)
   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0   NaN   

      


df1 = (df.groupby('t', group_keys=False)
         .apply(lambda x: x.set_index('n').reindex(df.n.unique()))
         .reset_index())
print (df1)
   n    t     v
0  a  0.0  10.0
1  b  0.0  20.0
2  c  0.0  30.0
3  x  NaN   NaN
4  a  1.0  40.0
5  b  1.0  50.0
6  c  NaN   NaN
7  x  1.0   NaN
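The reason the earlier mask on v no longer works here can be shown with a short sketch: once v holds a NaN of its own, masking on v also blanks t for a row that really exists. One possible workaround, offered as an assumption rather than as part of the answer, is to mask on membership in the original index instead (was_present is a hypothetical helper name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "n": ["a", "b", "c", "a", "b", "x"],
    "t": [0, 0, 0, 1, 1, 1],
    "v": [10, 20, 30, 40, 50, np.nan],  # (x, 1) exists but has no value
})

original = df.set_index(["t", "n"])
full_index = pd.MultiIndex.from_product(
    [df["t"].unique(), df["n"].unique()], names=["t", "n"]
)
filled = original.reindex(full_index).reset_index()

# Masking on v cannot tell the pre-existing NaN from a newly added row.
by_value = filled["t"].mask(filled["v"].isna())

# Masking on membership in the original index keeps t for the real (1, x) row.
was_present = full_index.isin(original.index.tolist())
by_membership = filled["t"].where(was_present)

print(by_value.isna().tolist())       # last entry is True (t wrongly blanked)
print(by_membership.isna().tolist())  # last entry is False
```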

      



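It looks like you have this the wrong way round. Usually NaNs are either read in automatically or you supply them yourself. You can insert a NaN manually with np.nan if you have import numpy as np at the top. Alternatively, older versions of pandas exposed numpy internally, so you could get NaN via pandas.np.nan (pandas.np was deprecated in pandas 1.0 and later removed, so prefer np.nan).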
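A minimal sketch of supplying a NaN by hand, assuming numpy is imported as np; the Series s is made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0])

# Supply a missing value by hand.
s.iloc[1] = np.nan
print(s.isna().tolist())  # [False, True, False]

# NaN never compares equal to itself, so test with isna() rather than ==.
print(np.nan == np.nan)   # False
```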
