Concatenate columns of lists containing NaNs in a dataframe
I have a pandas df with two columns holding either lists or NaN values. There are no rows with NaN in both columns. I want to create a third column that concatenates the values of the other two columns like this:
if row df.a is NaN -> df.c = df.b
if row df.b is NaN -> df.c = df.a
else df.c = df.a + df.b
Input:
df
a b
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 NaN [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 NaN [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Output:
df.c
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
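To try the answers below, the input frame can be rebuilt as follows. This is a minimal sketch that assumes the lists are exactly range(0, 10) and range(5, 15), as printed above. Each row gets its own list object, so mutating one cell does not silently change the others.

```python
import numpy as np
import pandas as pd

# Rows 0-7: only 'a'; rows 8-9: both columns; rows 10-11: only 'b'
a = [list(range(10)) for _ in range(10)] + [np.nan] * 2
b = [np.nan] * 8 + [list(range(5, 15)) for _ in range(4)]

df = pd.DataFrame({'a': a, 'b': b})
print(df)
```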
I tried this nested condition:
df['c'] = df.apply(lambda x: x.a if x.b is float else (x.b if x.a is float else (x['a'] + x['b'])), axis = 1)
but it gives me this error:
TypeError: ('can only concatenate list (not "float") to list', u'occurred at index 0')
I am using (and it works)
if x is float
because that is the only way to tell a list apart from a NaN value.
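In Python, x.b is float tests whether the cell is the type object float itself, not whether it holds a float, so it is False for every cell, NaN included. Both conditions therefore fail and each row falls through to x['a'] + x['b'], which raises at row 0 where b is NaN. A minimal sketch of the same nested condition with isinstance(), on a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [[0, 1], np.nan, [0, 1]],
                   'b': [np.nan, [5, 6], [5, 6]]})

missing = df.loc[0, 'b']
print(missing is float)            # False: compares the value with the type object
print(isinstance(missing, float))  # True: checks the type of the value

# The original nested condition, with "is float" replaced by isinstance()
df['c'] = df.apply(
    lambda x: x.a if isinstance(x.b, float)
    else (x.b if isinstance(x.a, float) else x['a'] + x['b']),
    axis=1)
print(df['c'].tolist())  # [[0, 1], [5, 6], [0, 1, 5, 6]]
```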
You can use fillna to replace NaN with an empty list first:
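import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [[0, 1, 2], np.nan, [0, 1, 2]],
                   'b': [np.nan, [0, 1, 2], [5, 6, 7, 8, 9]]})
print (df)
# fillna rejects a bare list, so build a Series holding one empty list per row
s = pd.Series([[] for _ in range(len(df))], index=df.index)
df['c'] = df['a'].fillna(s) + df['b'].fillna(s)
print (df)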
a b c
0 [0, 1, 2] NaN [0, 1, 2]
1 NaN [0, 1, 2] [0, 1, 2]
2 [0, 1, 2] [5, 6, 7, 8, 9] [0, 1, 2, 5, 6, 7, 8, 9]
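Series.fillna does not accept a bare list as the fill value (pandas raises a TypeError), which is why the answer builds the helper Series s. If you would rather skip that step, one alternative (not part of the answer above) is to map each column through a small function that keeps lists and turns anything else into []. The helper name as_list is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [[0, 1, 2], np.nan, [0, 1, 2]],
                   'b': [np.nan, [0, 1, 2], [5, 6, 7, 8, 9]]})


def as_list(v):
    # Hypothetical helper: keep lists, turn anything else (NaN) into []
    return v if isinstance(v, list) else []


# Adding two object Series of lists concatenates them element-wise
df['c'] = df['a'].apply(as_list) + df['b'].apply(as_list)
print(df['c'].tolist())  # [[0, 1, 2], [0, 1, 2], [0, 1, 2, 5, 6, 7, 8, 9]]
```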
You can convert NaN to an empty list and then apply np.sum row-wise:
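import numpy as np
# x != x is True only for NaN, so NaN cells become [] (applymap is DataFrame.map in pandas >= 2.1); np.sum then concatenates each row
df['c'] = df[['a', 'b']].applymap(lambda x: [] if x != x else x).apply(np.sum, axis=1)
print (df['c'])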
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, ...
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, ...
9 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Name: c, dtype: object
This works for any number of columns with list or NaN content.
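Since pandas 2.1, DataFrame.applymap is deprecated in favor of DataFrame.map, which takes the same element-wise function. A version-independent way to handle an arbitrary list of columns, swapped in here rather than taken from the answer, is a row-wise comprehension with sum(..., []), which concatenates lists the same way np.sum does on an object row. The third column d below is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [[1], np.nan, [1]],
                   'b': [[2], [2], np.nan],
                   'd': [np.nan, [3], [3]]})

cols = ['a', 'b', 'd']  # any number of list-or-NaN columns
# For each row, keep only the list cells and concatenate them with sum(..., [])
df['c'] = [sum((v for v in row if isinstance(v, list)), [])
           for row in df[cols].itertuples(index=False)]
print(df['c'].tolist())  # [[1, 2], [2, 3], [1, 3]]
```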
When you use pd.DataFrame.stack, null values are removed by default. We can then group by the first level of the index and concatenate the lists together with sum:
df.stack().groupby(level=0).sum()
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
dtype: object
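It can help to look at the stacked Series itself: its MultiIndex pairs each original row label (level 0) with a column name (level 1), which is why grouping on level 0 collects all the lists of one row. The sketch uses a smaller made-up frame. The explicit dropna() is an added safeguard, assuming pandas 2.1 or later, where the new stack implementation (future_stack=True) no longer drops missing values and is slated to replace the old one; on older versions the call simply finds nothing to drop.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [[0, 1], np.nan, [0, 1]],
                   'b': [np.nan, [5, 6], [5, 6]]})

# One entry per (row label, column name) pair
stacked = df.stack()
# Older stack() already drops NaN; the newer implementation keeps it, so drop explicitly
stacked = stacked.dropna()
print(stacked)

# Grouping on level 0 (the row label) and summing concatenates each row's lists
c = stacked.groupby(level=0).sum()
print(c.tolist())  # [[0, 1], [5, 6], [0, 1, 5, 6]]
```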
We can then add it to a copy of the dataframe with assign:
df.assign(c=df.stack().groupby(level=0).sum())
Or add it as a new column in place:
df['c'] = df.stack().groupby(level=0).sum()
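One consequence of stack dropping missing cells: a row that is NaN in every column has no entries left to group, so it is absent from the result, and the in-place assignment fills that row with NaN through index alignment (the fillna approach would give [] there instead). The question rules such rows out, so this only matters for other data; the frame below is made up to show the effect.

```python
import numpy as np
import pandas as pd

# Unlike the question's data, row 1 here is NaN in both columns
df = pd.DataFrame({'a': [[0, 1], np.nan, np.nan],
                   'b': [np.nan, np.nan, [5, 6]]})

c = df.stack().dropna().groupby(level=0).sum()
print(c.index.tolist())  # [0, 2]: row 1 has no entries after stacking

# Assignment aligns on the index, so row 1 of the new column is NaN
df['c'] = c
print(df['c'].isna().tolist())  # [False, True, False]
```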