Subtracting dataframes with an unequal number of rows
I have two dataframes like this:
import pandas as pd
import numpy as np
np.random.seed(0)
df1 = pd.DataFrame(np.random.randint(10, size=(5, 4)), index=list('ABCDE'), columns=list('abcd'))
df2 = pd.DataFrame(np.random.randint(10, size=(2, 4)), index=list('CE'), columns=list('abcd'))
a b c d
A 5 0 3 3
B 7 9 3 5
C 2 4 7 6
D 8 8 1 6
E 7 7 8 1
a b c d
C 5 9 8 9
E 4 3 0 3
The index of df2 is always a subset of the index of df1, and the column names are identical.
I want to create a third dataframe df3 = df1 - df2. Doing this directly gives
a b c d
A NaN NaN NaN NaN
B NaN NaN NaN NaN
C -3.0 -5.0 -1.0 -3.0
D NaN NaN NaN NaN
E 3.0 4.0 8.0 -2.0
I do not want NaNs in the output; I want the corresponding values of df1 instead. Is there a sensible way to use e.g. fillna with the values of df1 for the rows not contained in df2?
A workaround would be to subtract only the required rows, for example:
sub_ind = df2.index
df3 = df1.copy()
df3.loc[sub_ind, :] = df1.loc[sub_ind, :] - df2.loc[sub_ind, :]
which gives me the desired output
a b c d
A 5 0 3 3
B 7 9 3 5
C -3 -5 -1 -3
D 8 8 1 6
E 3 4 8 -2
but maybe there is an easier way to achieve this?
I think this is what you want:
(df1-df2).fillna(df1)
Out[40]:
a b c d
A 5.0 0.0 3.0 3.0
B 7.0 9.0 3.0 5.0
C -3.0 -5.0 -1.0 -3.0
D 8.0 8.0 1.0 6.0
E 3.0 4.0 8.0 -2.0
Just subtract the dataframes as usual, but wrap the subtraction in parentheses and call the method pandas.DataFrame.fillna on the result. Or, in a little more detail:
diff = df1 - df2
diff.fillna(df1, inplace=True)
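Note that the output above is float, because the intermediate NaNs force a float dtype. If integer output is wanted, one possible follow-up is to cast back once the NaNs are filled. The sketch below uses small hand-written frames (hypothetical values, not the random data from the question) and assumes every NaN really does get filled from df1:

```python
import pandas as pd

# Hypothetical miniature frames: df2's index is a subset of df1's index
df1 = pd.DataFrame({"a": [5, 7, 2], "b": [0, 9, 4]}, index=list("ABC"))
df2 = pd.DataFrame({"a": [5], "b": [9]}, index=["C"])

# Subtraction aligns on the index; rows A and B become NaN, then fall back to df1
diff = (df1 - df2).fillna(df1)

# The NaN step turned the columns into floats; cast back now that no NaN remains
diff_int = diff.astype("int64")
print(diff_int)
```

The cast only works if no NaN is left, so it would fail if df1 itself contained missing values.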
Here is an option using reindex and its fill_value parameter. The main differences between this answer and @ayhan's answer:
- You can control the fill value on only one of the dataframes, or on both
- This can be generalized to reindex over a custom index joining df1 and df2
- We have better control over preserving the int dtype
df1 - df2.reindex(df1.index, fill_value=0)
a b c d
A 5 0 3 3
B 7 9 3 5
C -3 -5 -1 -3
D 8 8 1 6
E 3 4 8 -2
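As a rough sketch of the generalization mentioned above, both frames can be reindexed over the union of their indexes. The frames below are hypothetical (the extra row "Z" in df2 is an assumption for illustration; in the question df2's index is always a subset of df1's), and filling with 0 on both sides is just one possible choice:

```python
import pandas as pd

# Hypothetical frames; "Z" exists only in df2 to show the generalized case
df1 = pd.DataFrame({"a": [5, 7, 2], "b": [0, 9, 4]}, index=list("ABC"))
df2 = pd.DataFrame({"a": [5, 1], "b": [9, 2]}, index=["C", "Z"])

# Custom index joining both frames
idx = df1.index.union(df2.index)

# Fill missing rows with 0 on each side; no NaN appears, so ints stay ints
left = df1.reindex(idx, fill_value=0)
right = df2.reindex(idx, fill_value=0)

result = left - right
print(result)
print(result.dtypes)
```

Because the fill value is an integer and no NaN is ever introduced, the columns should keep their integer dtype, which is the dtype-preservation point from the list above.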