Subtracting data with an unequal number of rows

Question

I have two dataframes like this:

import pandas as pd
import numpy as np

np.random.seed(0)

df1 = pd.DataFrame(np.random.randint(10, size=(5, 4)), index=list('ABCDE'), columns=list('abcd'))
df2 = pd.DataFrame(np.random.randint(10, size=(2, 4)), index=list('CE'), columns=list('abcd'))

   a  b  c  d
A  5  0  3  3
B  7  9  3  5
C  2  4  7  6
D  8  8  1  6
E  7  7  8  1

   a  b  c  d
C  5  9  8  9
E  4  3  0  3

The index of df2 is always a subset of the index of df1, and the column names are identical.

I want to create a third dataframe, df3 = df1 - df2. If I do this, I get

     a    b    c    d
A  NaN  NaN  NaN  NaN
B  NaN  NaN  NaN  NaN
C -3.0 -5.0 -1.0 -3.0
D  NaN  NaN  NaN  NaN
E  3.0  4.0  8.0 -2.0

I do not want NaNs in the output; I want the corresponding values from df1 instead. Is there a sensible way to use e.g. fillna with the values of df1 for the rows that are not contained in df2?

A workaround would be to subtract only the required rows, for example:

sub_ind = df2.index
df3 = df1.copy()
df3.loc[sub_ind, :] = df1.loc[sub_ind, :] - df2.loc[sub_ind, :]

which gives me the desired output

   a  b  c  d
A  5  0  3  3
B  7  9  3  5
C -3 -5 -1 -3
D  8  8  1  6
E  3  4  8 -2

but maybe there is an easier way to achieve this?

+3

python pandas dataframe

Cleb May 01 '17 at 14:41

3 answers

I think this is what you want:

(df1-df2).fillna(df1)

Out[40]: 
     a    b    c    d
A  5.0  0.0  3.0  3.0
B  7.0  9.0  3.0  5.0
C -3.0 -5.0 -1.0 -3.0
D  8.0  8.0  1.0  6.0
E  3.0  4.0  8.0 -2.0

Just subtract the dataframes as usual, but wrap the subtraction in parentheses and call the method pandas.DataFrame.fillna on the result. Or, spelled out in two steps:

diff = df1-df2
diff.fillna(df1, inplace=True)
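
Note that the result above is float, because the intermediate NaNs force a float dtype. If integer output like in the question is wanted, one option is to cast back after filling. This is only a sketch: the frames are rebuilt from the literal values shown in the question, and the name diff_int is made up for illustration. The cast assumes no NaN is left after fillna, which holds here because the index of df2 is a subset of the index of df1.

```python
import pandas as pd

# Same values as in the question, written out literally.
df1 = pd.DataFrame(
    [[5, 0, 3, 3], [7, 9, 3, 5], [2, 4, 7, 6], [8, 8, 1, 6], [7, 7, 8, 1]],
    index=list("ABCDE"), columns=list("abcd"),
)
df2 = pd.DataFrame(
    [[5, 9, 8, 9], [4, 3, 0, 3]],
    index=list("CE"), columns=list("abcd"),
)

# Subtract, fill the unmatched rows from df1, then cast back to integers.
# astype would raise if any NaN remained after fillna.
diff_int = (df1 - df2).fillna(df1).astype("int64")
print(diff_int)
```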

+3

blacksite May 01 '17 at 14:44

Here is an option using reindex and its fill_value parameter. The main differences between this answer and @ayhan's answer:

You can control the fill value for only one of the dataframes, or for both
This generalizes to reindexing over a custom index that joins df1 and df2
We have better control over preserving the int dtype

df1 - df2.reindex(df1.index, fill_value=0)

   a  b  c  d
A  5  0  3  3
B  7  9  3  5
C -3 -5 -1 -3
D  8  8  1  6
E  3  4  8 -2
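
The generalization to a joined index could look roughly like this. It is a sketch, not part of the original answer: the frame named other and its extra row F are hypothetical, added only to show a case where neither index is a subset of the other.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[5, 0, 3, 3], [7, 9, 3, 5], [2, 4, 7, 6], [8, 8, 1, 6], [7, 7, 8, 1]],
    index=list("ABCDE"), columns=list("abcd"),
)
# Hypothetical second frame: row F does not exist in df1.
other = pd.DataFrame(
    [[5, 9, 8, 9], [4, 3, 0, 3], [1, 1, 1, 1]],
    index=list("CEF"), columns=list("abcd"),
)

# Reindex both frames over the union of their indexes, padding with 0,
# so that no NaN is ever created and the integer dtype is kept.
idx = df1.index.union(other.index)
result = df1.reindex(idx, fill_value=0) - other.reindex(idx, fill_value=0)
print(result)
```

Because each side gets its own reindex call, a different fill_value can be chosen per frame if 0 is not the right neutral value for one of them.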

+2

piRSquared 01 May '17 at 15:10

user2285236 · Accepted Answer · 2017-05-01T14:45:40+0000

If you use the method sub instead of the - operator, you can pass a fill value:

df1.sub(df2, fill_value=0)
Out: 
     a    b    c    d
A  5.0  0.0  3.0  3.0
B  7.0  9.0  3.0  5.0
C -3.0 -5.0 -1.0 -3.0
D  8.0  8.0  1.0  6.0
E  3.0  4.0  8.0 -2.0
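
One thing worth knowing is that fill_value is applied to whichever side is missing a label, not only to df2. A small sketch with made-up frames (the name other and its row F are hypothetical) shows what happens when the second frame has a row the first one lacks:

```python
import pandas as pd

df1 = pd.DataFrame(
    [[5, 0, 3, 3], [2, 4, 7, 6]],
    index=list("AC"), columns=list("abcd"),
)
# Hypothetical frame: row F has no counterpart in df1.
other = pd.DataFrame(
    [[5, 9, 8, 9], [1, 2, 3, 4]],
    index=list("CF"), columns=list("abcd"),
)

# Row A: other is missing, so it is treated as 0 and df1's values survive.
# Row F: df1 is missing, so it is treated as 0 and the result is -other.
result = df1.sub(other, fill_value=0)
print(result)
```

In the question's setup the index of df2 is always a subset of the index of df1, so only the first case occurs there.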
