Pandas - concatenate two dataframes with different number of rows

Question

I have the following two data frames:

df:

              value
period
2000-01-01    100
2000-04-01    200
2000-07-01    300
2000-10-01    400
2001-01-01    500

df1:

              value
period
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

This is the desired output:

df:

              value
period
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

I have called set_index(['period']) on both data frames. I've also tried several things, including concat and a where expression after creating a new column, but nothing works as expected. The first data frame is the primary one; the second contains updates. The updates should replace the corresponding values in the first data frame and, at the same time, add any new records.

How can I do this?

+3

python pandas

sretko May 08 '17 at 20:44

4 answers

Is this what you want?

In [151]: pd.concat([df1, df.loc[df.index.difference(df1.index)]]).sort_index()
Out[151]:
            value
period
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

PS: make sure both indexes have the same dtype. It is better to convert them to datetime dtype using pd.to_datetime().
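A minimal end-to-end sketch of this approach, assuming both frames start with string-typed "period" indexes as in the question. The names only_in_df and result are illustrative, not from the original answer.

```python
import pandas as pd

# Rebuild the question's frames with string-typed "period" indexes.
df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.Index(
        ["2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01", "2001-01-01"],
        name="period",
    ),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.Index(
        ["2000-07-01", "2000-10-01", "2001-01-01", "2001-04-01", "2001-07-01"],
        name="period",
    ),
)

# Give both indexes the same datetime dtype before comparing them.
df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)

# Keep only the rows of df whose dates are absent from df1,
# stack them under df1, and restore chronological order.
only_in_df = df.index.difference(df1.index)
result = pd.concat([df1, df.loc[only_in_df]]).sort_index()
print(result)
```

Because the rows taken from df never overlap with df1, no duplicate removal is needed afterwards.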

+3

MaxU May 08 '17 at 20:49

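Another option with append and index.duplicated:

# df1 goes first so that its rows win when duplicated index labels are dropped
d1 = df1.append(df)  # pandas >= 2.0 removed DataFrame.append: use pd.concat([df1, df])
d1[~d1.index.duplicated()]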

            value
period           
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700
2000-01-01    100
2000-04-01    200
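The output above is not in chronological order. A small sketch of the same idea that works on current pandas and sorts the result, assuming the frames from the question with datetime indexes (the names combined and result are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.DatetimeIndex(
        ["2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01", "2001-01-01"],
        name="period",
    ),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.DatetimeIndex(
        ["2000-07-01", "2000-10-01", "2001-01-01", "2001-04-01", "2001-07-01"],
        name="period",
    ),
)

# Put the updating frame first: duplicated(keep="first") marks every later
# occurrence of an index label, so the overlapping rows from df are dropped.
combined = pd.concat([df1, df])
result = combined[~combined.index.duplicated(keep="first")].sort_index()
print(result)
```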

+3

piRSquared May 08 '17 at 21:43

I used the pd.concat() function to concatenate the dataframes, then dropped the duplicates to get the result. This assumes "period" is still a regular column, i.e. set_index has not been called yet.

df_con = pd.concat([df, df1])
df_con.drop_duplicates(subset="period",keep="last",inplace=True)
print(df_con)

       period  value
0  2000-01-01    100
1  2000-04-01    200
0  2000-07-01    350
1  2000-10-01    450
2  2001-01-01    550
3  2001-04-01    600
4  2001-07-01    700

To make "period" the index again, call set_index:

print(df_con.set_index("period"))

            value
period           
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700
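If "period" is already the index, as in the question, one way to reuse this approach is to move it back into a column first. A rough sketch under that assumption (the names stacked and result are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "period": ["2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01", "2001-01-01"],
        "value": [100, 200, 300, 400, 500],
    }
).set_index("period")
df1 = pd.DataFrame(
    {
        "period": ["2000-07-01", "2000-10-01", "2001-01-01", "2001-04-01", "2001-07-01"],
        "value": [350, 450, 550, 600, 700],
    }
).set_index("period")

# reset_index turns "period" back into a column so it can be used as subset.
stacked = pd.concat([df.reset_index(), df1.reset_index()], ignore_index=True)

# df1 was stacked last, so keep="last" prefers its rows for repeated periods.
result = (
    stacked.drop_duplicates(subset="period", keep="last")
    .set_index("period")
    .sort_index()
)
print(result)
```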

0

Mr. Pacman May 08 '17 at 22:22

jezrael · Accepted Answer · 2017-05-08T20:56:34+0000

You can also use combine_first. If the dtype of either index is object, convert it with to_datetime first. This works well if df1.index is always in df.index:

print (df.index.dtype)
object

print (df1.index.dtype)
object

df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)

df = df1.combine_first(df)
# combine_first can return floats; cast back if integer columns are needed
# df = df1.combine_first(df).astype(int)
print (df)
            value
period           
2000-01-01  100.0
2000-04-01  200.0
2000-07-01  350.0
2000-10-01  450.0
2001-01-01  550.0
2001-04-01  600.0
2001-07-01  700.0

If not, filter df1 by the index intersection first:

df = df1.loc[df1.index.intersection(df.index)].combine_first(df)
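To make the difference between the two combine_first variants concrete, here is a small sketch on the question's data (the names full and restricted are illustrative). Note that the intersection filter limits the result to dates already present in df, so it updates existing rows but does not add the new 2001-04-01 and 2001-07-01 records the question asks for.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.DatetimeIndex(
        ["2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01", "2001-01-01"],
        name="period",
    ),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.DatetimeIndex(
        ["2000-07-01", "2000-10-01", "2001-01-01", "2001-04-01", "2001-07-01"],
        name="period",
    ),
)

# Plain combine_first: union of both indexes, df1's values take priority.
full = df1.combine_first(df).astype(int)

# With the intersection filter: only df1 rows whose dates exist in df are
# used, so the result keeps df's index and gains no new rows.
restricted = df1.loc[df1.index.intersection(df.index)].combine_first(df).astype(int)

print(full)
print(restricted)
```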

Another solution with numpy.setdiff1d and concat:

import numpy as np

# rows of df whose index labels are not in df1, followed by all of df1
df = pd.concat([df.loc[np.setdiff1d(df.index, df1.index)], df1])
print (df)
            value
period           
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700
