How to get data in the correct dataframe that is not in the left dataframe
I have two frames of data and I am trying to output data that is in one but not the other.
I can get data in the first dataframe, but not the second, using
only_new = old.merge(
new, 'outer', on=['Employee ID', 'Benefit Plan Type'],
suffixes=['','_'], indicator=True
).query('_merge == "left_only"').reindex_axis(old.columns, axis=1)
This is what I am using to get data which is only in my second dataframe
only_new =new.merge(
old, 'outer', on=['Employee ID', 'Benefit Plan Type'],
suffixes=['','_'], indicator=True
).query('_merge == "left only"').reindex_axis(new.columns, axis=1)
But it doesn't return any data, but using Excel I can see that there should be multiple rows.
This seems like it should work
only_new = old.merge(new, on='Employee ID', indicator=True, how='outer',
only_new[only_new['_merge'] == 'right_only'])
But i get
SyntaxError: non-keyword arg after keyword arg
+3
source to share
2 answers
Consider data frames old
andnew
old = pd.DataFrame(dict(
ID=[1, 2, 3, 4, 5],
Type=list('AAABB'),
Total=[9 for _ in range(5)],
ArbitraryColumn=['blah' for _ in range(5)]
))
new = pd.DataFrame(dict(
ID=[3, 4, 5, 6, 7],
Type=list('ABBCC'),
Total=[9 for _ in range(5)],
ArbitraryColumn=['blah' for _ in range(5)]
))
Then, to take a symmetrically identical solution
old.merge(
new, 'outer', on=['ID', 'Type'],
suffixes=['_', ''], indicator=True # changed order of suffixes
).query('_merge == "right_only"').reindex_axis(new.columns, axis=1)
# \......../ \./
# changed from `left` to `right` reindex with `new`
ArbitraryColumn ID Total Type
5 blah 6 9.0 C
6 blah 7 9.0 C
+1
source to share