Map dataframe with multiple columns as pandas keys

Question

Map dataframe with multiple columns as pandas keys

>>> p1.head()           
   StreamId            Timestamp    SeqNum
0         3  1490250116391063414  1158
1         3  1490250116391348339  3600
2         3  1490250116391542829  3600
3         3  1490250116391577184  1437
4         3  1490250116392819426  1389


>>> oss.head()
   OrderID    Symbol  Stream     SeqNo
0  5000000  AXBANK       3      1158
1  5000001  AXBANK       6      1733
2  5000002  AXBANK       6      1244
3  5000003  AXBANK       6      1388
4  5000004  AXBANK       3      1389

How can this be combined using 2 attributes as key (SeqNum and StreamId)

>>> merge
   OrderID    Symbol  Stream     SeqNo    Timestamp
0  5000000  AXBANK       3      1158      1490250116391063414
1  5000001  AXBANK       6      1733      NaN
2  5000002  AXBANK       6      1244      NaN
3  5000003  AXBANK       6      1388      NaN
4  5000004  AXBANK       3      1389      1490250116392819426

I tried to use

oss['Time1'] = oss['SeqNo'].map.((p1.set_index('SeqNum')['Timestamp']))

But I need to include both (SeqNum-SeqNo and Stream-StreamId) as keys. I know it can be easy if I rename the column names in both data files and use a merge, but I want to avoid it. I would rather use something generic (take THESE dataframe, draw THESE columns to THOSE columns to OTHER DATAFRAMEs, and extract the required coulmns)

+3

python pandas

pythonRcpp 28 Mar 17 at 8:13

source to share

2 answers

piRSquared · Answer 1 · 2017-03-28T08:20:36+0000

Using join

oss.join(p1.set_index(['StreamId', 'SeqNum']), on=['Stream', 'SeqNo'])

   OrderID  Symbol  Stream  SeqNo     Timestamp
0  5000000  AXBANK       3   1158  1.490250e+18
1  5000001  AXBANK       6   1733           NaN
2  5000002  AXBANK       6   1244           NaN
3  5000003  AXBANK       6   1388           NaN
4  5000004  AXBANK       3   1389  1.490250e+18

jezrael · Answer 2 · 2017-03-28T08:16:46+0000

I think you need merge

with drop

:

print (pd.merge(oss, p1, left_on=['Stream','SeqNo'], 
                         right_on=['StreamId','SeqNum'],how='left')
          .drop(['StreamId','SeqNum'], axis=1))

   OrderID  Symbol  Stream  SeqNo     Timestamp
0  5000000  AXBANK       3   1158  1.490250e+18
1  5000001  AXBANK       6   1733           NaN
2  5000002  AXBANK       6   1244           NaN
3  5000003  AXBANK       6   1388           NaN
4  5000004  AXBANK       3   1389  1.490250e+18

Another solution with columns rename

:

d = {'Stream':'StreamId','SeqNo':'SeqNum'}
print (pd.merge(oss.rename(columns=d), p1, how='left'))
   OrderID  Symbol  StreamId  SeqNum     Timestamp
0  5000000  AXBANK         3    1158  1.490250e+18
1  5000001  AXBANK         6    1733           NaN
2  5000002  AXBANK         6    1244           NaN
3  5000003  AXBANK         6    1388           NaN
4  5000004  AXBANK         3    1389  1.490250e+18

Map dataframe with multiple columns as pandas keys

More articles: