Speed up pandas apply or use map

Question

I have a DataFrame and I want to populate a new column based on a lookup table. I cannot use map, because the lookup table is keyed on several index levels.

import pandas as pd
import numpy as np

d = pd.DataFrame({'I': np.random.randint(3, size=5),
                  'B0': np.random.choice([True, False], 5),
                  'B1': np.random.choice([True, False], 5)})

This gives my data (the real data is much larger):

      B0     B1  I
0   True  False  0
1  False  False  0
2  False  False  1
3   True  False  1
4  False   True  2

then my lookup table:

l = pd.DataFrame({(True, True): [1.1, 2.2, 3.3],
              (True, False): [1.3, 2.1, 3.1],
              (False, True): [1.2, 2.1, 3.1],
              (False, False): [1.1, 2.0, 5.1]}
             )
l.index.name = 'I'
l.columns.names = 'B0', 'B1'
l = l.stack(['B0', 'B1'])

which gives:

I  B0     B1   
0  False  False    1.1
          True     1.2
   True   False    1.3
          True     1.1
1  False  False    2.0
          True     2.1
   True   False    2.1
          True     2.2
2  False  False    5.1
          True     3.1
   True   False    3.1
          True     3.3

So I want to add a column w to my data by looking up the value in the lookup table at (I, B0, B1). I use:

d['w'] = d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1)

and it works:

      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1

The problem is that it is very slow. How can I speed it up?

python pandas

Ruggero Turra May 31 '17 at 16:57

2 answers

We can merge d with a flattened l (after applying reset_index()):

In [5]: d.merge(l.reset_index())
Out[5]:
      B0     B1  I    0
0   True  False  0  1.3
1   True  False  0  1.3
2  False   True  0  1.2
3  False  False  0  1.1
4  False   True  2  3.1
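
As a hedged sketch of the same merge: the lookup column can be named w before flattening, and how='left' keeps every row of d, including rows with no match in the lookup table (those would get NaN). The data are hard-coded to match the question's example; the names index, flat and result are illustrative only.

```python
import pandas as pd

# Data hard-coded to match the question's example frame.
d = pd.DataFrame({'I': [0, 0, 1, 1, 2],
                  'B0': [True, False, False, True, False],
                  'B1': [False, False, False, False, True]})

# Lookup Series indexed by (I, B0, B1), with the same values as the question's l.
index = pd.MultiIndex.from_product([[0, 1, 2], [False, True], [False, True]],
                                   names=['I', 'B0', 'B1'])
l = pd.Series([1.1, 1.2, 1.3, 1.1,
               2.0, 2.1, 2.1, 2.2,
               5.1, 3.1, 3.1, 3.3], index=index)

# Name the values 'w', flatten the index into columns, then left-merge on the keys.
flat = l.rename('w').reset_index()
result = d.merge(flat, on=['I', 'B0', 'B1'], how='left')
print(result)
```

Passing on= explicitly avoids relying on merge's default of using all shared column names as keys.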

MaxU May 31 '17 at 17:08

piRSquared · Accepted Answer · 2017-05-31T17:07:44+0000

This should be faster:

find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)

      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1

With join:

d.join(l.rename('w'), on=['I', 'B0', 'B1'])


      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
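
A small sketch of how join behaves when a key is missing from the lookup table: join is a left join by default, so the row is kept and w becomes NaN. The row with I = 3 is a made-up case, not part of the question's data.

```python
import pandas as pd

# Lookup Series indexed by (I, B0, B1), with the same values as the question's l.
index = pd.MultiIndex.from_product([[0, 1, 2], [False, True], [False, True]],
                                   names=['I', 'B0', 'B1'])
l = pd.Series([1.1, 1.2, 1.3, 1.1,
               2.0, 2.1, 2.1, 2.2,
               5.1, 3.1, 3.1, 3.3], index=index)

# The last row uses I = 3, which does not exist in the lookup table (hypothetical case).
d = pd.DataFrame({'I': [0, 2, 3],
                  'B0': [True, False, True],
                  'B1': [False, True, True]})

out = d.join(l.rename('w'), on=['I', 'B0', 'B1'])
print(out)

# Rows whose key was not found can be selected afterwards.
missing = out[out['w'].isna()]
print(missing)
```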

Timing (small data)

%%timeit
find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)
100 loops, best of 3: 1.98 ms per loop

%timeit d.assign(w=d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1))
100 loops, best of 3: 11.8 ms per loop

%timeit d.join(l.rename('w'), on=['I', 'B0', 'B1'])
100 loops, best of 3: 1.99 ms per loop

%timeit d.merge(l.reset_index())
100 loops, best of 3: 2.89 ms per loop
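
The timings above are for the five-row example. To check how the vectorized approaches compare on a larger frame, something like the following sketch could be used outside IPython. The size n, the random seed and the helper names (with_loc, with_join, with_merge) are assumptions for illustration; the row-wise apply is left out because it would dominate the run time, but it can be added in the same way. No timings are quoted here since they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 50_000  # hypothetical size; adjust to resemble the real data
rng = np.random.default_rng(0)
d = pd.DataFrame({'I': rng.integers(3, size=n),
                  'B0': rng.integers(2, size=n).astype(bool),
                  'B1': rng.integers(2, size=n).astype(bool)})

# Lookup Series indexed by (I, B0, B1), with the same values as the question's l.
index = pd.MultiIndex.from_product([[0, 1, 2], [False, True], [False, True]],
                                   names=['I', 'B0', 'B1'])
l = pd.Series([1.1, 1.2, 1.3, 1.1,
               2.0, 2.1, 2.1, 2.2,
               5.1, 3.1, 3.1, 3.3], index=index)

keys = ['I', 'B0', 'B1']


def with_loc():
    # Build one tuple per row and look them all up in a single .loc call.
    find_these = list(zip(d.I, d.B0, d.B1))
    return d.assign(w=l.loc[find_these].values)


def with_join():
    # Join the named lookup Series on the three key columns.
    return d.join(l.rename('w'), on=keys)


def with_merge():
    # Flatten the lookup Series and left-merge on the key columns.
    return d.merge(l.rename('w').reset_index(), on=keys, how='left')


for func in (with_loc, with_join, with_merge):
    seconds = timeit.timeit(func, number=5) / 5
    print(f'{func.__name__}: {seconds * 1e3:.2f} ms per call')
```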
