Speed up pandas apply or use map

I have a DataFrame and I want to populate a new column based on a lookup table. I cannot use map, as the lookup table is keyed on several index levels.

import pandas as pd
import numpy as np

d = pd.DataFrame({'I': np.random.randint(3, size=5),
                  'B0': np.random.choice([True, False], 5),
                  'B1': np.random.choice([True, False], 5)})

      

which is my data (actually my data is much larger):

      B0     B1  I
0   True  False  0
1  False  False  0
2  False  False  1
3   True  False  1
4  False   True  2

      

then my lookup table:

l = pd.DataFrame({(True, True): [1.1, 2.2, 3.3],
              (True, False): [1.3, 2.1, 3.1],
              (False, True): [1.2, 2.1, 3.1],
              (False, False): [1.1, 2.0, 5.1]}
             )
l.index.name = 'I'
l.columns.names = 'B0', 'B1'
l = l.stack(['B0', 'B1'])

      

which gives:

I  B0     B1   
0  False  False    1.1
          True     1.2
   True   False    1.3
          True     1.1
1  False  False    2.0
          True     2.1
   True   False    2.1
          True     2.2
2  False  False    5.1
          True     3.1
   True   False    3.1
          True     3.3

      

So I want to add a column w to my data by looking up the value for (I, B0, B1) in the lookup table. I use:

d['w'] = d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1)

      

and it works:

      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1

      

the problem is that it is very slow. How can I speed it up?



2 answers


This should be faster:

find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)

      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
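The lookup above assumes that every (I, B0, B1) combination in d has an entry in l. If that might not hold, one possible variant is to use reindex, which fills NaN for absent keys. This is a minimal sketch, not part of the original answer: the lookup Series is rebuilt by hand from the table in the question, and the extra row with I=7 is a hypothetical example of a missing key.

```python
import numpy as np
import pandas as pd

# Lookup Series rebuilt by hand from the table shown in the question.
index = pd.MultiIndex.from_product(
    [[0, 1, 2], [False, True], [False, True]], names=["I", "B0", "B1"]
)
l = pd.Series(
    [1.1, 1.2, 1.3, 1.1, 2.0, 2.1, 2.1, 2.2, 5.1, 3.1, 3.1, 3.3], index=index
)

# Same rows as the question, plus one hypothetical row (I=7) with no entry in l.
d = pd.DataFrame({"I": [0, 0, 1, 1, 2, 7],
                  "B0": [True, False, False, True, False, True],
                  "B1": [False, False, False, False, True, True]})

# Build one key per row of d, then reindex the lookup by those keys.
# Keys that are absent from l come back as NaN.
keys = pd.MultiIndex.from_frame(d[["I", "B0", "B1"]])
out = d.assign(w=l.reindex(keys).to_numpy())
print(out)
```

Here to_numpy() drops the MultiIndex so the values are assigned by position, in the same way .values is used in the answer.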

      

With join:

d.join(l.rename('w'), on=['I', 'B0', 'B1'])


      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
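A quick way to check that the join gives the same values as the original row-by-row apply is to run both on the example data and compare. This is only a sketch: the data and the lookup Series are typed in by hand from the tables in the question rather than generated randomly.

```python
import pandas as pd

# Lookup Series rebuilt by hand from the table shown in the question.
index = pd.MultiIndex.from_product(
    [[0, 1, 2], [False, True], [False, True]], names=["I", "B0", "B1"]
)
l = pd.Series(
    [1.1, 1.2, 1.3, 1.1, 2.0, 2.1, 2.1, 2.2, 5.1, 3.1, 3.1, 3.3], index=index
)

# The five example rows from the question.
d = pd.DataFrame({"I": [0, 0, 1, 1, 2],
                  "B0": [True, False, False, True, False],
                  "B1": [False, False, False, False, True]})

# Row-by-row lookup, as in the question.
slow = d.apply(lambda x: l[x["I"], x["B0"], x["B1"]], axis=1)

# Join on the three key columns against the named lookup Series.
fast = d.join(l.rename("w"), on=["I", "B0", "B1"])

# Raises AssertionError if values or index differ; the Series name is ignored.
pd.testing.assert_series_equal(fast["w"], slow, check_names=False)
print(fast)
```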

      




Timing (small data)

%%timeit
find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)
100 loops, best of 3: 1.98 ms per loop

%timeit d.assign(w=d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1))
100 loops, best of 3: 11.8 ms per loop

%timeit d.join(l.rename('w'), on=['I', 'B0', 'B1'])
100 loops, best of 3: 1.99 ms per loop

%timeit d.merge(l.reset_index())
100 loops, best of 3: 2.89 ms per loop
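The timings above are for the five-row example only. To see how the approaches compare on a larger frame outside IPython, something like the following could be used. It is a rough sketch: the row count n, the seed, and the helper function names are arbitrary choices made for illustration, each function is timed with a single run, and no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Lookup Series rebuilt by hand from the table shown in the question.
index = pd.MultiIndex.from_product(
    [[0, 1, 2], [False, True], [False, True]], names=["I", "B0", "B1"]
)
l = pd.Series(
    [1.1, 1.2, 1.3, 1.1, 2.0, 2.1, 2.1, 2.2, 5.1, 3.1, 3.1, 3.3], index=index
)

# Hypothetical larger frame; n and the seed are arbitrary.
rng = np.random.default_rng(0)
n = 5_000
d = pd.DataFrame({"I": rng.integers(3, size=n),
                  "B0": rng.choice([True, False], n),
                  "B1": rng.choice([True, False], n)})

def with_apply():
    # Row-by-row lookup from the question.
    return d.assign(w=d.apply(lambda x: l[x["I"], x["B0"], x["B1"]], axis=1))

def with_loc():
    # List-of-tuples lookup from the answer.
    find_these = list(zip(d.I, d.B0, d.B1))
    return d.assign(w=l.loc[find_these].values)

def with_join():
    # Join on the key columns from the answer.
    return d.join(l.rename("w"), on=["I", "B0", "B1"])

for fn in (with_apply, with_loc, with_join):
    seconds = timeit.timeit(fn, number=1)  # one run each, so expect noise
    print(f"{fn.__name__}: {seconds:.4f} s")
```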

      



We can merge d with the flattened l (after applying reset_index()):



In [5]: d.merge(l.reset_index())
Out[5]:
      B0     B1  I    0
0   True  False  0  1.3
1   True  False  0  1.3
2  False   True  0  1.2
3  False  False  0  1.1
4  False   True  2  3.1
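In the output above the looked-up column is labelled 0 because the stacked Series has no name. One way to get a column called w directly is to name it while resetting the index, and to spell out the key columns and a left join so that every row of d is kept in its original order. This is a sketch using the example rows from the question, typed in by hand; it is not part of the original answer.

```python
import pandas as pd

# Lookup Series rebuilt by hand from the table shown in the question.
index = pd.MultiIndex.from_product(
    [[0, 1, 2], [False, True], [False, True]], names=["I", "B0", "B1"]
)
l = pd.Series(
    [1.1, 1.2, 1.3, 1.1, 2.0, 2.1, 2.1, 2.2, 5.1, 3.1, 3.1, 3.3], index=index
)

# The five example rows from the question.
d = pd.DataFrame({"I": [0, 0, 1, 1, 2],
                  "B0": [True, False, False, True, False],
                  "B1": [False, False, False, False, True]})

# Flatten the lookup into columns I, B0, B1 and w.
flat = l.reset_index(name="w")

# Left merge on the three key columns keeps all rows of d in their order.
out = d.merge(flat, on=["I", "B0", "B1"], how="left")
print(out)
```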

      
