Speed up pandas apply or use map
I have a DataFrame and I want to populate a new column based on a lookup table. I cannot use map
because the lookup table is keyed on several index levels.
import pandas as pd
import numpy as np
d = pd.DataFrame({'I': np.random.randint(3, size=5),
                  'B0': np.random.choice([True, False], 5),
                  'B1': np.random.choice([True, False], 5)})
This is my data (my real data is much larger):
      B0     B1  I
0   True  False  0
1  False  False  0
2  False  False  1
3   True  False  1
4  False   True  2
Then my lookup table:
l = pd.DataFrame({(True, True): [1.1, 2.2, 3.3],
                  (True, False): [1.3, 2.1, 3.1],
                  (False, True): [1.2, 2.1, 3.1],
                  (False, False): [1.1, 2.0, 5.1]})
l.index.name = 'I'
l.columns.names = 'B0', 'B1'
l = l.stack(['B0', 'B1'])
which gives:
I  B0     B1
0  False  False    1.1
          True     1.2
   True   False    1.3
          True     1.1
1  False  False    2.0
          True     2.1
   True   False    2.1
          True     2.2
2  False  False    5.1
          True     3.1
   True   False    3.1
          True     3.3
So I want to add a column w to my data by looking up the value for (I, B0, B1) in the lookup table. I use:
d['w'] = d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1)
and it works:
      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
The problem is that it is very slow. How can I speed it up?
2 answers
This should be faster:
find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)
      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
With join:
d.join(l.rename('w'), on=['I', 'B0', 'B1'])
      B0     B1  I    w
0   True  False  0  1.3
1  False  False  0  1.1
2  False  False  1  2.0
3   True  False  1  2.1
4  False   True  2  3.1
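To check both approaches against the table in the question, here is a minimal self-contained sketch. It hard-codes the five rows shown in the question instead of drawing them at random, and it builds the lookup Series directly with MultiIndex.from_product rather than with stack. The names lookup, by_loc and by_join are mine, not from the question.

```python
import pandas as pd

# The five rows shown in the question, hard-coded so the result is reproducible.
d = pd.DataFrame({'I': [0, 0, 1, 1, 2],
                  'B0': [True, False, False, True, False],
                  'B1': [False, False, False, False, True]})

# Same lookup values as the stacked table in the question, built directly.
# from_product varies the last level fastest: (0, F, F), (0, F, T), (0, T, F), ...
index = pd.MultiIndex.from_product([[0, 1, 2], [False, True], [False, True]],
                                   names=['I', 'B0', 'B1'])
lookup = pd.Series([1.1, 1.2, 1.3, 1.1,
                    2.0, 2.1, 2.1, 2.2,
                    5.1, 3.1, 3.1, 3.3], index=index)

# Approach 1: one vectorized .loc call with a list of (I, B0, B1) tuples.
find_these = list(zip(d.I, d.B0, d.B1))
by_loc = d.assign(w=lookup.loc[find_these].values)

# Approach 2: join the columns of d against the index levels of the lookup.
by_join = d.join(lookup.rename('w'), on=['I', 'B0', 'B1'])

print(by_loc)
print(by_join['w'].equals(by_loc['w']))
```

Both variants do a single bulk lookup instead of one Python-level lookup per row, which is where apply with axis=1 spends its time.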
Timing
small data
%%timeit
find_these = list(zip(d.I, d.B0, d.B1))
d.assign(w=l.loc[find_these].values)
100 loops, best of 3: 1.98 ms per loop
%timeit d.assign(w=d.apply(lambda x: l[x['I'], x['B0'], x['B1']], axis=1))
100 loops, best of 3: 11.8 ms per loop
%timeit d.join(l.rename('w'), on=['I', 'B0', 'B1'])
100 loops, best of 3: 1.99 ms per loop
%timeit d.merge(l.reset_index())
100 loops, best of 3: 2.89 ms per loop
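The timings above are for the five-row frame. To see how the approaches compare on something closer to the real data, a sketch like the following can be run as a plain script, using the standard timeit module instead of %timeit. The row count n, the seed and the helper names are assumptions for illustration. No timings are quoted here because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 5_000  # assumed size; raise it to match the real data
rng = np.random.default_rng(0)
d = pd.DataFrame({'I': rng.integers(3, size=n),
                  'B0': rng.choice([True, False], n),
                  'B1': rng.choice([True, False], n)})

# Same lookup values as in the question, built directly as a named Series.
index = pd.MultiIndex.from_product([[0, 1, 2], [False, True], [False, True]],
                                   names=['I', 'B0', 'B1'])
lookup = pd.Series([1.1, 1.2, 1.3, 1.1, 2.0, 2.1, 2.1, 2.2, 5.1, 3.1, 3.1, 3.3],
                   index=index, name='w')

def with_apply():
    # Row-by-row lookup, as in the question.
    return d.apply(lambda x: lookup[x['I'], x['B0'], x['B1']], axis=1)

def with_loc():
    # One .loc call with a list of (I, B0, B1) tuples.
    return lookup.loc[list(zip(d.I, d.B0, d.B1))].values

def with_join():
    # Join the columns of d against the index levels of the lookup.
    return d.join(lookup, on=['I', 'B0', 'B1'])['w']

def with_merge():
    # how='left' keeps the rows of d in their original order.
    return d.merge(lookup.reset_index(), on=['I', 'B0', 'B1'], how='left')['w']

for f in (with_apply, with_loc, with_join, with_merge):
    seconds = timeit.timeit(f, number=3) / 3
    print(f'{f.__name__}: {seconds * 1000:.1f} ms per call')
```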