Apply lookup table in DataFrame for bins or ranges
I have a DataFrame that looks like this. Let's assume these are sales volumes for a salesperson list.
Also, I have a lookup table that contains a dollar fee. It looks like this. So, $ 0-50,000 = 5%, $ 50,001- $ 250,000 = 4%, etc.
What I want to do is apply the lookup table to the sales table to create below DataFrame.
Attempt 1:
In [66]: a
Out[66]:
Sales_1 Sales_2 Sales_3
0 200000 300000 100000
1 100000 500000 500000
2 400000 1000000 200000
In [67]: b
Out[67]:
Commission
Sales
50000 0.05
250000 0.04
750000 0.03
9999999999 0.02
In [68]: c = b['Commission'][a <= b.index.values]
Traceback (most recent call last):
File "<ipython-input-68-d229bce29f01>", line 1, in <module>
c = b['Commission'][a <= b.index.values]
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py", line 1184, in f
res = self._combine_const(other, func, raise_on_error=False)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 3555, in _combine_const
raise_on_error=raise_on_error)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2911, in eval
return self.apply('eval', **kwargs)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2890, in apply
applied = getattr(b, f)(**kwargs)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1132, in eval
result = get_result(other)
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1103, in get_result
result = func(values, other)
ValueError: operands could not be broadcast together with shapes (3,3) (4,)
Attempt 2:
In [59]: a
Out[59]:
Sales_1 Sales_2 Sales_3
0 200000 300000 100000
1 100000 500000 500000
2 400000 1000000 200000
In [60]: b
Out[60]:
Commission
Sales
50000 0.05
250000 0.04
750000 0.03
9999999999 0.02
In [61]: c = b.lookup(a['Sales_1'],['Commission'])
Traceback (most recent call last):
File "<ipython-input-61-99e8134e826c>", line 1, in <module>
c = b.lookup(a['Sales_1'],['Commission'])
File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2649, in lookup
raise ValueError('Row labels must have same size as column labels')
ValueError: Row labels must have same size as column labels
Can anyone help me with applying lookup table in DataFrame? It doesn't have to be exactly like this, but it illustrates my general need.
source to share
To work with ranges pd.cut
is your friend. Based on your current frame b
, you only need to make changes to the list of bins you passed as an argument to determine the lowest range. I put 0 here, since there are no negative sales, but you can put any negative number if needed, even deal with -np.inf
and np.inf
instead 1E14
for your lower and upper bounds:
pd.cut(a.stack(), [0] + b.Sales.tolist(), labels=b.Commission).unstack()
Out[39]:
Sales_1 Sales_2 Sales_3
0 0.04 0.03 0.04
1 0.04 0.03 0.03
2 0.03 0.02 0.04
I found b
, as shown below, clearer to use with cutout:
Sales Commission
0 -inf NaN
1 50000 0.05
2 250000 0.04
3 750000 0.03
4 inf 0.02
The arguments then become:
pd.cut(a.stack(), b.Sales, labels=b.Commission[1:]).unstack()
source to share
@Boud has already hit this one out of the park. But here are my take
numpy
through searchsorted
pd.DataFrame(
b.Commission.values[
b.index.values.searchsorted(a.values.ravel())
].reshape(a.values.shape),
a.index, a.columns)
Sales_1 Sales_2 Sales_3
0 0.04 0.03 0.04
1 0.04 0.03 0.03
2 0.03 0.02 0.04
pandas
using
i also as well as shift definitions boundariespd.merge_asof
stack
a
a_ = a.stack().sort_values().to_frame('Sales')
b_ = pd.DataFrame(dict(
Sales=np.append(0, b.index[:-1]),
Commissions=b.Commission.values
))
print(a_)
print()
print(b_)
Sales
0 Sales_3 100000
1 Sales_1 100000
0 Sales_1 200000
2 Sales_3 200000
0 Sales_2 300000
2 Sales_1 400000
1 Sales_2 500000
Sales_3 500000
2 Sales_2 1000000
Commissions Sales
0 0.05 0
1 0.04 50000
2 0.03 250000
3 0.02 750000
We can now use pd.merge_asof
pd.merge_asof(a_, b_).set_index(a_.index).Commissions.unstack()
Sales_1 Sales_2 Sales_3
0 0.04 0.03 0.04
1 0.04 0.03 0.03
2 0.03 0.02 0.04
naive time test
source to share