Apply lookup table in DataFrame for bins or ranges

I have a DataFrame that looks like this. Let's assume these are sales volumes for a salesperson list.

   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

Also, I have a lookup table that maps sales ranges to a commission rate. It looks like this: $0-$50,000 = 5%, $50,001-$250,000 = 4%, etc.

            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

What I want to do is apply the lookup table to the sales table to produce the DataFrame below.

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04

Attempt 1:

In [66]: a
Out[66]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [67]: b
Out[67]: 
            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [68]: c = b['Commission'][a <= b.index.values]
Traceback (most recent call last):

  File "<ipython-input-68-d229bce29f01>", line 1, in <module>
    c = b['Commission'][a <= b.index.values]

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py", line 1184, in f
    res = self._combine_const(other, func, raise_on_error=False)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 3555, in _combine_const
    raise_on_error=raise_on_error)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2911, in eval
    return self.apply('eval', **kwargs)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2890, in apply
    applied = getattr(b, f)(**kwargs)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1132, in eval
    result = get_result(other)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1103, in get_result
    result = func(values, other)

ValueError: operands could not be broadcast together with shapes (3,3) (4,) 


Attempt 2:

In [59]: a
Out[59]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [60]: b
Out[60]: 
            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [61]: c = b.lookup(a['Sales_1'],['Commission'])
Traceback (most recent call last):

  File "<ipython-input-61-99e8134e826c>", line 1, in <module>
    c = b.lookup(a['Sales_1'],['Commission'])

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2649, in lookup
    raise ValueError('Row labels must have same size as column labels')

ValueError: Row labels must have same size as column labels


Can anyone help me apply a lookup table to a DataFrame like this? It doesn't have to work exactly this way, but this illustrates my general need.



2 answers


To work with ranges, pd.cut is your friend. Starting from your current frame b, you only need to prepend one value to the list of bins passed as an argument, to define the lowest range. I put 0 here, since there are no negative sales, but you can use any negative number if needed, or even use -np.inf and np.inf as the lower and upper bounds instead of a large sentinel like 9999999999:

pd.cut(a.stack(), [0] + b.Sales.tolist(), labels=b.Commission).unstack()
Out[39]: 
  Sales_1 Sales_2 Sales_3
0    0.04    0.03    0.04
1    0.04    0.03    0.03
2    0.03    0.02    0.04
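For a self-contained version of this step, rebuilding the example frames a and b from the question (assumed data), the stack/cut/unstack sequence looks like this:

```python
import pandas as pd

# Example frames from the question.
a = pd.DataFrame({
    'Sales_1': [200000, 100000, 400000],
    'Sales_2': [300000, 500000, 1000000],
    'Sales_3': [100000, 500000, 200000],
})
b = pd.DataFrame(
    {'Commission': [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name='Sales'),
)

# stack() flattens a into one Series so every sale is binned in a single
# pd.cut call; labels=b.Commission tags each bin with its rate, and
# unstack() restores the original row/column layout.
rates = pd.cut(a.stack(), [0] + b.index.tolist(),
               labels=b.Commission).unstack().astype(float)
print(rates)
```

The resulting rate table can then be multiplied element-wise with the sales table (rates * a) to get dollar commissions.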


I found a version of b, as shown below, clearer to use with cut:



          Sales  Commission
0          -inf         NaN
1         50000        0.05
2        250000        0.04
3        750000        0.03
4           inf        0.02


The arguments then become:

pd.cut(a.stack(), b.Sales, labels=b.Commission[1:]).unstack()




@Boud has already hit this one out of the park, but here is my take.

numpy, via searchsorted:

pd.DataFrame(
    b.Commission.values[
        b.index.values.searchsorted(a.values.ravel())
    ].reshape(a.values.shape),
    a.index, a.columns)

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04
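As a self-contained sketch of the same idea (rebuilding a and b from the question): searchsorted returns, for each flattened sale, the position of the first upper bound that is >= that sale, and that position indexes straight into the commission array. The default side='left' keeps a sale landing exactly on a boundary (e.g. 50000) in the lower bracket, matching pd.cut's right-closed bins.

```python
import pandas as pd

a = pd.DataFrame({
    'Sales_1': [200000, 100000, 400000],
    'Sales_2': [300000, 500000, 1000000],
    'Sales_3': [100000, 500000, 200000],
})
b = pd.DataFrame(
    {'Commission': [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name='Sales'),
)

# For each flattened sale, find the first upper bound >= the sale,
# pick the matching rate, and reshape back to a's layout.
pos = b.index.values.searchsorted(a.values.ravel())
rates = pd.DataFrame(b.Commission.values[pos].reshape(a.shape),
                     index=a.index, columns=a.columns)
print(rates)
```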


pandas, via pd.merge_asof, after shifting the bin boundaries to lower bounds and stacking a:

a_ = a.stack().sort_values().to_frame('Sales')
b_ = pd.DataFrame(dict(
        Sales=np.append(0, b.index[:-1]),
        Commissions=b.Commission.values
    ))

print(a_)
print()
print(b_)

             Sales
0 Sales_3   100000
1 Sales_1   100000
0 Sales_1   200000
2 Sales_3   200000
0 Sales_2   300000
2 Sales_1   400000
1 Sales_2   500000
  Sales_3   500000
2 Sales_2  1000000

   Commissions   Sales
0         0.05       0
1         0.04   50000
2         0.03  250000
3         0.02  750000


We can now use pd.merge_asof

pd.merge_asof(a_, b_, on='Sales').set_index(a_.index).Commissions.unstack()

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04
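A self-contained sketch of this merge_asof route, rebuilding a and b from the question. One judgment call labeled here as an assumption: allow_exact_matches=False keeps a sale that lands exactly on a boundary in the lower bracket, matching pd.cut's right-closed bins (the example data contains no such values, so it does not change this output):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'Sales_1': [200000, 100000, 400000],
    'Sales_2': [300000, 500000, 1000000],
    'Sales_3': [100000, 500000, 200000],
})
b = pd.DataFrame(
    {'Commission': [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name='Sales'),
)

# Sort the stacked sales (merge_asof requires sorted keys) and shift the
# lookup table's upper bounds down one slot so each row carries the lower
# bound of its bracket.
a_ = a.stack().sort_values().to_frame('Sales')
b_ = pd.DataFrame({
    'Sales': np.append(0, b.index[:-1]),
    'Commission': b.Commission.values,
})

# Backward as-of merge: each sale picks the last bracket whose lower
# bound lies below it, then the original (row, column) layout is
# restored with the saved MultiIndex and unstack().
merged = pd.merge_asof(a_, b_, on='Sales', allow_exact_matches=False)
rates = merged.set_index(a_.index)['Commission'].unstack()
print(rates)
```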



Naive time test:

[timing plot comparing the approaches; image not available]
