Apply lookup table in DataFrame for bins or ranges

I have a DataFrame that looks like this. Let's assume these are sales volumes for a salesperson list.

   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

Also, I have a lookup table that maps sales ranges to a commission rate. It looks like this: $0-$50,000 = 5%, $50,001-$250,000 = 4%, etc.

            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

What I want to do is apply the lookup table to the sales table to produce the DataFrame below.

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04

Attempt 1:

In [66]: a
Out[66]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [67]: b
Out[67]: 
            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [68]: c = b['Commission'][a <= b.index.values]
Traceback (most recent call last):

  File "<ipython-input-68-d229bce29f01>", line 1, in <module>
    c = b['Commission'][a <= b.index.values]

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py", line 1184, in f
    res = self._combine_const(other, func, raise_on_error=False)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 3555, in _combine_const
    raise_on_error=raise_on_error)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2911, in eval
    return self.apply('eval', **kwargs)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2890, in apply
    applied = getattr(b, f)(**kwargs)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1132, in eval
    result = get_result(other)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1103, in get_result
    result = func(values, other)

ValueError: operands could not be broadcast together with shapes (3,3) (4,) 


Attempt 2:

In [59]: a
Out[59]: 
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [60]: b
Out[60]: 
            Commission
Sales                 
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [61]: c = b.lookup(a['Sales_1'],['Commission'])
Traceback (most recent call last):

  File "<ipython-input-61-99e8134e826c>", line 1, in <module>
    c = b.lookup(a['Sales_1'],['Commission'])

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2649, in lookup
    raise ValueError('Row labels must have same size as column labels')

ValueError: Row labels must have same size as column labels


Can anyone help me apply a lookup table to a DataFrame like this? It doesn't have to work exactly this way, but this illustrates my general need.



2 answers


To work with ranges, pd.cut is your friend. Starting from your current frame b, you only need to prepend one value to the list of bins passed as an argument, to define the lowest range. I put 0 here, since there are no negative sales, but you can use any negative number if needed, or even use -np.inf and np.inf as the lower and upper bounds instead of a large sentinel like 9999999999:

pd.cut(a.stack(), [0] + b.Sales.tolist(), labels=b.Commission).unstack()
Out[39]: 
  Sales_1 Sales_2 Sales_3
0    0.04    0.03    0.04
1    0.04    0.03    0.03
2    0.03    0.02    0.04
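For a self-contained version of this step, rebuilding the example frames a and b from the question (assumed data), the stack/cut/unstack sequence looks like this:

```python
import pandas as pd

# Example frames from the question.
a = pd.DataFrame({
    'Sales_1': [200000, 100000, 400000],
    'Sales_2': [300000, 500000, 1000000],
    'Sales_3': [100000, 500000, 200000],
})
b = pd.DataFrame(
    {'Commission': [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name='Sales'),
)

# stack() flattens a into one Series so every sale is binned in a single
# pd.cut call; labels=b.Commission tags each bin with its rate, and
# unstack() restores the original row/column layout.
rates = pd.cut(a.stack(), [0] + b.index.tolist(),
               labels=b.Commission).unstack().astype(float)
print(rates)
```

The resulting rate table can then be multiplied element-wise with the sales table (rates * a) to get dollar commissions.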


I found a version of b, as shown below, clearer to use with cut:



          Sales  Commission
0          -inf         NaN
1         50000        0.05
2        250000        0.04
3        750000        0.03
4           inf        0.02


The arguments then become:

pd.cut(a.stack(), b.Sales, labels=b.Commission[1:]).unstack()




@Boud has already hit this one out of the park, but here is my take.

numpy, via searchsorted:

pd.DataFrame(
    b.Commission.values[
        b.index.values.searchsorted(a.values.ravel())
    ].reshape(a.values.shape),
    a.index, a.columns)

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04
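As a self-contained sketch of the same idea (rebuilding a and b from the question): searchsorted returns, for each flattened sale, the position of the first upper bound that is >= that sale, and that position indexes straight into the commission array. The default side='left' keeps a sale landing exactly on a boundary (e.g. 50000) in the lower bracket, matching pd.cut's right-closed bins.

```python
import pandas as pd

a = pd.DataFrame({
    'Sales_1': [200000, 100000, 400000],
    'Sales_2': [300000, 500000, 1000000],
    'Sales_3': [100000, 500000, 200000],
})
b = pd.DataFrame(
    {'Commission': [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name='Sales'),
)

# For each flattened sale, find the first upper bound >= the sale,
# pick the matching rate, and reshape back to a's layout.
pos = b.index.values.searchsorted(a.values.ravel())
rates = pd.DataFrame(b.Commission.values[pos].reshape(a.shape),
                     index=a.index, columns=a.columns)
print(rates)
```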


pandas, via pd.merge_asof, after shifting the bin boundaries to lower bounds and stacking a:

a_ = a.stack().sort_values().to_frame('Sales')
b_ = pd.DataFrame(dict(
        Sales=np.append(0, b.index[:-1]),
        Commissions=b.Commission.values
    ))

print(a_)
print()
print(b_)

             Sales
0 Sales_3   100000
1 Sales_1   100000
0 Sales_1   200000
2 Sales_3   200000
0 Sales_2   300000
2 Sales_1   400000
1 Sales_2   500000
  Sales_3   500000
2 Sales_2  1000000

   Commissions   Sales
0         0.05       0
1         0.04   50000
2         0.03  250000
3         0.02  750000


We can now use pd.merge_asof

pd.merge_asof(a_, b_, on='Sales').set_index(a_.index).Commissions.unstack()

   Sales_1  Sales_2  Sales_3
0     0.04     0.03     0.04
1     0.04     0.03     0.03
2     0.03     0.02     0.04
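A self-contained sketch of this merge_asof route, rebuilding a and b from the question. One judgment call labeled here as an assumption: allow_exact_matches=False keeps a sale that lands exactly on a boundary in the lower bracket, matching pd.cut's right-closed bins (the example data contains no such values, so it does not change this output):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'Sales_1': [200000, 100000, 400000],
    'Sales_2': [300000, 500000, 1000000],
    'Sales_3': [100000, 500000, 200000],
})
b = pd.DataFrame(
    {'Commission': [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name='Sales'),
)

# Sort the stacked sales (merge_asof requires sorted keys) and shift the
# lookup table's upper bounds down one slot so each row carries the lower
# bound of its bracket.
a_ = a.stack().sort_values().to_frame('Sales')
b_ = pd.DataFrame({
    'Sales': np.append(0, b.index[:-1]),
    'Commission': b.Commission.values,
})

# Backward as-of merge: each sale picks the last bracket whose lower
# bound lies below it, then the original (row, column) layout is
# restored with the saved MultiIndex and unstack().
merged = pd.merge_asof(a_, b_, on='Sales', allow_exact_matches=False)
rates = merged.set_index(a_.index)['Commission'].unstack()
print(rates)
```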



Naive time test:

[timing plot comparing the approaches; image not available]
