Pandas - Create a new column with apply() on a DataFrame with a float index

I am using pandas 0.13.0 and I am trying to create a new column using apply() and a function named foo().

My dataframe looks like this:

df = pandas.DataFrame({
         'a':[ 0.0,  0.1,  0.2,  0.3], 
         'b':[10.0, 20.0, 30.0, 40.0], 
         'c':[ 1.0,  2.0,  3.0,  4.0]
     })

df.set_index(df['a'], inplace=True)

      

So my dataframe is:

in: print df

out:
           a    b     c
      a
      0.0  0.0  10.0  1.0
      0.1  0.1  20.0  2.0
      0.2  0.2  30.0  3.0
      0.3  0.3  40.0  4.0 

      

My function is like this:

def foo(arg1, arg2):
    return arg1*arg2

      

Now I want to create a column named 'd' using foo():

df['d'] = df.apply(foo(df['b'], df['c']), axis=1)

      

But I am getting the following error:

TypeError: ("'Series' object is not callable", u'occurred at index 0.0')

      

How can I use DataFrame.apply() with foo() when the index is made of floats?

Thanks.



1 answer


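The problem here is that apply expects a function that it can call on each row, but you are calling foo yourself and passing whole Series as its arguments, so apply receives a Series instead of a function. The float index is not the cause. You can do it like this: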

In [7]:

df['d'] = df.apply(lambda row: foo(row['b'], row['c']), axis=1)
df
Out[7]:
       a   b  c    d
a                   
0.0  0.0  10  1   10
0.1  0.1  20  2   40
0.2  0.2  30  3   90
0.3  0.3  40  4  160
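
To see why the original call fails, it may help to look at what apply actually receives. The following is a minimal sketch, assuming a recent pandas on Python 3; the names `eager` and `per_row` are only illustrative, and the index is set with `set_index('a', drop=False)`, which should give the same frame as in the question.

```python
import pandas as pd

def foo(arg1, arg2):
    return arg1 * arg2

df = pd.DataFrame({
    'a': [0.0, 0.1, 0.2, 0.3],
    'b': [10.0, 20.0, 30.0, 40.0],
    'c': [1.0, 2.0, 3.0, 4.0],
})
# Assumption: drop=False keeps 'a' as a column as well as the index.
df = df.set_index('a', drop=False)

# foo runs immediately here, so the result is already the finished column.
eager = foo(df['b'], df['c'])
print(type(eager).__name__)   # Series
print(callable(eager))        # False: apply needs a function, not a Series

# Wrapping the call in a lambda defers it until apply passes in each row.
per_row = df.apply(lambda row: foo(row['b'], row['c']), axis=1)
print(list(per_row) == list(eager))
```

Passing `eager` to apply is what produces the "'Series' object is not callable" message, regardless of the index type.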

      

Better still, just call your function directly on the columns:

In [8]:

df['d'] = foo(df['b'], df['c'])
df
Out[8]:
       a   b  c    d
a                   
0.0  0.0  10  1   10
0.1  0.1  20  2   40
0.2  0.2  30  3   90
0.3  0.3  40  4  160

      

The advantage of the above method is that it is vectorized: it performs the operation across the entire Series at once rather than one row at a time.



In [15]:

%timeit df['d'] = df.apply(lambda row: foo(row['b'], row['c']), axis=1)
%timeit df['d'] = foo(df['b'], df['c'])
1000 loops, best of 3: 270 µs per loop
1000 loops, best of 3: 214 µs per loop

      

Not much difference here. Now compare on a df with 400,000 rows:

In [18]:

%timeit df['d'] = df.apply(lambda row: foo(row['b'], row['c']), axis=1)
%timeit df['d'] = foo(df['b'], df['c'])
1 loops, best of 3: 5.84 s per loop
100 loops, best of 3: 8.68 ms per loop

      

So the vectorized call is ~672x faster here (5.84 s versus 8.68 ms).
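
If you want to reproduce the comparison outside IPython, something like the sketch below should work. It is not the benchmark used above: the frame `big`, its random contents, and the row count of 20,000 are assumptions chosen so the slow path finishes quickly, and the absolute timings will depend on your machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

def foo(arg1, arg2):
    return arg1 * arg2

# Hypothetical test frame; raise n to widen the gap between the two paths.
n = 20_000
rng = np.random.default_rng(0)
big = pd.DataFrame({'b': rng.random(n), 'c': rng.random(n)})

# Row-wise: foo is called once per row through apply.
start = time.perf_counter()
row_wise = big.apply(lambda row: foo(row['b'], row['c']), axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: foo is called once on the two whole columns.
start = time.perf_counter()
vectorized = foo(big['b'], big['c'])
vector_seconds = time.perf_counter() - start

# Both paths should agree before the timings mean anything.
assert np.allclose(row_wise, vectorized)
print(f"apply: {apply_seconds:.4f} s, vectorized: {vector_seconds:.6f} s")
```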
