Pandas - Create a new column with apply() for a float-indexed DataFrame

Question

I am using pandas 0.13.0 and I am trying to create a new column using apply() and a function named foo().

My dataframe looks like this:

df = pandas.DataFrame({
         'a':[ 0.0,  0.1,  0.2,  0.3], 
         'b':[10.0, 20.0, 30.0, 40.0], 
         'c':[ 1.0,  2.0,  3.0,  4.0]
     })

df.set_index(df['a'], inplace=True)

So my dataframe is:

in: print df

out:
           a    b     c
      a
      0.0  0.0  10.0  1.0
      0.1  0.1  20.0  2.0
      0.2  0.2  30.0  3.0
      0.3  0.3  40.0  4.0

My function is like this:

def foo(arg1, arg2):
    return arg1*arg2

Now I want to create a column named 'd' using foo():

df['d'] = df.apply(foo(df['b'], df['c']), axis=1)

But I am getting the following error:

TypeError: ("'Series' object is not callable", u'occurred at index 0.0')

How can I use apply() with foo() on a DataFrame whose index is made of floats?

Thanks

Julien 08 Sep '14 at 7:36

1 answer

EdChum · Accepted Answer · 2014-09-08T07:40:25+0000

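The problem here is that you are trying to process the frame row-wise, but you are passing entire Series as arguments, which is wrong. foo(df['b'], df['c']) is evaluated first, and its result (a Series) is handed to apply where a function is expected; that is why the error says the Series is not callable. You can do it like this instead: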

In [7]:

df['d'] = df.apply(lambda row: foo(row['b'], row['c']), axis=1)
df
Out[7]:
       a   b  c    d
a                   
0.0  0.0  10  1   10
0.1  0.1  20  2   40
0.2  0.2  30  3   90
0.3  0.3  40  4  160
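
To make the cause of the TypeError concrete, here is a minimal sketch, not the only way to write it. It assumes a recent pandas imported as pd, rebuilds the frame from the question with set_index('a', drop=False) (which should give the same layout as the question's set_index call), and uses the illustrative names eager and row_func, which are not from the question:

```python
import pandas as pd

def foo(arg1, arg2):
    return arg1 * arg2

# Rebuild the frame from the question, keeping 'a' as both column and index.
df = pd.DataFrame({
    'a': [0.0, 0.1, 0.2, 0.3],
    'b': [10.0, 20.0, 30.0, 40.0],
    'c': [1.0, 2.0, 3.0, 4.0],
})
df = df.set_index('a', drop=False)

# foo(...) runs immediately, so apply would receive its result: a Series.
eager = foo(df['b'], df['c'])
print(type(eager).__name__, callable(eager))  # Series False

# A lambda is a function, so apply can call it once per row.
row_func = lambda row: foo(row['b'], row['c'])
df['d'] = df.apply(row_func, axis=1)
print(df['d'].tolist())  # [10.0, 40.0, 90.0, 160.0]
```

The float index plays no part in the failure; the "index 0.0" in the message is just the label of the first row that apply tried to process.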

You would be better off just calling your function directly:

In [8]:

df['d'] = foo(df['b'], df['c'])
df
Out[8]:
       a   b  c    d
a                   
0.0  0.0  10  1   10
0.1  0.1  20  2   40
0.2  0.2  30  3   90
0.3  0.3  40  4  160

The advantage of the above method is that it is vectorized: it performs the operation across the entire Series at once rather than one row at a time.

In [15]:

%timeit df['d'] = df.apply(lambda row: foo(row['b'], row['c']), axis=1)
%timeit df['d'] = foo(df['b'], df['c'])
1000 loops, best of 3: 270 µs per loop
1000 loops, best of 3: 214 µs per loop

Not much difference here. Now compare with a df of 400,000 rows:

In [18]:

%timeit df['d'] = df.apply(lambda row: foo(row['b'], row['c']), axis=1)
%timeit df['d'] = foo(df['b'], df['c'])
1 loops, best of 3: 5.84 s per loop
100 loops, best of 3: 8.68 ms per loop

So the vectorized call is ~672x faster here.
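
If you want to reproduce the comparison outside IPython, something like the following sketch should work. The frame size n, the random data, and the names big, row_wise and vectorized are my own assumptions for illustration (the size is smaller than the 400,000 rows above to keep it quick), and the timings will differ from machine to machine:

```python
import timeit

import numpy as np
import pandas as pd

def foo(arg1, arg2):
    return arg1 * arg2

# Illustrative size and data; not the frame used for the timings above.
n = 20_000
rng = np.random.default_rng(0)
big = pd.DataFrame({'b': rng.random(n), 'c': rng.random(n)})

# Both approaches should produce the same values.
row_wise = big.apply(lambda row: foo(row['b'], row['c']), axis=1)
vectorized = foo(big['b'], big['c'])
print(np.allclose(row_wise, vectorized))

# Time a single run of each approach with the standard timeit module.
t_apply = timeit.timeit(
    lambda: big.apply(lambda row: foo(row['b'], row['c']), axis=1), number=1)
t_vec = timeit.timeit(lambda: foo(big['b'], big['c']), number=1)
print("apply: %.4f s, vectorized: %.4f s" % (t_apply, t_vec))
```

The gap generally widens as the frame grows, because apply with axis=1 calls the Python function once per row while the direct call does a single array multiplication.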
