Parameter in custom function when using pandas.Series.apply

Question

Here is a simple pandas DataFrame:

import pandas as pd

df = pd.DataFrame( {
    'word':     ['flower', 'mountain', 'ocean', 'universe'],
    'k':        [1, 2, 3, 4]
} )

>>> df
   k      word
0  1    flower
1  2  mountain
2  3     ocean
3  4  universe

I want to change df to this (replace each word with its first k letters):

>>> df
   k  word
0  1     f
1  2    mo
2  3   oce
3  4  univ

My idea is to do this with pandas.Series.apply and a custom function:

def get_first_k_letters( x, k ):
    return x[:k]

df['word'] = df['word'].apply( get_first_k_letters, args=(3,) )

>>> df
   k word
0  1  flo
1  2  mou
2  3  oce
3  4  uni

I can easily replace each word with its first three letters by setting args=(3,).

But I want to replace each word with its first k letters, where k differs from row to row, and I don't know what to pass as args in that case.

Can anyone help me? Thank you! (Other methods without using pandas.Series.apply would be fine too!)

+3

python pandas

o0Helloworld0o May 26 '17 at 8:21

2 answers

You can do:

df.apply(lambda x: get_first_k_letters(x['word'], x['k']), axis=1)

When apply is called with axis=1, each row is passed to the lambda as x (axis=0 would pass columns instead of rows). Passing x['word'] and x['k'] on to your function gives the correct result:

0       f
1      mo
2     oce
3    univ
dtype: object
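To store the result back in the DataFrame, the returned Series can be assigned to the 'word' column. Here is a minimal self-contained sketch that reuses get_first_k_letters from the question; the parameter name row is just an illustrative choice:

```python
import pandas as pd

def get_first_k_letters(x, k):
    # Return the first k characters of the string x.
    return x[:k]

df = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4],
})

# axis=1 hands each row to the lambda as a Series, so both columns
# can be looked up by name inside it.
df['word'] = df.apply(
    lambda row: get_first_k_letters(row['word'], row['k']), axis=1
)
print(df)
```

Because the columns are looked up by name, this does not depend on the order in which they appear in df.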

+2

Mathias711 May 26 '17 at 8:24

MaxU · Accepted Answer · 2017-05-26T08:48:25+0000

I would consider this approach:

In [121]: df['word'] = [w[1][:w[0]] for w in df.values]

In [122]: df
Out[122]:
   k  word
0  1     f
1  2    mo
2  3   oce
3  4  univ
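Note that the comprehension above indexes each row by position, so it assumes 'k' is the first column and 'word' the second, as in the printed df. Depending on the pandas version, a DataFrame built from a dict may instead keep the order in which the keys were written ('word', then 'k'). A variant that pairs the columns by name avoids that assumption. This is a sketch of the same idea, not the exact code that was timed in this answer:

```python
import pandas as pd

df = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4],
})

# zip pairs each word with the k from the same row, looked up by
# column name rather than by position in df.values.
df['word'] = [word[:k] for word, k in zip(df['word'], df['k'])]
print(df)
```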

Timing for a 40,000-row DataFrame:

In [123]: df = pd.concat([df] * 10**4, ignore_index=True)

In [124]: df.shape
Out[124]: (40000, 2)

In [125]: %timeit df.apply(lambda x: get_first_k_letters(x['word'], x['k']), axis=1)
1 loop, best of 3: 4.04 s per loop

In [126]: %timeit [w[1][:w[0]] for w in df.values]
10 loops, best of 3: 52.5 ms per loop

In [127]: 4.04 * 1000 / 52.5
Out[127]: 76.95238095238095
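The %timeit magic only works in IPython. To repeat the comparison in a plain script, something like the following sketch with the standard timeit module could be used. The helper names with_apply and with_comprehension are made up for illustration, and the comprehension pairs the columns with zip instead of indexing df.values. Absolute numbers will vary with machine and pandas version, so none are shown here.

```python
import timeit

import pandas as pd

def get_first_k_letters(x, k):
    # Return the first k characters of the string x.
    return x[:k]

base = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4],
})
# Repeat the four rows 10,000 times to get a 40,000-row frame.
big = pd.concat([base] * 10**4, ignore_index=True)

def with_apply():
    # Row-wise apply: one Python-level call per row on a Series object.
    return big.apply(
        lambda x: get_first_k_letters(x['word'], x['k']), axis=1
    )

def with_comprehension():
    # Plain list comprehension over the two columns.
    return [w[:k] for w, k in zip(big['word'], big['k'])]

t_apply = timeit.timeit(with_apply, number=1)
t_comp = timeit.timeit(with_comprehension, number=10) / 10

print('apply:         %.4f s per run' % t_apply)
print('comprehension: %.4f s per run' % t_comp)
print('ratio:         %.1f' % (t_apply / t_comp))
```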
