Parameter in custom function when using pandas.Series.apply
Here is a simple pandas DataFrame:
import pandas as pd
df = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4]
})
>>> df
   k      word
0  1    flower
1  2  mountain
2  3     ocean
3  4  universe
I want to change df to this (replacing each word with its first k letters):
>>> df
   k  word
0  1     f
1  2    mo
2  3   oce
3  4  univ
My idea is to do this with pandas.Series.apply and a custom function:
def get_first_k_letters(x, k):
    return x[:k]
df['word'] = df['word'].apply(get_first_k_letters, args=(3,))
>>> df
   k word
0  1  flo
1  2  mou
2  3  oce
3  4  uni
I can easily replace each word with its first three letters by setting args=(3,).
But I want to replace each word with its first k letters, where k differs from row to row, and I don't know what to pass as args in that case.
Can anyone help me? Thank you! (Other methods that don't use pandas.Series.apply would be fine too!)
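I would consider this approach (it indexes df.values by position, so it assumes the column order k, word shown in the output above):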
In [121]: df['word'] = [w[1][:w[0]] for w in df.values]
In [122]: df
Out[122]:
   k  word
0  1     f
1  2    mo
2  3   oce
3  4  univ
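If the column order is not guaranteed to be k, word, a variant of the same list comprehension can pair the columns by name instead of by position. This is only a sketch of that idea, not the code that was timed in this answer; it rebuilds the DataFrame from the question so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4],
})

# zip pairs each word with the k from the same row; selecting the
# columns by name means the column order of df does not matter.
df['word'] = [w[:k] for w, k in zip(df['word'], df['k'])]

print(df['word'].tolist())  # ['f', 'mo', 'oce', 'univ']
```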
Timing for a 40,000-row DataFrame:
In [123]: df = pd.concat([df] * 10**4, ignore_index=True)
In [124]: df.shape
Out[124]: (40000, 2)
In [125]: %timeit df.apply(lambda x: get_first_k_letters(x['word'], x['k']), axis=1)
1 loop, best of 3: 4.04 s per loop
In [126]: %timeit [w[1][:w[0]] for w in df.values]
10 loops, best of 3: 52.5 ms per loop
In [127]: 4.04 * 1000 / 52.5
Out[127]: 76.95238095238095
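%timeit is only available in IPython. To repeat a comparison like this from a plain Python script, something along these lines should work with the standard timeit module. The helper names with_apply and with_comprehension are made up for this sketch, the frame is smaller than the 40,000 rows used above so that it finishes quickly, and the numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd


def get_first_k_letters(x, k):
    return x[:k]


small = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4],
})
# 4,000 rows (assumption: kept small so the sketch runs fast)
big = pd.concat([small] * 1000, ignore_index=True)


def with_apply():
    # Row-wise apply: builds a Series object for every row
    return big.apply(
        lambda row: get_first_k_letters(row['word'], row['k']), axis=1
    ).tolist()


def with_comprehension():
    # Plain Python loop over the two columns
    return [w[:k] for w, k in zip(big['word'], big['k'])]


t_apply = timeit.timeit(with_apply, number=3)
t_comp = timeit.timeit(with_comprehension, number=3)
print(f"apply: {t_apply:.4f} s, comprehension: {t_comp:.4f} s")
```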
You can do:
df.apply(lambda x: get_first_k_letters(x['word'], x['k']), axis=1)
When apply is called with axis=1, each row is passed to the lambda as x (axis=0 would pass columns instead of rows). Passing x['word'] and x['k'] to your function gives the correct result:
0       f
1      mo
2     oce
3    univ
dtype: object
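Note that the call above only returns a new Series; it does not modify df. To get the DataFrame the question asks for, the result can be assigned back to the word column. A self-contained sketch, reusing the function and data from the question:

```python
import pandas as pd


def get_first_k_letters(x, k):
    return x[:k]


df = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4],
})

# axis=1 passes each row to the lambda; the resulting Series shares
# df's index, so it can be assigned straight back to the column.
df['word'] = df.apply(
    lambda row: get_first_k_letters(row['word'], row['k']), axis=1
)

print(df['word'].tolist())  # ['f', 'mo', 'oce', 'univ']
```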