Python: efficient separator column in pandas DF

Question

Suppose I have a DF containing a column of the form

0     A.1
1     A.2
2     B.3
3     4.C

And suppose I want to split this column, keeping only the element after the '.'. The naive way to do it would be

for i in range(len(tbl)):
  tbl['column_name'].iloc[i] = tbl['column_name'].iloc[i].split('.',1)[1]

It works, but it is very slow for large tables. Does anyone have an idea how to speed this up? I can put the result in new columns of the DF, so I am not limited to overwriting the original column (as I do in the example). Thanks!

performance python split pandas

user3861925 04 june 15 at 7:33

2 answers

Ami tavory · Answer 1 · 2015-06-04T07:58:15+0000

pandas has string methods that do things like this efficiently, without explicit loops (which kill performance). In this case, you can use .str.split:

>> import pandas as pd
>> df = pd.DataFrame({'a': ['A.1', 'A.2', 'B.3', 'C.4']})
>> df
    a
0   A.1
1   A.2
2   B.3
3   C.4
>> df.a.str.split('.').apply(pd.Series)
    0   1
0   A   1
1   A   2
2   B   3
3   C   4
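
The question asks only for the element after the '.', so the split result can be narrowed further. Below is a minimal sketch, assuming the same example frame as above; the column name after_dot and the variable parts are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A.1', 'A.2', 'B.3', 'C.4']})

# Split at most once on the literal '.', then take the second piece of each row.
df['after_dot'] = df['a'].str.split('.', n=1).str[1]

# Alternatively, expand=True returns the pieces as columns of a new DataFrame.
parts = df['a'].str.split('.', n=1, expand=True)

print(df)
print(parts)
```

Using n=1 mirrors the split('.', 1) call in the question, so any further dots stay in the second piece.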

maxymoo · Answer 2 · 2015-06-04T07:53:22+0000

For a large data frame, it should be faster to use map instead of a for loop:

%timeit df['newcol']  = df.column_name.map(lambda x: x.split('.')[1])
100 loops, best of 3: 10.7 ms per loop

%timeit for i in range(len(df)): df['newcol'].iloc[i] = df['column_name'].iloc[i].split('.',1)[1]
1 loops, best of 3: 7.63 s per loop
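
Outside IPython, where %timeit is not available, a similar comparison can be run with the standard timeit module. This is only a sketch: the 10,000-row test frame and the function names with_map and with_str_accessor are assumptions for illustration, and the timings will depend on the machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: 10,000 rows of the form "A.<number>".
df = pd.DataFrame({'column_name': ['A.%d' % i for i in range(10000)]})


def with_map():
    # Plain Python split applied element by element through Series.map.
    return df['column_name'].map(lambda x: x.split('.', 1)[1])


def with_str_accessor():
    # The pandas string accessor, as in the other answer.
    return df['column_name'].str.split('.', n=1).str[1]


# Run each variant 10 times and report the total elapsed time.
map_seconds = timeit.timeit(with_map, number=10)
str_seconds = timeit.timeit(with_str_accessor, number=10)
print('map: %.4f s, .str: %.4f s' % (map_seconds, str_seconds))
```

Both functions return the same values, so the choice between them can be made on the measured timings for your own data.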
