Pandas: manipulating one column into a new column
How can I do complex pandas column manipulation to create a new column? For example:
import pandas as pd
import ast
d = {'col1': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'col2': pd.Series(['[9, 10]', '[10, 11]', '[11, 12]', '[12,13]'],
                       index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
So the last column is actually a string, but I want to convert it to a list.
I tried:
df['new'] = ast.literal_eval(df['col2'])
which throws an error.
I've tried many other things and couldn't get anything to work.
I suppose there is another way to answer this question:
In a previous script, I created my df with lists as column elements and then stored it as CSV. When I open the CSV file, the lists are interpreted as strings. So another solution would be to store the original DataFrame in a way that keeps the lists.
json.loads works because your lists are valid JSON. You can use the json module already imported in pandas:
df.assign(new=df.col2.apply(pd.io.json.loads))
   col1      col2       new
a     1   [9, 10]   [9, 10]
b     2  [10, 11]  [10, 11]
c     3  [11, 12]  [11, 12]
d     4   [12,13]  [12, 13]
print(type(df.assign(new=df.col2.apply(pd.io.json.loads)).iloc[0, -1]))
<class 'list'>
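The attempt in the question fails because ast.literal_eval expects a single string, not a whole Series, so the parser has to be applied element by element. Below is a minimal sketch using the standard-library json module, which should give the same result here and does not rely on pd.io.json.loads being exposed under that name in the installed pandas version (an assumption worth checking against your install):

```python
import ast
import json

import pandas as pd

df = pd.DataFrame(
    {
        "col1": [1, 2, 3, 4],
        "col2": ["[9, 10]", "[10, 11]", "[11, 12]", "[12,13]"],
    },
    index=["a", "b", "c", "d"],
)

# Series.apply calls the parser once per element, so each string
# is converted to its own list.
with_json = df.assign(new=df["col2"].apply(json.loads))

# ast.literal_eval also works element-wise, and additionally accepts
# Python-only literals (tuples, single-quoted strings) that are not valid JSON.
with_ast = df.assign(new=df["col2"].apply(ast.literal_eval))

print(type(with_json.iloc[0, -1]))
```

Both variants leave df itself unchanged, since assign returns a new DataFrame.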
For some reason, parsing JSON seems to be faster than literal_eval:
%timeit df.assign(new=df.col2.apply(pd.io.json.loads))
%timeit df.assign(new=df.col2.apply(ast.literal_eval))
%timeit df.assign(new=[ast.literal_eval(x) for x in df['col2']])
small data
1000 loops, best of 3: 410 µs per loop
1000 loops, best of 3: 468 µs per loop
1000 loops, best of 3: 397 µs per loop
big data
df = pd.concat([df] * 10000, ignore_index=True)
100 loops, best of 3: 17.9 ms per loop
1 loop, best of 3: 333 ms per loop
1 loop, best of 3: 331 ms per loop
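The question also asks about the CSV round trip. One option, sketched here under the assumption that the lists contain only numbers (so their string form is valid JSON), is to parse the column while reading via the converters argument of read_csv. The in-memory buffer stands in for a real file, and the names original and restored are illustrative only:

```python
import io
import json

import pandas as pd

original = pd.DataFrame(
    {"col1": [1, 2], "col2": [[9, 10], [10, 11]]},
    index=["a", "b"],
)

# Writing to CSV stores each list as its string form, e.g. "[9, 10]".
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# converters applies json.loads to every raw cell of col2 while reading,
# so the column comes back as lists instead of strings.
restored = pd.read_csv(buffer, index_col=0, converters={"col2": json.loads})

print(type(restored["col2"].iloc[0]))
```

If the lists hold strings, their saved form uses single quotes, which is not valid JSON; passing ast.literal_eval as the converter instead would handle that case.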