Split list of tuples in dataframe column into dataframe columns
I have a dataframe that contains a list of tuples in one of its columns. I need to split the tuples into appropriate columns. My dataframe df looks like this: -
A B
[('Apple',50),('Orange',30),('banana',10)] Winter
[('Orange',69),('WaterMelon',50)] Summer
The expected output should be:
Fruit rate B
Apple 50 winter
Orange 30 winter
banana 10 winter
Orange 69 summer
WaterMelon 50 summer
+3
source to share
3 answers
You can use the DataFrame
constructor with numpy.repeat
and numpy.concatenate
:
df1 = pd.DataFrame(np.concatenate(df.A), columns=['Fruit','rate']).reset_index(drop=True)
df1['B'] = np.repeat(df.B.values, df['A'].str.len())
print (df1)
Fruit rate B
0 Apple 50 Winter
1 Orange 30 Winter
2 banana 10 Winter
3 Orange 69 Summer
4 WaterMelon 50 Summer
Another solution with chain.from_iterable
:
from itertools import chain
df1 = pd.DataFrame(list(chain.from_iterable(df.A)), columns=['Fruit','rate'])
.reset_index(drop=True)
df1['B'] = np.repeat(df.B.values, df['A'].str.len())
print (df1)
Fruit rate B
0 Apple 50 Winter
1 Orange 30 Winter
2 banana 10 Winter
3 Orange 69 Summer
4 WaterMelon 50 Summer
+1
source to share
This should work:
fruits = []
rates = []
seasons = []
def create_lists(row):
tuples = row['A']
season = row['B']
for t in tuples:
fruits.append(t[0])
rates.append(t[1])
seasons.append(season)
df.apply(create_lists, axis=1)
new_df = pd.DataFrame({"Fruit" :fruits, "Rate": rates, "B": seasons})[["Fruit", "Rate", "B"]]
output:
Fruit Rate B
0 Apple 50 winter
1 Orange 30 winter
2 banana 10 winter
3 Orange 69 summer
4 WaterMelon 50 summer
+1
source to share
You can do this in a chained operation:
(
df.apply(lambda x: [[k,v,x.B] for k,v in x.A],axis=1)
.apply(pd.Series)
.stack()
.apply(pd.Series)
.reset_index(drop=True)
.rename(columns={0:'Fruit',1:'rate',2:'B'})
)
Out[1036]:
Fruit rate B
0 Apple 50 Winter
1 Orange 30 Winter
2 banana 10 Winter
3 Orange 69 Summer
4 WaterMelon 50 Summer
0
source to share