Split every cell in dataframe (pandas / python)
I have a large pandas DataFrame with many rows and columns containing binary data such as '0|1', '0|0', '1|1', '1|0', which I would like to split into two DataFrames and/or expand, so that this (either result would be helpful to me):
      a    b    c    d
rowa  1|0  0|1  0|1  1|0
rowb  0|1  0|0  0|0  0|1
rowc  0|1  1|0  1|0  0|1
becomes
       a  b  c  d
rowa1  1  0  0  1
rowa2  0  1  1  0
rowb1  0  0  0  0
rowb2  1  0  0  1
rowc1  0  1  1  0
rowc2  1  0  0  1
and/or
df1:
      a  b  c  d
rowa  1  0  0  1
rowb  0  0  0  0
rowc  0  1  1  0
df2:
      a  b  c  d
rowa  0  1  1  0
rowb  1  0  0  1
rowc  1  0  0  1
I am currently trying something like the following, but it is not very efficient. Any advice would be helpful.
from collections import defaultdict

Atmp_dict = defaultdict(list)
Btmp_dict = defaultdict(list)
for index, row in df.iterrows():
    for columnname in list(df.columns.values):
        Atmp_dict[columnname].append(row[columnname].split('|')[0])
        Btmp_dict[columnname].append(row[columnname].split('|')[1])
This is fairly compact, though it seems like there should be an even simpler and more compact way.
df1 = df.applymap( lambda x: str(x)[0] )
df2 = df.applymap( lambda x: str(x)[2] )
Or iterate over the columns as in the other answers; I don't think the difference matters. Note that since the question is about binary data, it is fine (and easier) to just use str[0] and str[2] rather than split or extract.
Or you could do the following, which seems almost silly, but there is nothing wrong with it and it is pretty compact.
df1 = df.stack().str[0].unstack()
df2 = df.stack().str[2].unstack()
stack just converts the DataFrame to a Series so you can use str, and then unstack converts it back to a DataFrame.
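For reference, here is a minimal self-contained sketch of the stack/unstack approach. The frame is rebuilt by hand from the sample in the question, and the name stacked is only an illustrative intermediate variable.

```python
import pandas as pd

# Sample frame copied from the question; every cell is a string like "1|0".
df = pd.DataFrame(
    {
        "a": ["1|0", "0|1", "0|1"],
        "b": ["0|1", "0|0", "1|0"],
        "c": ["0|1", "0|0", "1|0"],
        "d": ["1|0", "0|1", "0|1"],
    },
    index=["rowa", "rowb", "rowc"],
)

stacked = df.stack()            # Series indexed by (row label, column label)
df1 = stacked.str[0].unstack()  # character before the pipe
df2 = stacked.str[2].unstack()  # character after the pipe

print(df1)
print(df2)
```

Both results still hold strings ("0" and "1"), not integers.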
user2734178 is close, but his or her answer has some problems. Here's a small variation that works:
import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()
# df is your original DataFrame
for col in df.columns:
    df1[col] = df[col].apply(lambda x: x.split('|')[0])
    df2[col] = df[col].apply(lambda x: x.split('|')[1])
Here's another option that's a little more elegant. Replace the loop as follows:
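for col in df.columns:
    # raw strings keep the regex escapes intact; expand=False returns a Series
    df1[col] = df[col].str.extract(r"(\d)\|", expand=False)
    df2[col] = df[col].str.extract(r"\|(\d)", expand=False)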
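Note that a pattern with a single (\d) only captures the one digit next to the pipe. If the values could ever be longer than one character, splitting on the pipe is one way to keep the whole value. The following is a rough sketch on made-up multi-digit data (not from the question), with parts as a hypothetical helper name:

```python
import pandas as pd

# Hypothetical multi-digit data, only to illustrate the difference.
df = pd.DataFrame(
    {"a": ["10|7", "0|1"], "b": ["3|22", "0|0"]},
    index=["rowa", "rowb"],
)

df1 = pd.DataFrame(index=df.index)
df2 = pd.DataFrame(index=df.index)
for col in df.columns:
    # expand=True returns a DataFrame with one column per piece (0 and 1 here)
    parts = df[col].str.split("|", expand=True)
    df1[col] = parts[0]  # everything before the pipe
    df2[col] = parts[1]  # everything after the pipe

print(df1)
print(df2)
```

For the strictly binary data in the question, the extract version and this one should give the same result.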
Since all of your values are strings, you can use the .str accessor to pull out the character on each side of the pipe separator, like so:
import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()
# df is defined as in your first example
for col in df.columns:
    df1[col] = df[col].str[0]
    df2[col] = df[col].str[-1]
Then you probably want to convert the columns of df1 and df2 to int using astype(int).
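As a sketch of how the pieces might fit together, the following converts both halves to integers and then builds the expanded frame from the question (rowa1, rowa2, ...). The "1"/"2" suffixes follow the question's naming, expanded is an illustrative variable name, and the final sort_index relies on these particular labels sorting into the desired order.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame(
    {
        "a": ["1|0", "0|1", "0|1"],
        "b": ["0|1", "0|0", "1|0"],
        "c": ["0|1", "0|0", "1|0"],
        "d": ["1|0", "0|1", "0|1"],
    },
    index=["rowa", "rowb", "rowc"],
)

# Take the first / last character of every cell, then convert to integers.
df1 = df.apply(lambda col: col.str[0]).astype(int)
df2 = df.apply(lambda col: col.str[-1]).astype(int)

# Suffix the row labels and interleave the two halves.
# sort_index works here because rowa1 < rowa2 < rowb1 < ... lexicographically.
expanded = pd.concat(
    [
        df1.rename(index=lambda label: label + "1"),
        df2.rename(index=lambda label: label + "2"),
    ]
).sort_index()

print(expanded)
```

With other row labels, the interleaving step may need an explicit ordering instead of a plain sort.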