Split every cell in dataframe (pandas / python)

I have a large pandas DataFrame with many rows and columns containing binary data such as '0|1', '0|0', '1|1', '1|0', which I would like to split into two DataFrames and/or expand, so that this (either would be helpful to me):

        a   b   c   d
rowa    1|0 0|1 0|1 1|0
rowb    0|1 0|0 0|0 0|1
rowc    0|1 1|0 1|0 0|1

      

becomes

        a   b   c   d
rowa1   1   0   0   1
rowa2   0   1   1   0
rowb1   0   0   0   0
rowb2   1   0   0   1
rowc1   0   1   1   0
rowc2   1   0   0   1

      

and / or

    df1:    a   b   c   d
    rowa    1   0   0   1
    rowb    0   0   0   0
    rowc    0   1   1   0


    df2:    a   b   c   d
    rowa    0   1   1   0
    rowb    1   0   0   1
    rowc    1   0   0   1

      

I am currently trying something like the following, but it is not very efficient. Any advice would be helpful.

from collections import defaultdict

# One list per column: first halves go to Atmp_dict, second halves to Btmp_dict
Atmp_dict=defaultdict(list)
Btmp_dict=defaultdict(list)

for index,row in df.iterrows():
    for columnname in list(df.columns.values):
        Atmp_dict[columnname].append(row[columnname].split('|')[0])
        Btmp_dict[columnname].append(row[columnname].split('|')[1])

      



3 answers


This is quite compact, though it seems like there ought to be an even simpler way.

df1 = df.applymap( lambda x: str(x)[0] ) 
df2 = df.applymap( lambda x: str(x)[2] )

      

Or iterate over the columns as in the other answers; I don't think the choice matters much. Note that since the question is about binary data, it is fine (and easier) to just use str[0] and str[2] rather than split or extract.
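For reference, here is a minimal sketch of the element-wise approach run against the sample frame from the question. The helper name elementwise is hypothetical; it is only there because DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, so which call is available depends on the installed version. The indexing assumes every cell is exactly three characters long, as in the question.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "a": ["1|0", "0|1", "0|1"],
        "b": ["0|1", "0|0", "1|0"],
        "c": ["0|1", "0|0", "1|0"],
        "d": ["1|0", "0|1", "0|1"],
    },
    index=["rowa", "rowb", "rowc"],
)


def elementwise(frame, func):
    """Apply func to every cell (hypothetical helper, not part of pandas)."""
    # DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
    if hasattr(frame, "map"):
        return frame.map(func)
    return frame.applymap(func)


# Character 0 is the value before the pipe, character 2 the value after it.
df1 = elementwise(df, lambda x: str(x)[0])
df2 = elementwise(df, lambda x: str(x)[2])

print(df1)
print(df2)
```

If the values could ever be wider than one digit, positional indexing would silently pick the wrong characters, so splitting on the pipe would be the safer choice.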



Or you could do the following, which seems almost silly, but there is nothing wrong with it and it is pretty compact.

df1 = df.stack().str[0].unstack()
df2 = df.stack().str[2].unstack()

      

stack just converts the frame to a Series so you can use str, and then unstack converts it back to a DataFrame.
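The question also asked for a single frame with rows rowa1, rowa2, rowb1, and so on. A possible sketch, building on the stack/unstack idea: suffix the row labels of each half and concatenate. The name expanded is hypothetical, and sort_index only produces the interleaved order here because the sample labels happen to sort that way.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "a": ["1|0", "0|1", "0|1"],
        "b": ["0|1", "0|0", "1|0"],
        "c": ["0|1", "0|0", "1|0"],
        "d": ["1|0", "0|1", "0|1"],
    },
    index=["rowa", "rowb", "rowc"],
)

# stack() yields a Series indexed by (row, column), so .str works on every cell.
stacked = df.stack()
df1 = stacked.str[0].unstack()
df2 = stacked.str[2].unstack()

# Suffix the row labels, then sort so the two halves of each row sit together.
expanded = pd.concat(
    [
        df1.rename(index=lambda label: label + "1"),
        df2.rename(index=lambda label: label + "2"),
    ]
).sort_index()

print(expanded)
```

For row labels that do not sort naturally, the interleaving would need an explicit ordering instead of sort_index.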



user2734178 is close, but their answer has some problems. Here's a small variation that works:

import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()

# df is your original DataFrame
for col in df.columns:
    df1[col] = df[col].apply(lambda x: x.split('|')[0])
    df2[col] = df[col].apply(lambda x: x.split('|')[1])

      



Here's another option that's a little more elegant. Replace the loop as follows:

for col in df.columns:
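    # Raw strings keep the regex escapes intact; expand=False returns a Series
    df1[col] = df[col].str.extract(r"(\d)\|", expand=False)
    df2[col] = df[col].str.extract(r"\|(\d)", expand=False)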
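A third variant of the same loop, sketched here as an assumption rather than something from the answer itself, uses the vectorized Series.str.split with expand=True. It splits each column once and does not depend on the values being a single digit.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "a": ["1|0", "0|1", "0|1"],
        "b": ["0|1", "0|0", "1|0"],
        "c": ["0|1", "0|0", "1|0"],
        "d": ["1|0", "0|1", "0|1"],
    },
    index=["rowa", "rowb", "rowc"],
)

df1 = pd.DataFrame(index=df.index)
df2 = pd.DataFrame(index=df.index)

for col in df.columns:
    # expand=True returns a two-column frame: 0 is the left part, 1 the right.
    parts = df[col].str.split("|", n=1, expand=True)
    df1[col] = parts[0]
    df2[col] = parts[1]

print(df1)
print(df2)
```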

      



Since all of your values are strings, you can use the .str accessor to pick out the value on each side of the pipe, like this.

import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()

#df is defined as in your first example
for col in df.columns:
    df1[col] = df[col].str[0]
    df2[col] = df[col].str[-1]

      

Then you will probably want to convert the columns of df1 and df2 to int using astype(int).
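A short sketch of that conversion on the question's sample data, assuming every cell really does hold a digit on each side of the pipe (astype(int) raises a ValueError otherwise):

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "a": ["1|0", "0|1", "0|1"],
        "b": ["0|1", "0|0", "1|0"],
        "c": ["0|1", "0|0", "1|0"],
        "d": ["1|0", "0|1", "0|1"],
    },
    index=["rowa", "rowb", "rowc"],
)

df1 = pd.DataFrame(index=df.index)
df2 = pd.DataFrame(index=df.index)

for col in df.columns:
    df1[col] = df[col].str[0]   # character before the pipe
    df2[col] = df[col].str[-1]  # character after the pipe

# The pieces are still strings at this point; convert whole frames at once.
df1 = df1.astype(int)
df2 = df2.astype(int)

print(df1.dtypes)
print(df2)
```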
