Creating new records in dataframe based on character

Question


I have fields in a pandas DataFrame like the sample data below. The values in one of the fields are fractions of the form something / count(something). I would like to split each value on the / and create a new record for each part, essentially one for the numerator and one for the denominator. Some values contain multiple slashes, such as count(something) / count(thing) / count(dog), and I would like to split such a value into 3 records. Any advice on how to do this would be greatly appreciated.

Sample Data:

import pandas as pd
SampleDf = pd.DataFrame([['tom','sum(stuff)/count(things)'],['bob','count(things)/count(stuff)']], columns=['ReportField','OtherField'])


Example Output:

OutputDf=pd.DataFrame([['tom1','sum(stuff)'],['tom2','count(things)'],['bob1','count(things)'],['bob2','count(stuff)']],columns=['ReportField','OtherField'])


python pandas

ndderwerdo 06 jul. 17 at 2:55


3 answers

Vaishali · Answer 1 · 2017-07-06T03:19:16+0000

There might be a better way, but try this:

df = df.set_index('ReportField')
df = pd.DataFrame(df.OtherField.str.split('/', expand = True).stack().reset_index(-1, drop = True)).reset_index()

You get

    ReportField 0
0   tom         sum(stuff)
1   tom         count(things)
2   bob         count(things)
3   bob         count(stuff)
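
The result above keeps the original names (tom, bob) rather than the numbered names (tom1, tom2, ...) asked for in the question. A minimal sketch of one way to add the counter, assuming the question's SampleDf as input and using groupby().cumcount(), which is not part of the original answer:

```python
import pandas as pd

SampleDf = pd.DataFrame(
    [['tom', 'sum(stuff)/count(things)'], ['bob', 'count(things)/count(stuff)']],
    columns=['ReportField', 'OtherField'],
)

# Same split-and-stack idea as above, applied to the question's SampleDf
df = SampleDf.set_index('ReportField')
df = df.OtherField.str.split('/', expand=True).stack().reset_index(-1, drop=True)
df = df.rename('OtherField').reset_index()

# cumcount() numbers the rows within each ReportField group, starting at 0
counter = df.groupby('ReportField').cumcount() + 1
df['ReportField'] = df['ReportField'] + counter.astype(str)
print(df)
```

Note that grouping by ReportField assumes each name appears in only one input row; if names can repeat, the counter would continue across those rows.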

0p3n5ourcE · Answer 2 · 2017-07-06T03:43:01+0000

One possible way could be as follows:

# split and stack 
new_df = pd.DataFrame(SampleDf.OtherField.str.split('/').tolist(), index=SampleDf.ReportField).stack().reset_index()
print(new_df)

Output:

    ReportField  level_1          0
0         tom        0     sum(stuff)
1         tom        1  count(things)
2         bob        0  count(things)
3         bob        1   count(stuff)

Now combine ReportField with level_1:

# combine strings for tom1, tom2 ,.....
new_df['ReportField'] = new_df.ReportField.str.cat((new_df.level_1+1).astype(str))

# remove level column
del new_df['level_1']
# rename columns
new_df.columns = ['ReportField', 'OtherField']
print (new_df)

Output:

    ReportField     OtherField
0        tom1       sum(stuff)
1        tom2    count(things)
2        bob1    count(things)
3        bob2     count(stuff)

jezrael · Answer 3 · 2017-07-06T05:32:59+0000

You can use:

- split with expand=True for a new DataFrame
- reshape with stack and reset_index
- add a counter to column ReportField, converting it to str using astype
- remove the helper column level_1 with drop

# parentheses allow the method chain to continue across lines
OutputDf = (SampleDf.set_index('ReportField')['OtherField']
                    .str.split('/', expand=True)
                    .stack()
                    .reset_index(name='OtherField'))
OutputDf['ReportField'] = OutputDf['ReportField'] + OutputDf['level_1'].add(1).astype(str)
OutputDf = OutputDf.drop('level_1', axis=1)
print (OutputDf)
  ReportField     OtherField
0        tom1     sum(stuff)
1        tom2  count(things)
2        bob1  count(things)
3        bob2   count(stuff)
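
For the case in the question where a value has more than one /, the following is a rough sketch of an alternative that uses DataFrame.explode instead of stack. It assumes pandas 0.25 or later (where explode is available), and the 'ann' row is hypothetical data added only to exercise the three-part case; it is not from the original post.

```python
import pandas as pd

# Hypothetical input: the 'ann' row is made up to cover the three-part case
df = pd.DataFrame(
    [['tom', 'sum(stuff)/count(things)'],
     ['ann', 'count(something) / count(thing) / count(dog)']],
    columns=['ReportField', 'OtherField'],
)

# Split each value into a list, then give every list element its own row
parts = df.assign(OtherField=df['OtherField'].str.split('/')).explode('OtherField')
parts['OtherField'] = parts['OtherField'].str.strip()  # drop spaces around '/'

# After explode the original row index repeats, so group on it to number the pieces
counter = parts.groupby(level=0).cumcount() + 1
parts['ReportField'] = parts['ReportField'] + counter.astype(str)

result = parts.reset_index(drop=True)
print(result)
```

Grouping on the original index rather than on ReportField means the numbering restarts for every input row, even if two rows happen to share the same name.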
