Populate a new column based on an existing column marked as regex pandas

Question

Populate a new column based on an existing column marked as regex pandas

I have a dataframe in Pandas as shown below

df = pd.DataFrame({'Firstname':['Vishal', 'Nishal', 'Indira', 'Jagdish', 'Tamnna'], 'Actual Age':[25,33,58,58,30]})

  Firstname  Actual Age
0    Vishal          25 
1   Nishant          33
2    Indira          58
3   Jagdish          58   
4    Tamnna          30

and regex:

\w+ish\w*

What I don't seem to figure is to provide the result as shown below:

  Firstname  Actual Age  CopyDown
0    Vishal          25    Vishal
1   Nishant          33   Nishant
2    Indira          58   Nishant
3   Jagdish          58   Jagdish
4    Tamnna          30   Jagdish

So basically what I want to do is look at the Firstname column and if I can match a given regex, keep copying that value in the new column until the next match is found, and basically keep doing that until you get it through.

Any ideas? I've been stuck on this for a few days. Its basically a copy function that I want to implement, which might be useful on denormalized datasets. (using dates as stuff)

Thank you in advance

+3

python pandas regex

Vishal jethwa 13 jul. 15 at 15:12

source to share

2 answers

You can use Series.str.extract("(\w+ish\w*)")

to get matches.

Then you can use Series.fillna(method='ffill')

to fill empty matches forward

This line should suffice:

df['CopyDown'] = df.Firstname.str.extract('(\w+ish\w*)').fillna(method='ffill')

+3

firelynx 13 jul. '15 at 15:30

source to share

Jianxun Li · Accepted Answer · 2015-07-13T15:30:21+0000

Here's one way to do it. First determine if there is a match. Then groupby

using a trick cumsum

. Finally, fill in each subgroup using the first value.

import pandas as pd
import re

# your data
# =============================
print(df)


  Firstname  Actual Age
0    Vishal          25
1   Nishant          33
2    Indira          58
3   Jagdish          58
4    Tamnna          30

# processing
# =============================
pattern = re.compile(r'\w+ish\w*')

df['matched'] = [(pattern.match(x) is not None) for x in df.Firstname.values]
df['diff_names'] = df.matched.astype(int).cumsum()


def func(group):
    group['CopyDown'] = group['Firstname'].values[0]
    return group.drop(['matched', 'diff_names'], axis=1)

df.groupby('diff_names').apply(func)


  Firstname  Actual Age CopyDown
0    Vishal          25   Vishal
1   Nishant          33  Nishant
2    Indira          58  Nishant
3   Jagdish          58  Jagdish
4    Tamnna          30  Jagdish

Populate a new column based on an existing column marked as regex pandas

More articles: