Populate a new column based on an existing column marked as regex pandas
I have a dataframe in Pandas as shown below
df = pd.DataFrame({'Firstname':['Vishal', 'Nishal', 'Indira', 'Jagdish', 'Tamnna'], 'Actual Age':[25,33,58,58,30]})
Firstname Actual Age
0 Vishal 25
1 Nishant 33
2 Indira 58
3 Jagdish 58
4 Tamnna 30
and regex:
\w+ish\w*
What I don't seem to figure is to provide the result as shown below:
Firstname Actual Age CopyDown
0 Vishal 25 Vishal
1 Nishant 33 Nishant
2 Indira 58 Nishant
3 Jagdish 58 Jagdish
4 Tamnna 30 Jagdish
So basically what I want to do is look at the Firstname column and if I can match a given regex, keep copying that value in the new column until the next match is found, and basically keep doing that until you get it through.
Any ideas? I've been stuck on this for a few days. Its basically a copy function that I want to implement, which might be useful on denormalized datasets. (using dates as stuff)
Thank you in advance
source to share
Here's one way to do it. First determine if there is a match. Then groupby
using a trick cumsum
. Finally, fill in each subgroup using the first value.
import pandas as pd
import re
# your data
# =============================
print(df)
Firstname Actual Age
0 Vishal 25
1 Nishant 33
2 Indira 58
3 Jagdish 58
4 Tamnna 30
# processing
# =============================
pattern = re.compile(r'\w+ish\w*')
df['matched'] = [(pattern.match(x) is not None) for x in df.Firstname.values]
df['diff_names'] = df.matched.astype(int).cumsum()
def func(group):
group['CopyDown'] = group['Firstname'].values[0]
return group.drop(['matched', 'diff_names'], axis=1)
df.groupby('diff_names').apply(func)
Firstname Actual Age CopyDown
0 Vishal 25 Vishal
1 Nishant 33 Nishant
2 Indira 58 Nishant
3 Jagdish 58 Jagdish
4 Tamnna 30 Jagdish
source to share