Create a loop to insert or remove items based on different scenarios
Let's say I have the following dataset:
mydf <- data.frame( "MemberID"=c("111","0111A","0111B","112","0112A","113","0113B"),
"resign.date"=c("2013/01/01",NA,NA,"2014/03/01",NA,NA,NA))
Note: 111, 112 and 113 are the IDs of the family representatives.
I would like to do two things:
a) If a family representative has a resignation date, for example 111, I want to insert the same resignation date for 0111A and 0111B (they represent the spouse and children of 111, if you're interested).
b) If there is no resignation date for the family representative, for example 113, I would just like to delete rows 113 and 0113B.
My resulting dataframe should look like this:
mydf <- data.frame("MemberID"=c("111","0111A","0111B","112","0112A"),
"resign.date"=c("2013/01/01","2013/01/01","2013/01/01","2014/03/01","2014/03/01"))
Thanks in advance.
If resign.date is only present for (some) MemberIDs with no trailing letters, here is a solution using data.table:
library(data.table)
df <- data.table( "MemberID"=c("0111","0111A","0111B","0112","0112A","0113","0113B"),
"resign.date"=c("2013/01/01",NA,NA,"2014/03/01",NA,NA,NA))
df <- df[order(MemberID)] ## order data : MemberIDs w/out trailing letters first by ID
df[, myID := gsub("\\D+", "", MemberID)] ## create myID col : MemberID w/out trailing letters
df[ , my.resign.date := resign.date[1L], by = myID] ##assign first occurrence of resign date by myID
df <- df[!is.na(my.resign.date)] ##drop rows if my.resign.date is missing
EDIT
If there are inconsistencies in MemberID (some have a leading 0, some do not), you can try a small workaround like the following:
df <- data.table( "MemberID"=c("111","0111A","0111B","112","0112A","113","0113B"),
"resign.date"=c("2013/01/01",NA,NA,"2014/03/01",NA,NA,NA))
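df[, myID := gsub("(?<![0-9])0+", "", gsub("\\D+", "", MemberID), perl = TRUE)] ## create myID col : digits only, leading zeros removed
df <- df[order(myID, -MemberID)] ## order data : within each myID, descending MemberID puts IDs w/out leading 0 first
df[ , my.resign.date := resign.date[1L], by = myID] ## assign first occurrence of resign date by myID
df <- df[!is.na(my.resign.date)] ## drop rows if my.resign.date is missing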
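To see what the two nested gsub() calls produce, here is a small base R sketch on the IDs from the question. The names ids, digits and family are only for illustration and do not appear in the answer above.

```r
# MemberIDs from the question, with inconsistent leading zeros
ids <- c("111", "0111A", "0111B", "112", "0112A", "113", "0113B")

# Step 1: drop every non-digit character (the trailing letters)
digits <- gsub("\\D+", "", ids)
# "111" "0111" "0111" "112" "0112" "113" "0113"

# Step 2: drop zeros that are not preceded by a digit, i.e. leading zeros only
family <- gsub("(?<![0-9])0+", "", digits, perl = TRUE)
# "111" "111" "111" "112" "112" "113" "113"

print(family)
```

Because the lookbehind only matches zeros with no digit in front of them, an ID such as "1010" would keep both of its zeros.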
We can also use tidyverse
library(tidyverse)
mydf %>%
group_by(grp = parse_number(as.character(MemberID))) %>% # as.character() guards against MemberID being a factor
mutate(resign.date = first(resign.date)) %>%
na.omit() %>%
ungroup() %>%
select(-grp)
# A tibble: 5 x 2
# MemberID resign.date
# <fctr> <fctr>
#1 0111 2013/01/01
#2 0111A 2013/01/01
#3 0111B 2013/01/01
#4 0112 2014/03/01
#5 0112A 2014/03/01
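Both approaches above take the first resign.date within each family, so they depend on the representative's row coming first in its group. A minimal base R sketch that looks the date up by family key instead, assuming (this is my assumption, not something stated in the question) that representatives are exactly the IDs containing no letters. The names family, is_rep, rep_date and result are illustrative.

```r
mydf <- data.frame(
  MemberID    = c("111", "0111A", "0111B", "112", "0112A", "113", "0113B"),
  resign.date = c("2013/01/01", NA, NA, "2014/03/01", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Family key: keep digits only, then strip leading zeros
family <- sub("^0+", "", gsub("\\D+", "", mydf$MemberID))

# Assumption: a representative's ID consists of digits only
is_rep <- !grepl("\\D", mydf$MemberID)

# Named lookup vector: family key -> representative's resign.date (may be NA)
rep_date <- setNames(mydf$resign.date[is_rep], family[is_rep])

# Copy the representative's date to every member of the family
mydf$resign.date <- unname(rep_date[family])

# Drop families whose representative has no resignation date
result <- mydf[!is.na(mydf$resign.date), ]
rownames(result) <- NULL
print(result)
```

This keeps the original row order and does not need any package, at the cost of assuming one representative per family.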