Determine if a date in one column exists after a date in another column by group
Tell me that I pay clients after the date of service and stop serving them if they do not pay the bill. But the lag between service date and invoice date makes compliance difficult when customers request an additional service. To determine if clients are delinquent, I will need to know if the date of the new requested service will occur after the unpaid invoice has been sent (which could have been sent much later than the service date).
Sample data
df <- structure(list(id = structure(c(1L, 2L, 3L, 4L, 1L, 1L, 2L, 3L, 2L, 2L), .Label = c("A", "B", "C", "D"), class = "factor"), service.date = structure(c(1L, 3L, 5L, 6L, 2L, 9L, 4L, 7L, 8L, 10L), .Label = c("2011-01-01", "2011-01-03", "2011-02-01", "2011-03-01", "2011-03-02", "2011-04-02", "2011-05-09", "2011-08-19", "2011-09-02", "2011-09-10"), class = "factor"), bill.date = structure(c(4L, 5L, 2L, 6L, 9L, 1L, 8L, 10L, 3L, 7L), .Label = c("2011-08-09", "2011-08-10", "2011-08-11", "2011-08-12", "2011-08-13", "2011-08-14", "2011-08-15", "2011-08-16", "2011-08-17", "2011-08-19"), class = "factor")), .Names = c("id", "service.date", "bill.date"), class = "data.frame", row.names = c(NA, -10L))
# df
# id service.date bill.date
# A 2011-01-01 2011-08-12
# B 2011-02-01 2011-08-13
# C 2011-03-02 2011-08-10
# D 2011-04-02 2011-08-14
# A 2011-01-03 2011-08-17
# A 2011-09-02 2011-08-09
# B 2011-03-01 2011-08-16
# C 2011-05-09 2011-08-19
# B 2011-08-19 2011-08-11
# B 2011-09-10 2011-08-15
Therefore, if they requested an additional service before the invoice was sent for their initial service, they will not be considered offenders for now. But if they request an additional service after being billed and remain unpaid, they will be out of date.
Steps Longer
My idea is to use a grouping function, perhaps how by()
to find the first "bill.date" associated with the level in the factor variable "id" and then define for each "service.date" associated with each "id" if this occurs after a matching outstanding "bill.date" for the specified "id" level, eventually creating a boolean variable. Here's an example of what I would like to get:
Desired result
df$delinquent <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
#df
# id service.date bill.date delinquent
# A 2011-01-01 2011-08-12 FALSE
# B 2011-02-01 2011-08-13 FALSE
# C 2011-03-02 2011-08-10 FALSE
# D 2011-04-02 2011-08-14 FALSE
# A 2011-01-03 2011-08-17 FALSE
# A 2011-09-02 2011-08-09 TRUE
# B 2011-03-01 2011-08-16 FALSE
# C 2011-05-09 2011-08-19 FALSE
# B 2011-08-19 2011-08-11 TRUE
# B 2011-09-10 2011-08-15 TRUE
So there are four "customers" in the sample data (called A, B, C, and D), and two of them would be flagged as misconduct (A and B) to receive service despite the outstanding score.
source to share
# Load some tidyverse libraries
require(dplyr)
# Convert factor dates to actual dates
df <- df %>% mutate(service.date = as.Date(service.date),
bill.date = as.Date(bill.date))
# If service date is later than earliest bill.date in each group, return delinquent
df %>% group_by(id) %>% mutate(delinquent = service.date > min(bill.date))
source to share
How about this method using data.table:
library(data.table)
dt<-as.data.table(df)
dt[order(as.Date(service.date),as.Date(bill.date)),
delinquent:=(cumsum(as.Date(service.date)>=as.Date(bill.date))>=1L),
by=id]
# id service.date bill.date delinquent
# 1: A 2011-01-01 2011-08-12 FALSE
# 2: B 2011-02-01 2011-08-13 FALSE
# 3: C 2011-03-02 2011-08-10 FALSE
# 4: D 2011-04-02 2011-08-14 FALSE
# 5: A 2011-01-03 2011-08-17 FALSE
# 6: A 2011-09-02 2011-08-09 TRUE
# 7: B 2011-03-01 2011-08-16 FALSE
# 8: C 2011-05-09 2011-08-19 FALSE
# 9: B 2011-08-19 2011-08-11 TRUE
#10: B 2011-09-10 2011-08-15 TRUE
This assumes that you want someone to be overdue if they were overdue at least once in the past.
Edit: A way to do it without the need for sorting, inspired by @Vlo:
dt[,delinquent:=as.Date(service.date)>=min(as.Date(bill.date)),by=id]
source to share