Determine if a date in one column exists after a date in another column by group

Question

Determine if a date in one column exists after a date in another column by group

Tell me that I pay clients after the date of service and stop serving them if they do not pay the bill. But the lag between service date and invoice date makes compliance difficult when customers request an additional service. To determine if clients are delinquent, I will need to know if the date of the new requested service will occur after the unpaid invoice has been sent (which could have been sent much later than the service date).

Sample data

df <- structure(list(id = structure(c(1L, 2L, 3L, 4L, 1L, 1L, 2L, 3L, 2L, 2L), .Label = c("A", "B", "C", "D"), class = "factor"), service.date = structure(c(1L, 3L, 5L, 6L, 2L, 9L, 4L, 7L, 8L, 10L), .Label = c("2011-01-01", "2011-01-03", "2011-02-01", "2011-03-01", "2011-03-02", "2011-04-02", "2011-05-09", "2011-08-19", "2011-09-02", "2011-09-10"), class = "factor"), bill.date = structure(c(4L, 5L, 2L, 6L, 9L, 1L, 8L, 10L, 3L, 7L), .Label = c("2011-08-09", "2011-08-10", "2011-08-11", "2011-08-12", "2011-08-13", "2011-08-14", "2011-08-15", "2011-08-16", "2011-08-17", "2011-08-19"), class = "factor")), .Names = c("id", "service.date", "bill.date"), class = "data.frame", row.names = c(NA, -10L))

# df
# id  service.date     bill.date
# A   2011-01-01       2011-08-12
# B   2011-02-01       2011-08-13
# C   2011-03-02       2011-08-10
# D   2011-04-02       2011-08-14
# A   2011-01-03       2011-08-17
# A   2011-09-02       2011-08-09
# B   2011-03-01       2011-08-16
# C   2011-05-09       2011-08-19
# B   2011-08-19       2011-08-11
# B   2011-09-10       2011-08-15

Therefore, if they requested an additional service before the invoice was sent for their initial service, they will not be considered offenders for now. But if they request an additional service after being billed and remain unpaid, they will be out of date.

Steps Longer My idea is to use a grouping function, perhaps how by()

to find the first "bill.date" associated with the level in the factor variable "id" and then define for each "service.date" associated with each "id" if this occurs after a matching outstanding "bill.date" for the specified "id" level, eventually creating a boolean variable. Here's an example of what I would like to get:

Desired result

df$delinquent <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)

#df

# id    service.date    bill.date   delinquent
# A     2011-01-01      2011-08-12   FALSE
# B     2011-02-01      2011-08-13   FALSE
# C     2011-03-02      2011-08-10   FALSE
# D     2011-04-02      2011-08-14   FALSE
# A     2011-01-03      2011-08-17   FALSE
# A     2011-09-02      2011-08-09   TRUE
# B     2011-03-01      2011-08-16   FALSE
# C     2011-05-09      2011-08-19   FALSE
# B     2011-08-19      2011-08-11   TRUE
# B     2011-09-10      2011-08-15   TRUE

So there are four "customers" in the sample data (called A, B, C, and D), and two of them would be flagged as misconduct (A and B) to receive service despite the outstanding score.

+3

date r grouping

coip 23 Mar 17 at 21:11

source to share

2 answers

How about this method using data.table:

library(data.table)

dt<-as.data.table(df)
dt[order(as.Date(service.date),as.Date(bill.date)),
   delinquent:=(cumsum(as.Date(service.date)>=as.Date(bill.date))>=1L),
   by=id]


#    id service.date  bill.date delinquent
# 1:  A   2011-01-01 2011-08-12      FALSE
# 2:  B   2011-02-01 2011-08-13      FALSE
# 3:  C   2011-03-02 2011-08-10      FALSE
# 4:  D   2011-04-02 2011-08-14      FALSE
# 5:  A   2011-01-03 2011-08-17      FALSE
# 6:  A   2011-09-02 2011-08-09       TRUE
# 7:  B   2011-03-01 2011-08-16      FALSE
# 8:  C   2011-05-09 2011-08-19      FALSE
# 9:  B   2011-08-19 2011-08-11       TRUE
#10:  B   2011-09-10 2011-08-15       TRUE

This assumes that you want someone to be overdue if they were overdue at least once in the past.

Edit: A way to do it without the need for sorting, inspired by @Vlo:

dt[,delinquent:=as.Date(service.date)>=min(as.Date(bill.date)),by=id]

+2

Mike H. 23 Mar 17 at 21:25

source to share

Vlo · Accepted Answer · 2017-03-23T21:25:22+0000

# Load some tidyverse libraries
require(dplyr)

# Convert factor dates to actual dates
df <- df %>% mutate(service.date = as.Date(service.date),
                    bill.date = as.Date(bill.date))

# If service date is later than earliest bill.date in each group, return delinquent
df %>% group_by(id) %>% mutate(delinquent = service.date > min(bill.date))

Determine if a date in one column exists after a date in another column by group

More articles: