Correct the previous year by id inside R

I have data something like this:

df <- data.frame(Id=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,9,9,9,9),Date=c("2013-04","2013-12","2013-01","2013-12","2013-11",
             "2013-12","2012-04","2013-12","2012-08","2014-12","2013-08","2014-12","2013-08","2014-12","2011-01","2013-11","2013-12","2014-01","2014-04"))

      

To get the correct format:

df$Date <- paste0(df$Date,"-01")

      

I would only need to get years

so that each id contains two dates following each other.

I, if we make the existing data something like this:

require(lubridate)
df$Date <- year(as.Date(df$Date)-days(1))

      

I get sometimes the same date for a given one id

.

The required output for the column Date

is:

 2012 2013 2012 2013 2012 2013 2012 2013 2013 2014 2013 2014 2013 2014 2011 2013 2014

      

Note that the last date for a given is id

always correct, so only the previous year needs to be corrected based on the last date. The date must be in a format that can only be converted to years as shown.

EDIT Here's an example:

Id Date 
1 2013-11-01    
1 2013-12-01     
1 2014-01-01    
1 2014-04-01

      

I now get the following: 2012,2013,2013,2013

I will need: 2012,2013,2013,2014

+3


source to share


3 answers


Using dplyr

using a similar approach as @David Arenburg's

library(dplyr)
 df %>% 
    group_by(Id) %>% 
    mutate(year=as.numeric(sub('-.*', '', Date)),
     year=replace(year, n()>1, c(year[2]-1, year[2])))
#    Id    Date year
#1   1 2013-04 2012
#2   1 2013-12 2013
#3   2 2013-01 2012
#4   2 2013-12 2013
#5   3 2013-11 2012
#6   3 2013-12 2013
#7   4 2012-04 2012
#8   4 2013-12 2013
#9   5 2012-08 2013
#10  5 2014-12 2014
#11  6 2013-08 2013
#12  6 2014-12 2014
#13  7 2013-08 2013
#14  7 2014-12 2014
#15  8 2011-01 2011

      

Or using base R

with(df, ave(as.numeric(sub('-.*', '', Date)), Id, 
     FUN=function(x) if(length(x)>1)(x[2]-1):x[2] else x))

#[1] 2012 2013 2012 2013 2012 2013 2012 2013 2013 2014 2013 2014 2013 2014 2011

      

Update



You may try

df$indx <- with(df, ave(Id, Id, FUN=function(x) (seq_along(x)-1)%/%2+1))

with(df, ave(as.numeric(sub('-.*', '', Date)), Id, indx, 
         FUN=function(x) if(length(x)>1)(x[2]-1):x[2] else x)) 
#[1] 2012 2013 2012 2013 2012 2013 2012 2013 2013 2014 2013 2014 2013 2014 2011
#[16] 2012 2013 2013 2014

      

or

 df %>%
     group_by(Id) %>%
     mutate(year=as.numeric(sub('-.*', '', Date))) %>% 
     group_by(indx=cumsum(rep(c(TRUE,FALSE), length.out=n())), add=TRUE) %>% 
     mutate(year=replace(year, n()>1, c(year[2]-1, year[2])))

      

+3


source


This is how I could solve it with a package data.table

(although it looks complicated to me)

library(data.table)
setDT(df)[, year := year(Date)][, 
            year := if(.N == 2) (year[2] - 1):year[2] else year,
            Id][]    

#     Id       Date year indx
#  1:  1 2013-04-01 2012    2
#  2:  1 2013-12-01 2013    2
#  3:  2 2013-01-01 2012    2
#  4:  2 2013-12-01 2013    2
#  5:  3 2013-11-01 2012    2
#  6:  3 2013-12-01 2013    2
#  7:  4 2012-04-01 2012    2
#  8:  4 2013-12-01 2013    2
#  9:  5 2012-08-01 2013    2
# 10:  5 2014-12-01 2014    2
# 11:  6 2013-08-01 2013    2
# 12:  6 2014-12-01 2014    2
# 13:  7 2013-08-01 2013    2
# 14:  7 2014-12-01 2014    2
# 15:  8 2011-01-01 2011    1

      

Or all in one step (thanks to @Arun for providing this):

setDT(df)[, year := {tmp = year(Date); 
            if (.N == 2L) (tmp[2]-1L):tmp[2] else tmp},
            Id]

      




Edit : For the new OPs data, we can change the code by adding an additional index

setDT(df)[, indx := if(.N > 2) rep(seq_len(.N/2), each = 2) + 1L else .N, Id] 
df[, year := {tmp = year(Date); if (.N > 1L) (tmp[2] - 1L):tmp[2] else tmp}, 
     list(Id, indx)][]
#     Id       Date indx year
#  1:  1 2013-04-01    2 2012
#  2:  1 2013-12-01    2 2013
#  3:  2 2013-01-01    2 2012
#  4:  2 2013-12-01    2 2013
#  5:  3 2013-11-01    2 2012
#  6:  3 2013-12-01    2 2013
#  7:  4 2012-04-01    2 2012
#  8:  4 2013-12-01    2 2013
#  9:  5 2012-08-01    2 2013
# 10:  5 2014-12-01    2 2014
# 11:  6 2013-08-01    2 2013
# 12:  6 2014-12-01    2 2014
# 13:  7 2013-08-01    2 2013
# 14:  7 2014-12-01    2 2014
# 15:  8 2011-01-01    1 2011
# 16:  9 2013-11-01    2 2012
# 17:  9 2013-12-01    2 2013
# 18:  9 2014-01-01    3 2013
# 19:  9 2014-04-01    3 2014

      

Or another possible solution provided by @akrun

setDT(df)[, `:=`(year = year(Date), indx = .N, indx2 = as.numeric(gl(.N,2, .N))), Id]
df[indx > 1, year:=(year[2]-1):year[2], list(Id, indx2)][]

      

+4


source


Here's a dplyr

solution. You can remove the intermediate fields last_year

and year2

, but I left them here for clarity:

library(stringr)
library(dplyr)

df %>%
  group_by(Id) %>%
  mutate(
    last_year = last(as.integer(str_sub(Date, 1, 4))),
    year2 = row_number() - n(),
    year = last_year + year2
    )

      

+2


source







All Articles