How do I add seasonal dummy variables?

I would like to add seasonality dummies, based on quarters, to my R data.table. I've looked at several examples, but I haven't been able to solve this problem yet. My knowledge of R is limited, so I was wondering if you could get me on the right track.

My data.table looks like this:

    Year_week  artist_id  number_of_events number_of_streams
   1:     16/30    8296         1            957892
   2:     16/33    8296         6            882282
   3:     16/34    8296         5            926037
   4:     16/35    8296         2            952704
   5:     15/37    17879        1             89515
   6:     16/22    22690        2            119653

      

I would like to have a format like this:

 Year_week  artist_id  number_of_events number_of_streams Q2 Q3 Q4
   1:     16/50    8296         1            957892        0  0  1       

      

+3




3 answers


Two approaches:

1) Using dcast, cut and sub:

dcast(DT[, Q := cut(as.integer(sub('.*/','',Year_week)),
                    breaks = c(0,13,26,39,53),
                    labels = paste0('Q',1:4))],
      Year_week + artist_id + number_of_events + number_of_streams ~ Q,
      value.var = 'Q',
      drop = c(TRUE,FALSE),
      fun = length)

      

gives:

   Year_week artist_id number_of_events number_of_streams Q1 Q2 Q3 Q4
1:     15/37     17879                1             89515  0  0  1  0
2:     16/22     22690                2            119653  0  1  0  0
3:     16/30      8296                1            957892  0  0  1  0
4:     16/33      8296                6            882282  0  0  1  0
5:     16/34      8296                5            926037  0  0  1  0
6:     16/35      8296                2            952704  0  0  1  0

      

What this does:

  • as.integer(sub('.*/','',Year_week)) extracts the week number from the Year_week column.
  • cut splits the week numbers into quarters with the appropriate labels (see also ?cut).
  • dcast reshapes the quarter column to wide format, using length as the aggregation function. Setting drop = c(TRUE,FALSE) in dcast ensures that all quarters are included as columns, even those that do not occur in the data.
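The first two steps can be checked in isolation. The following is a minimal sketch in base R, using only the week strings from the question's sample data; the variable names week and Q are chosen here for illustration.

```r
# Year_week values copied from the sample data in the question
Year_week <- c("16/30", "16/33", "16/34", "16/35", "15/37", "16/22")

# Step 1: remove everything up to and including the slash, keep the week number
week <- as.integer(sub(".*/", "", Year_week))

# Step 2: bin the weeks into quarters; intervals are right-closed,
# so weeks 1-13 fall in Q1, 14-26 in Q2, 27-39 in Q3 and 40-53 in Q4
Q <- cut(week, breaks = c(0, 13, 26, 39, 53), labels = paste0("Q", 1:4))

week
# 30 33 34 35 37 22
as.character(Q)
# "Q3" "Q3" "Q3" "Q3" "Q3" "Q2"
```

All four labels remain levels of Q even though only Q2 and Q3 occur in the sample, which is why dcast can still create the empty Q1 and Q4 columns.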

Notes:

  • The Q column is a factor with levels Q1 to Q4, so you can use it to sort and filter your data. (cut returns an unordered factor by default; pass ordered_result = TRUE if you need an ordered one.)
  • Depending on how you use the dummy columns, you don't always need them. If you only want to group or filter by quarter, you can work with the Q variable directly.
  • However, some statistical tests require dummy variables, which justifies the dcast step.
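As an illustration of working with Q directly, here is a small sketch that groups and filters by quarter without creating any dummy columns. It assumes the data.table package is installed; the result names per_quarter and q3_rows are made up for this example, and only two columns of the sample data are used.

```r
library(data.table)

# Two columns of the question's sample data are enough for the illustration
DT <- data.table(
  Year_week = c("16/30", "16/33", "16/34", "16/35", "15/37", "16/22"),
  number_of_streams = c(957892, 882282, 926037, 952704, 89515, 119653)
)

# Same quarter factor as in approach 1
DT[, Q := cut(as.integer(sub(".*/", "", Year_week)),
              breaks = c(0, 13, 26, 39, 53),
              labels = paste0("Q", 1:4))]

# Group by quarter: total streams and number of rows per quarter
per_quarter <- DT[, .(total_streams = sum(number_of_streams), n_rows = .N), by = Q]

# Filter on a quarter label
q3_rows <- DT[Q == "Q3"]
```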

2) Using cut, sub and lapply:

DT[, Q := cut(as.integer(sub('.*/','',Year_week)),
              breaks = c(0,13,26,39,53),
              labels = paste0('Q',1:4))
   ][, paste0('Q',1:4) := lapply(paste0('Q',1:4), function(q) as.integer(q == Q))][]

      

which gives a similar result. Instead of reshaping with dcast, you simply check, for each quarter label, whether it matches the value in column Q.


Data used:

library(data.table)

DT <- fread(' Year_week  artist_id  number_of_events number_of_streams
     16/30    8296         1            957892
     16/33    8296         6            882282
     16/34    8296         5            926037
     16/35    8296         2            952704
     15/37    17879        1             89515
     16/22    22690        2            119653')

      

+4




I assumed that Year_week is the column from which the week number can be extracted.



library(data.table)

whichQuart <- function(x){
  data.frame(+(x <= 13),
    +(x >13 & x <= 26),
    +(x > 26 & x <= 39),
    +(x > 39 & x <= 53))  # week 53 can occur, so count it as Q4
}

dt <-     setDT(read.table(text="Year_week  artist_id  number_of_events number_of_streams
1:     16/30    8296         1            957892
2:     16/33    8296         6            882282
3:     16/34    8296         5            926037
4:     16/35    8296         2            952704
5:     15/37    17879        1             89515
6:     16/22    22690        2            119653", header=TRUE, stringsAsFactors=FALSE))

# strsplit() returns a list, so take the second piece of every element
dt[, week := as.integer(sapply(strsplit(Year_week, "/"), "[", 2))]
dt[, c("Q1", "Q2", "Q3", "Q4") := whichQuart(week)]

#   Year_week artist_id number_of_events number_of_streams week Q1 Q2 Q3 Q4
#1:     16/30      8296                1            957892   30  0  0  1  0
#2:     16/33      8296                6            882282   33  0  0  1  0
#3:     16/34      8296                5            926037   34  0  0  1  0
#4:     16/35      8296                2            952704   35  0  0  1  0
#5:     15/37     17879                1             89515   37  0  0  1  0
#6:     16/22     22690                2            119653   22  0  1  0  0

      

+1




Add a quarter column to your df, then:

df$quarter <- as.factor(df$quarter)
df <- cbind(df, model.matrix(~quarter, df))
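Note that model.matrix(~quarter, df) also returns an "(Intercept)" column and prefixes the dummy names with "quarter". A possible way to get plain Q2, Q3 and Q4 columns is sketched below in base R; the quarter column is built with cut as in the first answer, and the names mm and dummies are illustrative only.

```r
# Week numbers from the question's sample data
df <- data.frame(week = c(30, 33, 34, 35, 37, 22))

# Build the quarter factor; all four levels are kept even if unused
df$quarter <- cut(df$week, breaks = c(0, 13, 26, 39, 53),
                  labels = paste0("Q", 1:4))

# Treatment contrasts: an intercept plus one dummy per non-reference level
mm <- model.matrix(~ quarter, df)

# Drop the intercept column and strip the "quarter" prefix from the names
dummies <- mm[, -1, drop = FALSE]
colnames(dummies) <- sub("^quarter", "", colnames(dummies))

df <- cbind(df, dummies)
names(df)
# "week" "quarter" "Q2" "Q3" "Q4"
```

Q1 acts as the reference level here, which matches the Q2/Q3/Q4 layout requested in the question.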

      

Hope it works!

-3

