R: split intervals into hourly slots

This is probably very easy, but I tried my best to find answers online, and they usually deal with cut and point-in-time snapshots rather than overlapping intervals.

x = data.table(start=c("2017-04-18 18:05:00","2017-04-18 18:00:00", 
"2017-04-18 21:05:00", "2017-04-18 16:05:00"), 
               end=c("2017-04-18 19:05:00","2017-04-18 21:30:00",
"2017-04-18 22:00:00", "2017-04-18 16:10:00"))


We have 4 observations, and I need to allocate their durations to the respective hourly windows.

                 start                 end
1: 2017-04-18 18:05:00 2017-04-18 19:05:00
2: 2017-04-18 18:00:00 2017-04-18 21:30:00
3: 2017-04-18 21:05:00 2017-04-18 22:00:00
4: 2017-04-18 16:05:00 2017-04-18 16:10:00


The first one, for example, contributes 55 minutes to the 18:00 slot and 5 minutes to the 19:00 slot; the second contributes 60 minutes each to the 18:00, 19:00 and 20:00 slots and 30 minutes to 21:00; the third contributes 55 minutes to 21:00; and the last contributes 5 minutes to 16:00.

The result should look like this (apologies if I have gotten something basic wrong):

              interval   Q
1: 2017-04-18 16:00:00   5
2: 2017-04-18 17:00:00   0
3: 2017-04-18 18:00:00 115
4: 2017-04-18 19:00:00  65
5: 2017-04-18 20:00:00  60
6: 2017-04-18 21:00:00  85


Of course, there is a brute-force way: expand each interval into minutes and count them per hour with cut or similar. But I believe the problem is common enough that there should be a direct method. Ideally the result would also include windows with a value of 0, but I can add those afterwards if required.
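For reference, here is a minimal sketch of that brute-force approach in base R. It assumes every interval starts and ends on a whole minute and that each end is later than its start; the names mins and Q are only illustrative.

```r
# Interval bounds from the question, parsed as UTC to avoid time zone surprises
start <- as.POSIXct(c("2017-04-18 18:05:00", "2017-04-18 18:00:00",
                      "2017-04-18 21:05:00", "2017-04-18 16:05:00"), tz = "UTC")
end   <- as.POSIXct(c("2017-04-18 19:05:00", "2017-04-18 21:30:00",
                      "2017-04-18 22:00:00", "2017-04-18 16:10:00"), tz = "UTC")

# Expand each interval into one timestamp per covered minute (end excluded)
mins <- unlist(lapply(seq_along(start), function(i) {
  seq(as.numeric(start[i]), as.numeric(end[i]) - 60, by = 60)
}))
mins <- as.POSIXct(mins, origin = "1970-01-01", tz = "UTC")

# Bin the minutes by hour; cut() keeps empty hours as factor levels,
# so an hour with no minutes (17:00 here) is counted as 0
Q <- table(cut(mins, breaks = "hour"))
Q
# counts for 16:00 through 21:00: 5 0 115 65 60 85
```

The cost grows with the total number of minutes covered, so this is only practical for small inputs.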



2 answers

This is a solution using dplyr.

First, a helper function find_slots is created to generate all hourly slots between start and end. Then the Q values (minutes per slot) are calculated. Finally, the data is summarized by grouping on each slot.


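library(dplyr)
library(lubridate)

find_slots <- function(a, b){
    # hourly slot starts, from the hour containing a to the hour containing b
    slots = seq(a - minute(a)*60 - second(a),
                b - minute(b)*60 - second(b),
                by = "hour")

    # boundaries: a itself, the slot starts after a, then b
    dateseq = slots
    dateseq[1] = a
    r = c(dateseq, b)

    # minutes spent in each slot
    d = as.numeric(difftime(r[-1], r[-length(r)], units = 'mins'))

    data.frame(slot = slots, Q = d)
}

x %>%
    mutate(start = ymd_hms(start), end = ymd_hms(end)) %>%  # parse the strings
    rowwise() %>%
    do(find_slots(.$start, .$end)) %>%
    ungroup() %>%
    group_by(slot) %>%
    summarize(Q = sum(Q))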


Result (note that there is no 0 row for 17:00):

                 slot   Q
1 2017-04-18 16:00:00   5
2 2017-04-18 18:00:00 115
3 2017-04-18 19:00:00  65
4 2017-04-18 20:00:00  60
5 2017-04-18 21:00:00  85
6 2017-04-18 22:00:00   0
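
The question also asked for empty windows to appear with a value of 0. A minimal sketch of one way to add them in base R, assuming the summarized result is available as a data frame; the names res, all_slots and filled are hypothetical, and the values are re-typed from the output above so the example is self-contained:

```r
# Summarized result from above, re-typed by hand
res <- data.frame(
  slot = as.POSIXct(c("2017-04-18 16:00:00", "2017-04-18 18:00:00",
                      "2017-04-18 19:00:00", "2017-04-18 20:00:00",
                      "2017-04-18 21:00:00", "2017-04-18 22:00:00"), tz = "UTC"),
  Q = c(5, 115, 65, 60, 85, 0)
)

# Every hourly slot between the first and the last observed slot
all_slots <- data.frame(slot = seq(min(res$slot), max(res$slot), by = "hour"))

# Left join keeps all slots; hours without data get NA, which is replaced by 0
filled <- merge(all_slots, res, by = "slot", all.x = TRUE)
filled$Q[is.na(filled$Q)] <- 0
filled
```

Here filled has one row per hour from 16:00 to 22:00, with Q equal to 0 for 17:00.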


Edit: using data.table

(Possibly faster, but I'm not very experienced with data.table.)

The fasttime library is also used to speed up time parsing.


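library(data.table)

x = data.table(start=c("2017-04-18 18:05:00","2017-04-18 18:00:00", 
"2017-04-18 21:05:00", "2017-04-18 16:05:00"), 
               end=c("2017-04-18 19:05:00","2017-04-18 21:30:00",
"2017-04-18 22:00:00", "2017-04-18 16:10:00"))

find_slots2 <- function(a, b){
    # fastPOSIXct assumes UTC input; store UTC so minute()/second() agree with it
    a = fasttime::fastPOSIXct(a, tz = "UTC")
    b = fasttime::fastPOSIXct(b, tz = "UTC")
    # hourly slot starts, from the hour containing a to the hour containing b
    slots = seq(a - data.table::minute(a)*60 - data.table::second(a),
                b - data.table::minute(b)*60 - data.table::second(b),
                by = "hour")

    # boundaries: a itself, the slot starts after a, then b
    hourseq = c(a, slots[-1], b)

    d = difftime(hourseq[-1], hourseq[-length(hourseq)], units = 'mins')

    list(slot = slots, Q = d)
}

x[, find_slots2(start, end), by = 1:nrow(x)][order(slot), .(Q = as.numeric(sum(Q))), by = slot]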




lubridate has a function, lubridate::interval(), that might be helpful here.
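
A minimal sketch of how that could look, assuming the lubridate package is installed. The slot range (16:00 to 21:00) is hard-coded for this example, and the names obs, slots and Q are only illustrative. Each observation is intersected with each hourly slot, and the overlap lengths are summed in minutes.

```r
library(lubridate)

# Observations from the question as lubridate intervals
obs <- interval(
  ymd_hms(c("2017-04-18 18:05:00", "2017-04-18 18:00:00",
            "2017-04-18 21:05:00", "2017-04-18 16:05:00")),
  ymd_hms(c("2017-04-18 19:05:00", "2017-04-18 21:30:00",
            "2017-04-18 22:00:00", "2017-04-18 16:10:00"))
)

# Hourly slots from 16:00 to 21:00 (range chosen by hand for this example)
slot_start <- seq(ymd_hms("2017-04-18 16:00:00"),
                  ymd_hms("2017-04-18 21:00:00"), by = "hour")
slots <- interval(slot_start, slot_start + hours(1))

# For each slot, sum the overlap with every observation, in minutes;
# intersect() yields NA where two intervals do not overlap at all
Q <- sapply(seq_along(slots), function(i) {
  ov <- lubridate::intersect(obs, slots[i])
  sum(time_length(ov, unit = "minute"), na.rm = TRUE)
})

data.frame(slot = slot_start, Q = Q)
```

This compares every observation with every slot, so it is simple but not efficient for large inputs.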


