R: splitting intervals into hourly slots

This is probably very easy, but I tried my best to find answers online, and they usually deal with cutting single timestamps (snapshots), not overlapping intervals.

require(data.table)
x = data.table(start=c("2017-04-18 18:05:00","2017-04-18 18:00:00", 
"2017-04-18 21:05:00", "2017-04-18 16:05:00"), 
               end=c("2017-04-18 19:05:00","2017-04-18 21:30:00",
"2017-04-18 22:00:00", "2017-04-18 16:10:00"))

      

We have 4 observations, and I need to allocate them to their respective hourly windows.

                 start                 end
1: 2017-04-18 18:05:00 2017-04-18 19:05:00
2: 2017-04-18 18:00:00 2017-04-18 21:30:00
3: 2017-04-18 21:05:00 2017-04-18 22:00:00
4: 2017-04-18 16:05:00 2017-04-18 16:10:00

      

The first one, for example, has 55 minutes in the 18:00 slot and 5 minutes in the 19:00 slot; the second has 60 minutes in each of the 18:00, 19:00 and 20:00 slots and 30 minutes in the 21:00 slot; the third has 55 minutes in the 21:00 slot; and the last has 5 minutes in the 16:00 slot.

The result should look like the table below (sorry if I miscounted somewhere):

              interval   Q
1: 2017-04-18 16:00:00   5
2: 2017-04-18 17:00:00   0
3: 2017-04-18 18:00:00 115
4: 2017-04-18 19:00:00  65
5: 2017-04-18 20:00:00  60
6: 2017-04-18 21:00:00  85

      

Of course there is a brute-force way: expand each interval into individual minutes and count them per hour with cut. But I believe the problem is so common that there should be a direct method. Ideally the result would also include windows with a value of 0, but I can add those afterwards if required.
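For reference, the brute-force route mentioned above could look roughly like this in base R. It is only a sketch under stated assumptions: every timestamp falls on a whole minute, every interval is at least one minute long, the strings are parsed as UTC, and the hour is found by flooring epoch seconds by hand rather than with cut. The names s, e, minute_starts, hour_start and result are illustrative.

```r
# Sample data from the question, parsed as UTC (an assumption)
start <- as.POSIXct(c("2017-04-18 18:05:00", "2017-04-18 18:00:00",
                      "2017-04-18 21:05:00", "2017-04-18 16:05:00"), tz = "UTC")
end   <- as.POSIXct(c("2017-04-18 19:05:00", "2017-04-18 21:30:00",
                      "2017-04-18 22:00:00", "2017-04-18 16:10:00"), tz = "UTC")

# Work in epoch seconds to avoid class and time zone surprises
s <- as.numeric(start)
e <- as.numeric(end)

# One value per started minute of every interval (the end is exclusive)
minute_starts <- unlist(lapply(seq_along(s), function(i) seq(s[i], e[i] - 60, by = 60)))

# Floor each minute to the start of its hour (valid because the data is in UTC)
hour_start <- minute_starts %/% 3600 * 3600

# Count minutes per hour; the factor levels keep hours with zero minutes
all_hours <- seq(min(hour_start), max(hour_start), by = 3600)
Q <- as.vector(table(factor(hour_start, levels = all_hours)))

result <- data.frame(
  interval = as.POSIXct(all_hours, origin = "1970-01-01", tz = "UTC"),
  Q = Q
)
print(result)
```

This creates one element per minute of data, so it scales poorly for long intervals, which is why a direct method would be preferable.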



2 answers


This is a solution using dplyr.

First, a helper function find_slots is created to generate all hourly slots between start and end. Then the values of Q (minutes spent in each slot) are calculated.

Finally, the data is summarized by grouping on each slot.

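library(dplyr)
library(lubridate)  # provides minute() and second()

find_slots <- function(a, b){
    # x stores the timestamps as character strings, so parse them first
    # (UTC is an assumption; use the time zone of your data)
    a = as.POSIXct(a, tz = "UTC")
    b = as.POSIXct(b, tz = "UTC")

    # floor both ends to the hour and list every hourly slot in between
    slots = seq(a-minute(a)*60-second(a),
                b-minute(b)*60-second(b),
                "hour")

    # boundaries: the actual start, the intermediate full hours, the actual end
    dateseq = slots
    dateseq[1] = a
    r = c(dateseq, b)

    # minutes spent in each slot
    d = as.numeric(difftime(r[-1], r[-length(r)], units = 'mins'))

    data.frame(slot = slots, Q = d)
}

x %>%
    rowwise() %>%
    do(find_slots(.$start, .$end)) %>%
    ungroup() %>%
    group_by(slot) %>%
    summarize(Q = sum(Q))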

      

Result (there is no 0 row for 17:00; the 22:00 row appears with 0 because one interval ends exactly at 22:00):



                 slot   Q
1 2017-04-18 16:00:00   5
2 2017-04-18 18:00:00 115
3 2017-04-18 19:00:00  65
4 2017-04-18 20:00:00  60
5 2017-04-18 21:00:00  85
6 2017-04-18 22:00:00   0
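The missing 17:00 row can be added afterwards by joining the result onto a complete hourly grid. A minimal base R sketch, assuming the result is a data frame with columns slot and Q; it is re-typed here by hand and parsed as UTC so the block is self-contained, and the names res, all_slots and filled are illustrative:

```r
# Result from above, re-typed by hand (UTC is an assumption)
res <- data.frame(
  slot = as.POSIXct(c("2017-04-18 16:00:00", "2017-04-18 18:00:00",
                      "2017-04-18 19:00:00", "2017-04-18 20:00:00",
                      "2017-04-18 21:00:00", "2017-04-18 22:00:00"), tz = "UTC"),
  Q = c(5, 115, 65, 60, 85, 0)
)

# Every hourly slot between the first and the last one present
all_slots <- data.frame(slot = seq(min(res$slot), max(res$slot), by = "hour"))

# Left join onto the full grid; slots without a match get NA, which becomes 0
filled <- merge(all_slots, res, by = "slot", all.x = TRUE)
filled$Q[is.na(filled$Q)] <- 0

print(filled)
```

Only gaps between the first and last observed slot are filled; to cover a fixed reporting window, the grid bounds in seq() would have to be set explicitly.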

      

Edit: using data.table

(Possibly faster, but I'm not very good at data.table)

The fasttime library is also used to speed up time parsing.

library(fasttime)
library(data.table)

x = data.table(start=c("2017-04-18 18:05:00","2017-04-18 18:00:00", 
"2017-04-18 21:05:00", "2017-04-18 16:05:00"), 
               end=c("2017-04-18 19:05:00","2017-04-18 21:30:00",
"2017-04-18 22:00:00", "2017-04-18 16:10:00"))

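find_slots2 <- function(a, b){
    # fastPOSIXct always reads the input as GMT; tz = "UTC" keeps it displayed that way
    a = fasttime::fastPOSIXct(a, tz = "UTC")
    b = fasttime::fastPOSIXct(b, tz = "UTC")
    # floor both ends to the hour (second() is already in seconds, so no *60)
    slots = seq(a-data.table::minute(a)*60-data.table::second(a),
                b-data.table::minute(b)*60-data.table::second(b),
                "hour")

    # boundaries: the actual start, the intermediate full hours, the actual end
    hourseq = c(a, slots[-1], b)

    # minutes spent in each slot
    d = difftime(hourseq[-1], hourseq[-length(hourseq)], units = 'mins')

    list(slot = slots, Q = d)
}

# one call per row, then sum the minutes per slot
x[, find_slots2(start, end), by = 1:nrow(x)][order(slot), .(Q = as.numeric(sum(Q))), by = slot]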

      



lubridate has a function, lubridate::interval(), that might be helpful here.
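One way this could be used, as a rough sketch rather than a tested recipe: build one interval per observation and one per hourly slot, then sum the overlaps with lubridate's intersect() method for intervals. The sample data is parsed as UTC (an assumption), and the names obs, slot_start, slots and Q are illustrative.

```r
library(lubridate)

# Sample data from the question; ymd_hms() parses as UTC by default
start <- ymd_hms(c("2017-04-18 18:05:00", "2017-04-18 18:00:00",
                   "2017-04-18 21:05:00", "2017-04-18 16:05:00"))
end   <- ymd_hms(c("2017-04-18 19:05:00", "2017-04-18 21:30:00",
                   "2017-04-18 22:00:00", "2017-04-18 16:10:00"))
obs <- interval(start, end)

# Hourly grid from the first started hour to the last hour that contains data
slot_start <- seq(floor_date(min(start), "hour"),
                  floor_date(max(end) - 1, "hour"),
                  by = "hour")
slots <- interval(slot_start, slot_start + hours(1))

# For each slot, intersect it with every observation and add up the overlaps;
# intersect() yields NA where there is no overlap, hence na.rm = TRUE
Q <- sapply(seq_along(slots), function(i) {
  overlap <- intersect(slots[i], obs)
  sum(int_length(overlap), na.rm = TRUE) / 60  # seconds to minutes
})

result <- data.frame(interval = slot_start, Q = Q)
print(result)
```

Because every slot on the grid is evaluated, hours without any observation come out as 0 instead of being dropped. The cost is one pass over all observations per slot, so this suits a moderate number of slots.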


