R: splitting intervals into hourly slots
This is probably very easy, but I tried my best to find an answer online, and the ones I found usually refer to cut
and to point-in-time snapshots, not overlapping intervals.
require(data.table)
x = data.table(start=c("2017-04-18 18:05:00","2017-04-18 18:00:00",
"2017-04-18 21:05:00", "2017-04-18 16:05:00"),
end=c("2017-04-18 19:05:00","2017-04-18 21:30:00",
"2017-04-18 22:00:00", "2017-04-18 16:10:00"))
We have 4 observations, and I need to split each of them across its respective hourly windows.
                 start                 end
1: 2017-04-18 18:05:00 2017-04-18 19:05:00
2: 2017-04-18 18:00:00 2017-04-18 21:30:00
3: 2017-04-18 21:05:00 2017-04-18 22:00:00
4: 2017-04-18 16:05:00 2017-04-18 16:10:00
The first one, for example, has 55 minutes in the 18:00 slot and 5 minutes in the 19:00 slot; the second has 60 minutes in each of the 18:00, 19:00 and 20:00 slots and 30 minutes in the 21:00 slot; the third has 55 minutes in the 21:00 slot; and the last has 5 minutes in the 16:00 slot.
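As a quick check of the arithmetic for the first observation, here is a minimal base R sketch. The variable names (boundary, mins_18, mins_19) and the UTC time zone are assumptions made for illustration only.

```r
# First observation: 18:05 to 19:05 (UTC is assumed for illustration)
start    <- as.POSIXct("2017-04-18 18:05:00", tz = "UTC")
end      <- as.POSIXct("2017-04-18 19:05:00", tz = "UTC")
# The hour boundary that splits the interval into two slots
boundary <- as.POSIXct("2017-04-18 19:00:00", tz = "UTC")

# Minutes that fall in the 18:00 slot and in the 19:00 slot
mins_18 <- as.numeric(difftime(boundary, start, units = "mins"))
mins_19 <- as.numeric(difftime(end, boundary, units = "mins"))
c(mins_18, mins_19)
```

The two values are the 55 and 5 minutes described above.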
The result should look like the following (apologies if I have made a basic mistake):
              interval   Q
1: 2017-04-18 16:00:00   5
2: 2017-04-18 17:00:00   0
3: 2017-04-18 18:00:00 115
4: 2017-04-18 19:00:00  65
5: 2017-04-18 20:00:00  60
6: 2017-04-18 21:00:00  85
Of course there is a brute-force way: expand each interval into minutes and count them per hour with cut or binning. But I believe the problem is common enough that there should be a direct method. Ideally the result would also include the windows with a value of 0, but I can add those myself if required.
This is a solution using dplyr. First, a helper function find_slots is created to generate all hourly slots between start and end. Then the Q values (the minutes spent in each slot) are calculated. Finally, the data is summarized by grouping on each slot.
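library(dplyr)
library(lubridate)  # provides minute() and second()

# For one interval [a, b], return every hourly slot it touches
# and the number of minutes spent in each slot.
find_slots <- function(a, b) {
  # Hourly slot starts, from the hour containing a to the hour containing b
  slots <- seq(a - minute(a) * 60 - second(a),
               b - minute(b) * 60 - second(b),
               by = "hour")
  # Cut points: interval start, interior hour boundaries, interval end
  cuts <- c(a, slots[-1], b)
  d <- as.numeric(difftime(cuts[-1], cuts[-length(cuts)], units = "mins"))
  data.frame(slot = slots, Q = d)
}

x %>%
  # start and end are character columns, so parse them first (UTC assumed)
  mutate(start = as.POSIXct(start, tz = "UTC"),
         end = as.POSIXct(end, tz = "UTC")) %>%
  rowwise() %>%
  do(find_slots(.$start, .$end)) %>%
  ungroup() %>%
  group_by(slot) %>%
  summarize(Q = sum(Q))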
Result (there is no row with a value of 0 for 17:00):
                 slot   Q
1 2017-04-18 16:00:00   5
2 2017-04-18 18:00:00 115
3 2017-04-18 19:00:00  65
4 2017-04-18 20:00:00  60
5 2017-04-18 21:00:00  85
6 2017-04-18 22:00:00   0
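If the empty windows are needed as well, one possible way is to join the result against a complete hourly sequence and replace the missing values with 0. This is only a sketch in base R: the result above is re-typed by hand so the block is self-contained, and the names res, all_slots and filled, as well as the UTC time zone, are assumptions for illustration.

```r
# Result from above, re-typed so the sketch runs on its own (UTC assumed)
res <- data.frame(
  slot = as.POSIXct(c("2017-04-18 16:00:00", "2017-04-18 18:00:00",
                      "2017-04-18 19:00:00", "2017-04-18 20:00:00",
                      "2017-04-18 21:00:00", "2017-04-18 22:00:00"),
                    tz = "UTC"),
  Q = c(5, 115, 65, 60, 85, 0)
)

# Every hourly slot between the first and the last slot in the result
all_slots <- data.frame(slot = seq(min(res$slot), max(res$slot), by = "hour"))

# Left join: slots without a match get NA, which is then set to 0
filled <- merge(all_slots, res, by = "slot", all.x = TRUE)
filled$Q[is.na(filled$Q)] <- 0
filled
```

The 17:00 slot then appears with Q equal to 0, alongside the rows that were already present.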
Edit: using data.table (possibly faster, but I'm not very experienced with data.table). The fasttime library is also used to speed up time parsing.
library(fasttime)
library(data.table)

x = data.table(start = c("2017-04-18 18:05:00", "2017-04-18 18:00:00",
                         "2017-04-18 21:05:00", "2017-04-18 16:05:00"),
               end = c("2017-04-18 19:05:00", "2017-04-18 21:30:00",
                       "2017-04-18 22:00:00", "2017-04-18 16:10:00"))

find_slots2 <- function(a, b) {
  # Parse the character timestamps (treated as UTC here)
  a <- fasttime::fastPOSIXct(a, tz = "UTC")
  b <- fasttime::fastPOSIXct(b, tz = "UTC")
  # Hourly slot starts: floor a and b to the hour
  slots <- seq(a - data.table::minute(a) * 60 - data.table::second(a),
               b - data.table::minute(b) * 60 - data.table::second(b),
               by = "hour")
  # Cut points: interval start, interior hour boundaries, interval end
  hourseq <- c(a, slots[-1], b)
  d <- difftime(hourseq[-1], hourseq[-length(hourseq)], units = "mins")
  list(slot = slots, Q = d)
}

# One call per row, then sum the minutes per slot
x[, find_slots2(start, end), by = 1:nrow(x)][order(slot), .(Q = as.numeric(sum(Q))), by = slot]
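As a cross-check that also yields the empty windows directly, the overlap of every interval with every hourly slot can be computed in one step. This is a different technique from the two solutions above: a base R sketch built on outer(), pmin() and pmax(). The names s, e, slot_start, overlap and res and the UTC time zone are assumptions for illustration, and the slot-by-interval matrix may get large for big inputs.

```r
# The four observations from the question (UTC assumed)
start <- as.POSIXct(c("2017-04-18 18:05:00", "2017-04-18 18:00:00",
                      "2017-04-18 21:05:00", "2017-04-18 16:05:00"), tz = "UTC")
end   <- as.POSIXct(c("2017-04-18 19:05:00", "2017-04-18 21:30:00",
                      "2017-04-18 22:00:00", "2017-04-18 16:10:00"), tz = "UTC")
s <- as.numeric(start)  # seconds since the epoch
e <- as.numeric(end)

# Slot starts, from the hour containing the earliest start up to
# (but not including) a slot that would begin at the latest end
slot_start <- seq(floor(min(s) / 3600) * 3600, max(e) - 1, by = 3600)

# overlap[i, j] = seconds that interval j spends in slot i
overlap <- outer(seq_along(slot_start), seq_along(s), function(i, j) {
  pmax(0, pmin(e[j], slot_start[i] + 3600) - pmax(s[j], slot_start[i]))
})

res <- data.frame(
  slot = as.POSIXct(slot_start, origin = "1970-01-01", tz = "UTC"),
  Q    = rowSums(overlap) / 60  # minutes per slot, summed over all intervals
)
res
```

Because every slot in the range gets a row, slots that no interval touches come out with Q equal to 0 instead of being dropped.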