Down sampling time series data in dplyr from Postgres DB
I am trying to use dplyr to query data from psql. My details are in the following format.
date name value
10-12-2012 10:01:00 var1 100
10-12-2012 10:55:00 var1 200
10-12-2012 11:01:00 var1 150
10-12-2012 11:50:00 var1 100
10-12-2012 12:11:00 var1 50
10-12-2012 12:40:00 var1 150
10-12-2012 10:01:00 var2 10
10-12-2012 10:33:00 var2 20
10-12-2012 11:04:00 var2 15
10-12-2012 11:45:00 var2 25
10-12-2012 12:02:00 var2 10
10-12-2012 12:55:00 var2 8
I want to aggregate data hourly (or predefined sample time) and want the mean of the "value" column to be for that hour.
Desired output:
date name value
10-12-2012 10:00:00 var1 150
10-12-2012 11:00:00 var1 125
10-12-2012 12:00:00 var1 100
10-12-2012 10:00:00 var2 15
10-12-2012 11:00:00 var2 20
10-12-2012 12:00:00 var2 9
I can do it in postgres directly using this query:
"SELECT date_trunc('hour', date), name, mean_arr(array_agg(cast(value as double precision))) FROM TABLENAME WHERE name IN ("var1","var2")
I would like to know if we can do the same using dplyr.
I am currently connecting to the database using this command in dplyr:
my_db <- rc_postgres(dbname="DBNAME",host="HOST",port="PORT",user="USER",password="PASSWD")
tbl_df <- tbl(my_db, "TABLENAME")
The raw request is as follows:
dataOut <- data.frame(tbl_df%>% select(date,name,value) %>% filter(name %in% c('var1','var2')
Please note that I have not aggregated the data currently, which I want to do.
Let me know if anyone has an answer. thanks Pradeep
+3
source to share
2 answers
This won't work directly with your SQL table. But here's what I will do:
library(tidyr)
library(dplyr)
df <- tbl_df %>%
separate(date, into = c("date", "time"), sep = " ") %>%
separate(time, into = c("hour", "minute", "seccond"), sep = ":") %>%
group_by(date, hour, name) %>%
summarise(mean(value)) %>%
ungroup() %>%
arrange(name) %>%
collect()
#> df
#Source: local data frame [6 x 4]
#
# date hour name mean(value)
#1 10-12-2012 10 var1 150
#2 10-12-2012 11 var1 125
#3 10-12-2012 12 var1 100
#4 10-12-2012 10 var2 15
#5 10-12-2012 11 var2 20
#6 10-12-2012 12 var2 9
+5
source to share