How to optimize a query requiring time normalization
I have the following data source that has multiple physical values (one per column) coming from multiple devices at different times:
+-----------+------------+---------+-------+
| id_device | timestamp  | Vln1    | kWl1  |
+-----------+------------+---------+-------+
| 123       | 1495696500 |         |       |
| 122       | 1495696800 |         |       |
| 122       | 1495697100 | 230     | 5.748 |
| 122       | 1495697100 | 230     | 5.185 |
| 124       | 1495700100 | 226.119 | 0.294 |
| 122       | 1495713900 | 230     |       |
| 122       | 1495716000 |         |       |
| 122       | 1495716300 | 230     |       |
| 122       | 1495716300 |         |       |
| 122       | 1495716300 |         |       |
| 122       | 1495716600 | 230     | 4.606 |
| 122       | 1495716600 |         |       |
| 124       | 1495739100 |         |       |
| 123       | 1495739400 |         |       |
+-----------+------------+---------+-------+
timestamp is (unfortunately) a bigint, and each device sends data at different times and at different rates: some devices push every 5 minutes, others every 10 minutes, others every 15 minutes. Physical values can be NULL.
The front end needs to display charts (say, line charts) over a specific time span, with ticks at a fixed interval of minutes. The tick interval is selected by the user. Charts can be composed of multiple physical values from multiple devices, and each line is an independent request made to the backend.
Let's consider the case where:
- the selected tick interval is 10 minutes
- two lines are selected for plotting, with two different physical values (columns) on two different devices:
  - one device pushes every 5 minutes
  - the other pushes every 10 minutes
What the front-end app expects are normalized results:
<timestamp>, <value>
Where:
- timestamp represents the rounded time (00:00, 00:10, 00:20, etc.)
- value is a single value per "time bucket": in case there is more than one value in a bucket (for example, there will be 2 values for a device pushing every 5 minutes between 00:00 and 00:10), the aggregated value (AVG) is returned.
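To make the expected output concrete, here is a minimal sketch of the rounding and averaging on a few made-up rows. The `samples` CTE and its values are hypothetical, and it assumes the tick interval is expressed in seconds (600 for 10 minutes) so that the bigint epoch can be floored with integer division.

```sql
-- Hypothetical sample rows for one device pushing every 5 minutes.
WITH samples("timestamp", kwl1) AS (
    VALUES
        (1495697100::bigint, 5.0::double precision),  -- 2017-05-25 07:25:00 UTC
        (1495697400::bigint, 7.0::double precision),  -- 2017-05-25 07:30:00 UTC
        (1495697700::bigint, 4.0::double precision)   -- 2017-05-25 07:35:00 UTC
)
SELECT
    ("timestamp" / 600) * 600 AS times,  -- bigint division truncates, flooring to the 10-minute bucket
    avg(kwl1)                 AS val     -- one aggregated value per bucket
FROM samples
GROUP BY 1
ORDER BY 1;
```

With these rows, the 07:25 sample lands alone in the 07:20 bucket (1495696800, value 5), while the 07:30 and 07:35 samples share the 07:30 bucket (1495697400, value 5.5).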
For this, I have created some plpgsql functions that help me, but I am not sure if what I am doing is the best from a performance standpoint.
Basically what I am doing:
- Get the data for a specific device and physical measure within the selected time span
- Normalize the returned data: each timestamp is rounded down to the selected tick (e.g. 10:12:23 → 10:10:00), so each tuple represents a value in a "time bucket"
- Create a range of time buckets, according to the tick interval selected by the user
- JOIN the data with normalized timestamps against the range, aggregating in case of multiple values within the same bucket
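These steps can be sketched as a single query. This is a simplified illustration rather than the actual function: the `samples` rows and the span bounds are hypothetical, and it works directly on epoch seconds instead of converting to timestamp.

```sql
-- Hypothetical sample rows for one device and one physical value.
WITH samples("timestamp", vln1) AS (
    VALUES
        (1495697100::bigint, 230.0::double precision),  -- 07:25 UTC
        (1495697400::bigint, 232.0::double precision),  -- 07:30 UTC
        (1495698900::bigint, 228.0::double precision)   -- 07:55 UTC
),
-- One row per 10-minute bucket in the requested span (epoch seconds).
buckets AS (
    SELECT generate_series(1495696800::bigint, 1495698600::bigint, 600::bigint) AS bucket_start
),
-- Round every raw timestamp down to the bucket it belongs to.
rounded AS (
    SELECT ("timestamp" / 600) * 600 AS bucket_start, vln1
    FROM samples
    WHERE "timestamp" >= 1495696800
      AND "timestamp" <  1495698600 + 600
)
-- LEFT JOIN keeps empty buckets, which come back with a NULL value.
SELECT b.bucket_start AS times, avg(r.vln1) AS val
FROM buckets b
LEFT JOIN rounded r USING (bucket_start)
GROUP BY b.bucket_start
ORDER BY b.bucket_start;
```

Here the 07:40 bucket (1495698000) has no samples, so it is returned with a NULL val; the other three buckets return 230, 232 and 228.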
Here are my functions:
create or replace function app_iso50k1.blkGetTimeSelParams(
    t_end bigint,
    t_granularity integer,
    t_span bigint,
    OUT delta_time_bucket interval,
    OUT b_timebox timestamp,
    OUT e_timebox timestamp)
as
$$
DECLARE
    delta_time interval;
BEGIN
    /* normalization: truncate to the minute (drop the seconds) */
    t_end = extract('epoch' from date_trunc('minute', (to_timestamp(t_end) at time zone 'UTC')::timestamp));
    delta_time = app_iso50k1.blkGetDeltaTimeBucket(t_end, t_granularity);
    e_timebox = date_trunc('minute', (to_timestamp(t_end - extract('epoch' from delta_time)) at time zone 'UTC'))::timestamp;
    b_timebox = (to_timestamp(extract('epoch' from e_timebox) - t_span) at time zone 'UTC')::timestamp;
    delta_time_bucket = delta_time;
END
$$ immutable language 'plpgsql' security invoker;
create or replace function app_iso50k1.getPhyMetData(
    tablename character varying,
    t_span bigint,
    t_end bigint,
    t_granularity integer,
    idinstrum integer,
    id_device integer,
    varname character varying,
    op character varying,
    page_size int,
    page int)
RETURNS TABLE(times bigint, val double precision) as
$$
DECLARE
    series REFCURSOR;
    serie RECORD;
    first_notnull bool = false;
    prev_val double precision;
    time_params record;
    q_offset int;
BEGIN
    time_params = app_iso50k1.blkGetTimeSelParams(t_end, t_granularity, t_span);
    if (page = 1) then
        q_offset = 0;
    else
        q_offset = page_size * (page - 1);
    end if;
    if not public.blkIftableexists('resgetphymetdata')
    THEN
        create temporary table resgetphymetdata (times bigint, val double precision);
    ELSE
        truncate table resgetphymetdata;
    END IF;
    execute format($ff$
        insert into resgetphymetdata (
            /* generate every possible range between these dates */
            with ranges as (
                /* $5 is not substituted inside a string literal, so scale a 1-minute interval instead */
                select generate_series($1, $2, $5 * interval '1 minute') as range_start
            ),
            /* normalize your data to which <t_granularity>-minute interval it belongs to */
            rounded_hst as (
                select
                    date_trunc ('minutes', (to_timestamp("timestamp") at time zone 'UTC')::timestamp)::timestamp -
                    mod (extract ('minutes' from ((to_timestamp("timestamp") at time zone 'UTC')::timestamp))::int, $5) * interval '1 minute' as round_time,
                    *
                from public.%I
                where
                    idinstrum = $3 and
                    id_device = $4 and
                    timestamp <= $8
            )
            select
                extract('epoch' from r.range_start)::bigint AS times,
                %s (hd.%I) AS val
            from
                ranges r
                left join rounded_hst hd on r.range_start = hd.round_time
            group by
                r.range_start
            order by
                r.range_start
            LIMIT $6 OFFSET $7
        );
    $ff$, tablename, op, varname) using time_params.b_timebox, time_params.e_timebox, idinstrum, id_device, t_granularity, page_size, q_offset, t_end;
    /* data cleansing: val holes between not-null values are filled with the previous value */
    open series no scroll for select * from resgetphymetdata;
    loop
        fetch series into serie;
        exit when not found;
        if NOT first_notnull then
            if serie.val NOTNULL then
                first_notnull = true;
                prev_val = serie.val;
            end if;
        else
            if serie.val is NULL then
                update resgetphymetdata
                set val = prev_val
                where current of series;
            else
                prev_val = serie.val;
            end if;
        end if;
    end loop;
    close series;
    return query select * from resgetphymetdata;
END;
$$ volatile language 'plpgsql' security invoker;
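For reference, the hole-filling loop can presumably also be expressed as a set-based query with window functions. This is only a minimal sketch, assuming rows shaped like resgetphymetdata; the `bucketed` CTE and its values are made up. A running count of non-null values gives every hole the same group number as the last non-null value before it, and first_value then picks that value for the whole group.

```sql
-- Hypothetical bucketed rows, as they would look before the cleansing step.
WITH bucketed(times, val) AS (
    VALUES
        (1495696800::bigint, NULL::double precision),
        (1495697400::bigint, 232.0::double precision),
        (1495698000::bigint, NULL::double precision),
        (1495698600::bigint, NULL::double precision),
        (1495699200::bigint, 228.0::double precision)
),
grouped AS (
    SELECT times, val,
           -- count() skips NULLs, so the counter only advances on real values
           count(val) OVER (ORDER BY times) AS grp
    FROM bucketed
)
SELECT times,
       -- the first row of each group is the non-null value (or NULL for leading holes)
       first_value(val) OVER (PARTITION BY grp ORDER BY times) AS val
FROM grouped
ORDER BY times;
```

With these rows, the leading NULL stays NULL (there is no previous value to copy), the two holes after 232 become 232, and 228 is kept as is, which matches what the cursor loop does. Unlike the cursor, which reads the temporary table without an ORDER BY, the ordering here is explicit.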
Do you see any good alternatives to what I have coded? Is there room for improvement? Thanks!