How to optimize a query requiring time normalization

I have the following data source that has multiple physical values (one per column) coming from multiple devices at different times:

+-----------+------------+---------+-------+
| id_device | timestamp  |  Vln1   | kWl1  |
+-----------+------------+---------+-------+
|       123 | 1495696500 |         |       |
|       122 | 1495696800 |         |       |
|       122 | 1495697100 | 230     | 5.748 |
|       122 | 1495697100 | 230     | 5.185 |
|       124 | 1495700100 | 226.119 | 0.294 |
|       122 | 1495713900 | 230     |       |
|       122 | 1495716000 |         |       |
|       122 | 1495716300 | 230     |       |
|       122 | 1495716300 |         |       |
|       122 | 1495716300 |         |       |
|       122 | 1495716600 | 230     | 4.606 |
|       122 | 1495716600 |         |       |
|       124 | 1495739100 |         |       |
|       123 | 1495739400 |         |       |
+-----------+------------+---------+-------+
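
For reference, the relevant part of the table looks more or less like this (hypothetical DDL, with hst_data as a placeholder name; the real table has more value columns and its name is passed to the functions below as a parameter):

-- Hypothetical minimal DDL matching the columns referenced in this question
create table public.hst_data (
    idinstrum    integer,           -- instrument id (filtered on in the functions)
    id_device    integer,
    "timestamp"  bigint,            -- Unix epoch, in seconds
    Vln1         double precision,  -- physical value, may be NULL
    kWl1         double precision   -- physical value, may be NULL
);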

      

The timestamp column is (unfortunately) a bigint holding Unix epoch seconds, and each device sends data at different times and at different rates: some of the devices push every 5 minutes, others every 10 minutes, others every 15 minutes. The physical value columns can be NULL.
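
For example, taking one of the epoch values from the sample data above and reading it as UTC (which is how the functions below treat it):

-- 1495697100 corresponds to 2017-05-25 07:25:00 UTC
select (to_timestamp(1495697100) at time zone 'UTC')::timestamp;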

The front-end needs to display graphs, say line charts, over a specific time span, with one timestamp per tick; the tick interval is selected by the user. A graph can be made of multiple physical values from multiple devices, and each line corresponds to an independent request made to the backend.

Let's consider the case where:

  • the selected granularity is 10 minutes
  • two lines are plotted, for two different physical values (columns) from two different devices:
    • one device pushes every 5 minutes
    • the other every 10 minutes

What the front-end app expects are normalized results:

<timestamp>, <value>

      

Where:

  • timestamp represents the rounded time (00:00, 00:10, 00:20, etc.), i.e. the start of the time bucket
  • if there is more than one value within a time bucket (for example, a device pushing every 5 minutes will have 2 values between 00:00 and 00:10), a single aggregated value (AVG) is returned

For this, I have created some plpgsql functions that help me, but I am not sure that what I am doing is the best approach from a performance standpoint.

Basically, what I am doing is:

  • Get the data for a specific device and physical measure over the selected time span
  • Normalize the returned data: each timestamp is rounded down to the selected granularity (e.g. 10:12:23 → 10:10:00), so that each tuple represents a value in a "time bucket"
  • Generate the range of time buckets covering the selected span, according to the granularity selected by the user
  • JOIN the normalized data against that range, aggregating when there are multiple values within the same bucket (see the sketch below)
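
Put together, the pattern for a single line looks roughly like this (a minimal sketch with hard-coded bounds, a hard-coded 10-minute granularity and the placeholder table name hst_data, just to show the shape; the real query below is built dynamically):

-- 10-minute buckets between two UTC timestamps, averaging kWl1 of device 122
with ranges as (
    select generate_series(timestamp '2017-05-25 07:00',
                           timestamp '2017-05-25 13:00',
                           interval '10 minutes') as range_start
),
rounded_hst as (
    select date_trunc('minute', (to_timestamp("timestamp") at time zone 'UTC')::timestamp)
             - (extract('minute' from (to_timestamp("timestamp") at time zone 'UTC'))::int % 10)
               * interval '1 minute'  as round_time,
           kWl1
    from public.hst_data
    where id_device = 122
)
select extract('epoch' from r.range_start)::bigint as times,
       avg(hd.kWl1)                                as val
from ranges r
left join rounded_hst hd on r.range_start = hd.round_time
group by r.range_start
order by r.range_start;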

Here are my functions:

create  or replace function app_iso50k1.blkGetTimeSelParams(
      t_end bigint,
      t_granularity integer,
      t_span bigint,
  OUT delta_time_bucket interval,
  OUT b_timebox timestamp,
  OUT e_timebox timestamp)
as
$$
DECLARE
  delta_time interval;
BEGIN
  /* normalization: truncate the end timestamp to whole minutes */
  t_end = extract('epoch' from date_trunc('minute', (to_timestamp(t_end) at time zone 'UTC')::timestamp));

  delta_time = app_iso50k1.blkGetDeltaTimeBucket(t_end, t_granularity);
  /* end and start of the queried time window, as UTC timestamps */
  e_timebox = date_trunc('minute', (to_timestamp(t_end - extract('epoch' from delta_time)) at time zone 'UTC'))::timestamp;
  b_timebox = (to_timestamp(extract('epoch' from e_timebox) - t_span) at time zone 'UTC')::timestamp;

  delta_time_bucket = delta_time;
END
$$ immutable language 'plpgsql' security invoker;


create or replace function app_iso50k1.getPhyMetData(
  tablename character varying,
  t_span bigint,
  t_end bigint,
  t_granularity integer,
  idinstrum integer,
  id_device integer,
  varname character varying,
  op character varying,
  page_size int,
  page int)
  RETURNS TABLE(times bigint , val double precision) as
$$
DECLARE
  series REFCURSOR;
  serie RECORD;
  first_notnull bool = false;
  prev_val double precision;
  time_params record;
  q_offset int;
BEGIN
  time_params = app_iso50k1.blkGetTimeSelParams(t_end, t_granularity, t_span);
  if(page = 1) then
    q_offset = 0;
  else
    q_offset = page_size * (page -1);
  end if;

  if not public.blkIftableexists('resgetphymetdata')
  THEN
    create temporary table resgetphymetdata (times bigint, val double precision);
  ELSE
    truncate table resgetphymetdata;
  END IF;

  execute format($ff$
  insert into resgetphymetdata (
    /* generate every possible range between these dates */
    with ranges as (
        select generate_series($1, $2, $5 * interval '1 minute') as range_start
    ),
      /* normalize your data to which <t_granularity>-minute interval it belongs to */
    rounded_hst as (
      select
        date_trunc ('minutes', (to_timestamp("timestamp") at time zone 'UTC')::timestamp)::timestamp -
        mod (extract ('minutes' from ((to_timestamp("timestamp") at time zone 'UTC')::timestamp))::int, $5) * interval '1 minute' as round_time,
        *
      from public.%I
      where
        idinstrum = $3 and
        id_device = $4 and
        "timestamp" <= $8
    )
    select
      extract('epoch' from r.range_start)::bigint AS times,
      %s (hd.%I) AS val
    from
      ranges r
      left join rounded_hst hd on r.range_start = hd.round_time
    group by
      r.range_start
    order by
      r.range_start
    LIMIT $6 OFFSET $7
  );
  $ff$, tablename, op, varname) using time_params.b_timebox, time_params.e_timebox, idinstrum, id_device, t_granularity, page_size, q_offset, t_end;

  /* data cleansing: val holes between not-null values are filled with the previous value */
  open series no scroll for select * from resgetphymetdata;
  loop
    fetch series into serie;
    exit when not found;

    if NOT first_notnull then
      if serie.val NOTNULL then
        first_notnull = true;
        prev_val = serie.val;
      end if;
    else
      if serie.val is NULL then
        update resgetphymetdata
        set val = prev_val
        where current of series;
      else
        prev_val = serie.val;
      end if;
    end if;
  end loop;
  close series;

  return query select * from resgetphymetdata;
END;
$$ volatile language 'plpgsql' security invoker;
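
For completeness, this is roughly how the backend calls it for one line of a chart (the table name and the concrete argument values here are placeholders, not my real ones):

-- One chart line: device 122, column kWl1, AVG aggregation, 10-minute buckets,
-- 24-hour window ending at t_end, first page of up to 1000 buckets
select *
from app_iso50k1.getPhyMetData(
       'hst_data',   -- tablename (placeholder)
       86400,        -- t_span: window length in seconds (24 h)
       1495739400,   -- t_end: epoch seconds
       10,           -- t_granularity: bucket size in minutes
       1,            -- idinstrum (placeholder)
       122,          -- id_device
       'kWl1',       -- varname: physical value column
       'avg',        -- op: aggregate function
       1000,         -- page_size
       1);           -- page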

      

Do you see any good alternatives to what I have coded? Is there room for improvement? Thanks!
