Postgres planning takes a disproportionate amount of time compared to execution

PostgreSQL 9.6 running on Amazon RDS.

I have 2 tables:

  • aggregated events - large table with 6 keys (ids)
  • campaign metadata - a small table with a campaign definition.

I join the two tables so I can filter on metadata such as campaign name.

The query produces a report broken down by campaign channel and date (daily).

No FK, not null. The reporting table has multiple rows per day for each campaign (since the aggregation is based on 6 attribute keys).

When I add the join, query planning time grows to about 10 s (versus 300 ms without it):

explain analyze
select c.campaign_channel as channel, date as day, sum(displayed) as displayed
from report_campaigns c
left join events_daily r on r.campaign_id = c.c_id
where provider_id = 7726 and c.p_id = 7726 and c.campaign_name <> 'test'
  and date >= '20170513 12:00' and date <= '20170515 12:00'
group by c.campaign_channel, date;
                                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=71461.93..71466.51 rows=229 width=22) (actual time=104.189..114.788 rows=6 loops=1)
   Group Key: c.campaign_channel, r.date
   ->  Sort  (cost=71461.93..71462.51 rows=229 width=18) (actual time=100.263..106.402 rows=31205 loops=1)
         Sort Key: c.campaign_channel, r.date
         Sort Method: quicksort  Memory: 3206kB
         ->  Hash Join  (cost=1092.52..71452.96 rows=229 width=18) (actual time=22.149..86.955 rows=31205 loops=1)
               Hash Cond: (r.campaign_id = c.c_id)
               ->  Append  (cost=0.00..70245.84 rows=29948 width=20) (actual time=21.318..71.315 rows=31205 loops=1)
                     ->  Seq Scan on events_daily r  (cost=0.00..0.00 rows=1 width=20) (actual time=0.005..0.005 rows=0 loops=1)
                           Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone) AND (provider_id =
                     ->  Bitmap Heap Scan on events_daily_20170513 r_1  (cost=685.36..23913.63 rows=1 width=20) (actual time=17.230..17.230 rows=0 loops=1)
                           Recheck Cond: (provider_id = 7726)
                           Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone))
                           Rows Removed by Filter: 13769
                           Heap Blocks: exact=10276
                           ->  Bitmap Index Scan on events_daily_20170513_full_idx  (cost=0.00..685.36 rows=14525 width=0) (actual time=2.356..2.356 rows=13769 loops=1)
                                 Index Cond: (provider_id = 7726)
                     ->  Bitmap Heap Scan on events_daily_20170514 r_2  (cost=689.08..22203.52 rows=14537 width=20) (actual time=4.082..21.389 rows=15281 loops=1)
                           Recheck Cond: (provider_id = 7726)
                           Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone))
                           Heap Blocks: exact=10490
                           ->  Bitmap Index Scan on events_daily_20170514_full_idx  (cost=0.00..685.45 rows=14537 width=0) (actual time=2.428..2.428 rows=15281 loops=1)
                                 Index Cond: (provider_id = 7726)
                     ->  Bitmap Heap Scan on events_daily_20170515 r_3  (cost=731.84..24128.69 rows=15409 width=20) (actual time=4.297..22.662 rows=15924 loops=1)
                           Recheck Cond: (provider_id = 7726)
                           Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone))
                           Heap Blocks: exact=11318
                           ->  Bitmap Index Scan on events_daily_20170515_full_idx  (cost=0.00..727.99 rows=15409 width=0) (actual time=2.506..2.506 rows=15924 loops=1)
                                 Index Cond: (provider_id = 7726)
               ->  Hash  (cost=1085.35..1085.35 rows=574 width=14) (actual time=0.815..0.815 rows=582 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 37kB
                     ->  Bitmap Heap Scan on report_campaigns c  (cost=12.76..1085.35 rows=574 width=14) (actual time=0.090..0.627 rows=582 loops=1)
                           Recheck Cond: (p_id = 7726)
                           Filter: ((campaign_name)::text <> 'test'::text)
                           Heap Blocks: exact=240
                           ->  Bitmap Index Scan on report_campaigns_provider_id  (cost=0.00..12.62 rows=577 width=0) (actual time=0.062..0.062 rows=582 loops=1)
                                 Index Cond: (p_id = 7726)
 Planning time: 9651.605 ms
 Execution time: 115.092 ms
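
To put the two timings at the bottom of the plan side by side, here is a minimal Python sketch that pulls them out of the EXPLAIN ANALYZE text and computes their ratio. The helper name `planning_vs_execution` is hypothetical, and the input is just the two summary lines of the plan.

```python
import re

def planning_vs_execution(plan_text):
    """Return (planning_ms, execution_ms, ratio) from EXPLAIN ANALYZE output."""
    planning = re.search(r"Planning time:\s*([\d.]+)\s*ms", plan_text)
    execution = re.search(r"Execution time:\s*([\d.]+)\s*ms", plan_text)
    if planning is None or execution is None:
        raise ValueError("plan text has no Planning/Execution time lines")
    planning_ms = float(planning.group(1))
    execution_ms = float(execution.group(1))
    return planning_ms, execution_ms, planning_ms / execution_ms

# The two summary lines copied from the plan.
plan_tail = """
 Planning time: 9651.605 ms
 Execution time: 115.092 ms
"""

planning_ms, execution_ms, ratio = planning_vs_execution(plan_tail)
print(f"planning {planning_ms} ms, execution {execution_ms} ms, ratio {ratio:.1f}x")
```

With these numbers, planning takes roughly 84 times as long as execution, so the time is lost before the query starts running.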


result:
 channel |         day         | displayed
---------+---------------------+------------
 Pin     | 2017-05-14 00:00:00 |      43434
 Pin     | 2017-05-15 00:00:00 | 3325325235

      



1 answer


It looks to me as though the aggregation (the sum) is being prepared before the left join is applied.

One possible fix is to push the filtering WHERE clauses into two nested sub-SELECTs, and only then join the results and sum.



Hope this works:

SELECT c.channel, r.day, sum(r.displayed) AS displayed
FROM
  (SELECT c_id, campaign_channel AS channel
   FROM report_campaigns
   WHERE p_id = 7726 AND campaign_name <> 'test') AS c
LEFT JOIN
  -- filtering r here keeps campaigns that have no events (day is NULL)
  (SELECT campaign_id, date AS day, displayed
   FROM events_daily
   WHERE provider_id = 7726
     AND date >= '20170513 12:00' AND date <= '20170515 12:00') AS r
  ON r.campaign_id = c.c_id
GROUP BY c.channel, r.day;
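
One way to check that the idea in this answer (filter each table in its own sub-SELECT, then join and sum) is well-formed and groups as expected is to run it on toy data. The sketch below uses Python's built-in sqlite3 rather than Postgres, so it says nothing about planning time. The rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report_campaigns (c_id INTEGER, p_id INTEGER,
                                   campaign_name TEXT, campaign_channel TEXT);
    CREATE TABLE events_daily (campaign_id INTEGER, provider_id INTEGER,
                               date TEXT, displayed INTEGER);
    -- Made-up rows for illustration only.
    INSERT INTO report_campaigns VALUES
        (1, 7726, 'spring', 'Pin'),
        (2, 7726, 'test',   'Pin'),
        (3, 9999, 'other',  'Mail');
    INSERT INTO events_daily VALUES
        (1, 7726, '2017-05-14 00:00:00', 10),
        (1, 7726, '2017-05-14 00:00:00', 5),
        (1, 7726, '2017-05-15 00:00:00', 7),
        (1, 7726, '2017-05-12 00:00:00', 100),  -- outside the date range
        (2, 7726, '2017-05-14 00:00:00', 50),   -- campaign named 'test'
        (3, 9999, '2017-05-14 00:00:00', 20);   -- different provider
""")

# Each table is filtered in its own sub-SELECT before the join and the sum.
query = """
    SELECT c.channel, r.day, sum(r.displayed) AS displayed
    FROM (SELECT c_id, campaign_channel AS channel
          FROM report_campaigns
          WHERE p_id = 7726 AND campaign_name <> 'test') AS c
    LEFT JOIN (SELECT campaign_id, date AS day, displayed
               FROM events_daily
               WHERE provider_id = 7726
                 AND date >= '2017-05-13 12:00'
                 AND date <= '2017-05-15 12:00') AS r
           ON r.campaign_id = c.c_id
    GROUP BY c.channel, r.day
    ORDER BY r.day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this toy data, only the three in-range rows for campaign 1 survive the filters, and they collapse into one row per day. Whether this shape changes planning time on the real partitioned table would still need to be measured with EXPLAIN ANALYZE on Postgres.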

      
