Postgres query planning takes a disproportionate amount of time
PostgreSQL 9.6 running on Amazon RDS.
I have two tables:
- aggregated events (events_daily): a large table with 6 keys (ids)
- campaign metadata (report_campaigns): a small table with the campaign definitions
I join the two to filter on metadata such as the campaign name.
The query builds a report broken down by campaign channel and date (daily).
There are no foreign keys, and the columns are NOT NULL. The events table has multiple rows per day for each campaign, since the aggregation is based on 6 attribute keys.
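To make the setup concrete, here is a hypothetical sketch of how the two tables might look with the inheritance-based partitioning that PostgreSQL 9.6 offers (the plan's Append over events_daily_20170513, events_daily_20170514, ... suggests one child table per day). Only columns that appear in the query are shown; their types, the CHECK bounds and the index definitions are assumptions, not the actual DDL.

```sql
-- Hypothetical DDL: columns, types, CHECK bounds and index columns are
-- assumptions inferred from the query and the plan; the real tables differ.
CREATE TABLE report_campaigns (
    c_id             integer NOT NULL,
    p_id             integer NOT NULL,
    campaign_name    varchar NOT NULL,
    campaign_channel varchar NOT NULL
);
CREATE INDEX report_campaigns_provider_id ON report_campaigns (p_id);

-- Parent table: normally holds no rows itself (its Seq Scan in the plan
-- returns 0 rows). The real table also has the other key columns.
CREATE TABLE events_daily (
    provider_id integer   NOT NULL,
    campaign_id integer   NOT NULL,
    date        timestamp NOT NULL,
    displayed   bigint    NOT NULL
);

-- One child table per day. Constraint exclusion reads this CHECK
-- constraint to skip children outside the queried date range.
CREATE TABLE events_daily_20170513 (
    CHECK (date >= '2017-05-13' AND date < '2017-05-14')
) INHERITS (events_daily);
-- The real index may cover more columns; the plan only shows provider_id used.
CREATE INDEX events_daily_20170513_full_idx
    ON events_daily_20170513 (provider_id);
```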
When I add the join, the query takes about 10 s instead of 300 ms, and EXPLAIN ANALYZE shows that almost all of that is planning time:
explain analyze select c.campaign_channel as channel, date as day, sum(displayed) as displayed
from report_campaigns c
left join events_daily r on r.campaign_id = c.c_id
where provider_id = 7726 and c.p_id = 7726 and c.campaign_name <> 'test'
  and date >= '20170513 12:00' and date <= '20170515 12:00'
group by c.campaign_channel, date;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=71461.93..71466.51 rows=229 width=22) (actual time=104.189..114.788 rows=6 loops=1)
  Group Key: c.campaign_channel, r.date
  -> Sort (cost=71461.93..71462.51 rows=229 width=18) (actual time=100.263..106.402 rows=31205 loops=1)
        Sort Key: c.campaign_channel, r.date
        Sort Method: quicksort Memory: 3206kB
        -> Hash Join (cost=1092.52..71452.96 rows=229 width=18) (actual time=22.149..86.955 rows=31205 loops=1)
              Hash Cond: (r.campaign_id = c.c_id)
              -> Append (cost=0.00..70245.84 rows=29948 width=20) (actual time=21.318..71.315 rows=31205 loops=1)
                    -> Seq Scan on events_daily r (cost=0.00..0.00 rows=1 width=20) (actual time=0.005..0.005 rows=0 loops=1)
                          Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone) AND (provider_id =
                    -> Bitmap Heap Scan on events_daily_20170513 r_1 (cost=685.36..23913.63 rows=1 width=20) (actual time=17.230..17.230 rows=0 loops=1)
                          Recheck Cond: (provider_id = 7726)
                          Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone))
                          Rows Removed by Filter: 13769
                          Heap Blocks: exact=10276
                          -> Bitmap Index Scan on events_daily_20170513_full_idx (cost=0.00..685.36 rows=14525 width=0) (actual time=2.356..2.356 rows=13769 loops=1)
                                Index Cond: (provider_id = 7726)
                    -> Bitmap Heap Scan on events_daily_20170514 r_2 (cost=689.08..22203.52 rows=14537 width=20) (actual time=4.082..21.389 rows=15281 loops=1)
                          Recheck Cond: (provider_id = 7726)
                          Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone))
                          Heap Blocks: exact=10490
                          -> Bitmap Index Scan on events_daily_20170514_full_idx (cost=0.00..685.45 rows=14537 width=0) (actual time=2.428..2.428 rows=15281 loops=1)
                                Index Cond: (provider_id = 7726)
                    -> Bitmap Heap Scan on events_daily_20170515 r_3 (cost=731.84..24128.69 rows=15409 width=20) (actual time=4.297..22.662 rows=15924 loops=1)
                          Recheck Cond: (provider_id = 7726)
                          Filter: ((date >= '2017-05-13 12:00:00'::timestamp without time zone) AND (date <= '2017-05-15 12:00:00'::timestamp without time zone))
                          Heap Blocks: exact=11318
                          -> Bitmap Index Scan on events_daily_20170515_full_idx (cost=0.00..727.99 rows=15409 width=0) (actual time=2.506..2.506 rows=15924 loops=1)
                                Index Cond: (provider_id = 7726)
              -> Hash (cost=1085.35..1085.35 rows=574 width=14) (actual time=0.815..0.815 rows=582 loops=1)
                    Buckets: 1024 Batches: 1 Memory Usage: 37kB
                    -> Bitmap Heap Scan on report_campaigns c (cost=12.76..1085.35 rows=574 width=14) (actual time=0.090..0.627 rows=582 loops=1)
                          Recheck Cond: (p_id = 7726)
                          Filter: ((campaign_name)::text <> 'test'::text)
                          Heap Blocks: exact=240
                          -> Bitmap Index Scan on report_campaigns_provider_id (cost=0.00..12.62 rows=577 width=0) (actual time=0.062..0.062 rows=582 loops=1)
                                Index Cond: (p_id = 7726)
Planning time: 9651.605 ms
Execution time: 115.092 ms
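The Append node shows that events_daily is split into child tables through inheritance, the only partitioning method 9.6 has. The 9.6 documentation notes that all constraints on all children are examined during constraint exclusion, so a large number of partitions can increase planning time considerably. A generic catalog check of how many children each parent has, plus the current constraint_exclusion setting, is a cheap first step; it is a sketch, not a diagnosis of this particular query:

```sql
-- Count inheritance children per parent table (works on any 9.6 database).
SELECT inhparent::regclass AS parent,
       count(*)            AS child_tables
FROM pg_inherits
GROUP BY inhparent
ORDER BY child_tables DESC;

-- 'partition' (the default) or 'on' lets the planner skip children whose
-- CHECK constraints contradict the WHERE clause; 'off' disables that.
SHOW constraint_exclusion;
```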
result:
channel | day | displayed
----------+---------------------+-----------
Pin | 2017-05-14 00:00:00 | 43434
Pin | 2017-05-15 00:00:00 | 3325325235
1 answer
It seems to me that the problem is related to the summation, which leads to pre-computation before the left join.
A possible fix is to move the filtering WHERE clauses into two nested sub-SELECTs, so that each table is filtered before the left join and the sum.
Hope this works:
-- Filter each table in its own sub-SELECT, then join and sum.
SELECT c.channel, r.day, sum(r.displayed) AS displayed
FROM
  (SELECT c_id, campaign_channel AS channel
   FROM report_campaigns
   WHERE p_id = 7726 AND campaign_name <> 'test') AS c
LEFT JOIN
  (SELECT campaign_id, date AS day, displayed
   FROM events_daily
   WHERE provider_id = 7726
     AND date >= '20170513 12:00' AND date <= '20170515 12:00') AS r
  ON r.campaign_id = c.c_id
-- LEFT JOIN also returns campaigns with no events (day = NULL); use JOIN to drop them.
GROUP BY c.channel, r.day;
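A variant of the same filter-first idea, not part of the original answer, uses WITH queries. In PostgreSQL 9.6 (and up to 11) a WITH query is evaluated as written, without the outer query's conditions being pushed into it, so its WHERE filters are applied before the join and the sum. Whether this changes the planning time here is untested. The temp tables at the top are hypothetical stand-ins (column names from the question, types assumed) so the sketch runs on its own, and it uses an inner JOIN to match the original query's effective output.

```sql
-- Hypothetical stand-ins for the real tables so the sketch runs on its own;
-- column names come from the question, the types are assumptions.
CREATE TEMP TABLE report_campaigns (
    c_id integer NOT NULL, p_id integer NOT NULL,
    campaign_name varchar NOT NULL, campaign_channel varchar NOT NULL
);
CREATE TEMP TABLE events_daily (
    provider_id integer NOT NULL, campaign_id integer NOT NULL,
    date timestamp NOT NULL, displayed bigint NOT NULL
);

-- In 9.6 each WITH query acts as an optimization fence: it is evaluated
-- with only its own WHERE clause, then joined and aggregated.
WITH c AS (
    SELECT c_id, campaign_channel AS channel
    FROM report_campaigns
    WHERE p_id = 7726 AND campaign_name <> 'test'
), r AS (
    SELECT campaign_id, date AS day, displayed
    FROM events_daily
    WHERE provider_id = 7726
      AND date >= '20170513 12:00' AND date <= '20170515 12:00'
)
SELECT c.channel, r.day, sum(r.displayed) AS displayed
FROM c
JOIN r ON r.campaign_id = c.c_id  -- inner join, like the original query's result
GROUP BY c.channel, r.day
ORDER BY c.channel, r.day;
```

On PostgreSQL 12 and later, such CTEs are inlined by default; writing `c AS MATERIALIZED (...)` restores the fence (that keyword is a syntax error on 9.6).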