Query with a subquery and GROUP BY is slower than expected
The entire query below is incredibly slow.
The subquery (aliased stage_1) takes only 1.37 minutes on its own and returns 9514 records, but the entire query takes over 20 minutes and returns 2606 records.
I could use a #temp table to hold the subquery results and improve performance, but I would rather not.
An overview of the query: the WeeklySpace table is inner joined to the Spaceblock_Name_to_PG table on SpaceblockName_SID, which filters the rows in WeeklySpace and adds PG_Code to them. The result is then full outer joined to Sales_PG_Wk on three columns. The WHERE clause narrows the results and is subject to change. The results of the subquery are then summed. The final sum cannot be done inside the subquery because of the GROUP BY and the windowed SUM used there.
I believe the problem is that the subquery is being recalculated for the GROUP BY in the final sum. The SpaceblockName_SID column also appears to be involved in triggering the problem: without it, adding the GROUP BY on the subquery does not affect the runtime.
I have read through and tried a lot of suggestions while trying to solve the problem.
These include:
- Adding TOP 2147483647 with an ORDER BY to force an intermediate materialization, both in a subquery and using a CTE.
- Adding a join after stage_1.
- Casting SpaceblockName_SID from int to varchar and back.
The execution plans (split in two, shown below the code) for the subquery alone and for the entire query look similar. Most of the cost is around the Full Outer Join (Hash Match), which I expected.
The query is run on SQL Server 2005 (T-SQL).
Any help is greatly appreciated!
select
    Cost_centre
    , Fin_week
    , SpaceblockName_SID
    , sum(Propor_rep_SRV) as Total_SpaceblockName_SID_SRV
from
(
    select
        coalesce(space_side.fin_week, sales_side.fin_week) as Fin_week
        , coalesce(space_side.cost_centre, sales_side.cost_Centre) as Cost_centre
        , space_side.SpaceblockName_SID
        , case
            when space_side.SpaceblockName_SID is null
                then sales_side.SalesExVAT
            else sum(space_side.TLM)
                / nullif(sum(sum(space_side.TLM)) over (partition by coalesce(space_side.fin_week, sales_side.fin_week)
                                                        , coalesce(space_side.cost_centre, sales_side.cost_Centre)
                                                        , coalesce(Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)), 0) * sales_side.SalesExVAT
          end as Propor_rep_SRV
    from
        WeeklySpace as space_side
        INNER JOIN
        Spaceblock_Name_to_PG
            ON space_side.SpaceblockName_SID = Spaceblock_Name_to_PG.SpaceblockName_SID
            and Spaceblock_Name_to_PG.PG_Code < 10000
        full outer join
        sales_pg_wk as sales_side
            on space_side.fin_week = sales_side.fin_week
            and space_side.Cost_Centre = sales_side.Cost_Centre
            and Spaceblock_Name_to_PG.PG_code = sales_side.pg_code
    where
        coalesce(space_side.fin_week, sales_side.fin_week) between 201538 and 201550
        and
        coalesce(space_side.cost_centre, sales_side.cost_Centre) in (3, 2800)
    group by
        coalesce(space_side.fin_week, sales_side.fin_week)
        , coalesce(space_side.cost_centre, sales_side.cost_Centre)
        , coalesce(Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)
        , sales_side.SalesExVAT
        , space_side.SpaceblockName_SID
) as stage_1
group by
    Cost_centre
    , Fin_week
    , SpaceblockName_SID
Left side execution plan
Right side execution plan
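For comparison, the #temp table workaround mentioned in the question might look roughly like the sketch below. It is only an illustration: the temp table name #stage_1 is made up here, the inner SELECT is copied unchanged from the query above, and nothing about its runtime has been measured.

```sql
-- Sketch only: materialize the stage_1 result once, then aggregate from it.
-- #stage_1 is a hypothetical name, not something from the original post.
if object_id('tempdb..#stage_1') is not null
    drop table #stage_1;

select
    coalesce(space_side.fin_week, sales_side.fin_week) as Fin_week
    , coalesce(space_side.cost_centre, sales_side.cost_Centre) as Cost_centre
    , space_side.SpaceblockName_SID
    , case
        when space_side.SpaceblockName_SID is null
            then sales_side.SalesExVAT
        else sum(space_side.TLM)
            / nullif(sum(sum(space_side.TLM)) over (partition by coalesce(space_side.fin_week, sales_side.fin_week)
                                                    , coalesce(space_side.cost_centre, sales_side.cost_Centre)
                                                    , coalesce(Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)), 0) * sales_side.SalesExVAT
      end as Propor_rep_SRV
into #stage_1                      -- the only change to the inner query
from
    WeeklySpace as space_side
    inner join Spaceblock_Name_to_PG
        on space_side.SpaceblockName_SID = Spaceblock_Name_to_PG.SpaceblockName_SID
        and Spaceblock_Name_to_PG.PG_Code < 10000
    full outer join sales_pg_wk as sales_side
        on space_side.fin_week = sales_side.fin_week
        and space_side.Cost_Centre = sales_side.Cost_Centre
        and Spaceblock_Name_to_PG.PG_code = sales_side.pg_code
where
    coalesce(space_side.fin_week, sales_side.fin_week) between 201538 and 201550
    and coalesce(space_side.cost_centre, sales_side.cost_Centre) in (3, 2800)
group by
    coalesce(space_side.fin_week, sales_side.fin_week)
    , coalesce(space_side.cost_centre, sales_side.cost_Centre)
    , coalesce(Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)
    , sales_side.SalesExVAT
    , space_side.SpaceblockName_SID;

-- The outer aggregation now reads the already-materialized rows.
select
    Cost_centre
    , Fin_week
    , SpaceblockName_SID
    , sum(Propor_rep_SRV) as Total_SpaceblockName_SID_SRV
from #stage_1
group by
    Cost_centre
    , Fin_week
    , SpaceblockName_SID;
```

Because the inner query is run exactly once into #stage_1, the outer GROUP BY cannot cause it to be re-planned or re-evaluated, which is the behaviour the TOP 2147483647 trick was meant to force.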
Looking at the logic, I think you could split the query in half with a UNION: one half with Spaceblock_Name_to_PG.PG_Code < 10000 and the other with Spaceblock_Name_to_PG.PG_Code >= 10000.
Also consider the change below. Why build a bunch of joined rows that you are going to throw away anyway?
full outer join sales_pg_wk as sales_side
on space_side.fin_week = sales_side.fin_week
and space_side.Cost_Centre = sales_side.Cost_Centre
and Spaceblock_Name_to_PG.PG_code = sales_side.pg_code
and space_side.fin_week between 201538 and 201550
and sales_side.fin_week between 201538 and 201550
and space_side.cost_centre in (3, 2800)
and sales_side.cost_Centre in (3, 2800)
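One caveat with filters placed in the ON clause of a full outer join: rows that fail the condition are not dropped, they come back as unmatched rows with NULLs on the other side, so the original WHERE clause would still be needed. A variant of the same idea, sketched below as an assumption rather than as what the answer literally proposes, is to filter each side in a derived table before the join. The short aliases ws, pg and s are made up for the sketch; the table and column names are the ones used in the question.

```sql
-- Sketch only: filter both inputs before the full outer join,
-- so the hash match sees only the weeks and cost centres of interest.
select
    coalesce(space_side.fin_week, sales_side.fin_week) as Fin_week
    , coalesce(space_side.Cost_Centre, sales_side.Cost_Centre) as Cost_centre
    , coalesce(space_side.PG_Code, sales_side.PG_Code) as PG_Code
    , space_side.SpaceblockName_SID
    , space_side.TLM
    , sales_side.SalesExVAT
from
    (
        select ws.fin_week, ws.Cost_Centre, ws.SpaceblockName_SID, ws.TLM, pg.PG_Code
        from WeeklySpace as ws
        inner join Spaceblock_Name_to_PG as pg
            on ws.SpaceblockName_SID = pg.SpaceblockName_SID
            and pg.PG_Code < 10000
        where ws.fin_week between 201538 and 201550
            and ws.Cost_Centre in (3, 2800)
    ) as space_side
    full outer join
    (
        select s.fin_week, s.Cost_Centre, s.PG_Code, s.SalesExVAT
        from sales_pg_wk as s
        where s.fin_week between 201538 and 201550
            and s.Cost_Centre in (3, 2800)
    ) as sales_side
        on space_side.fin_week = sales_side.fin_week
        and space_side.Cost_Centre = sales_side.Cost_Centre
        and space_side.PG_Code = sales_side.PG_Code;
```

Matched rows share the same fin_week and Cost_Centre on both sides, so filtering each side separately should keep the same rows as the COALESCE-based WHERE clause. To use this inside stage_1, the references to Spaceblock_Name_to_PG.PG_Code would become space_side.PG_Code, and the GROUP BY and windowed SUM would be applied on top as before.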