Query slows down when a join is removed
I have a problem that I cannot figure out how to solve. I have a query that runs in about 15 seconds; when I add a join (which is not used in the SELECT), the query speeds up to about 4 seconds, even though nothing is selected from the additional joined table. I suppose SQL Server chooses a different, faster execution plan - but how do I get it to choose the fastest plan every time?
This query takes about 15 seconds:
SELECT
O.Lvl1_Business_Area_Cd
,O.Lvl1_Business_Area_Nm
,O.Lvl2_Division_Cd
,O.Lvl2_Division_Nm
,SUM(F.Economic_Capital) AS Economic_Capital
FROM
Facts.Financials AS F
LEFT JOIN Dimensions.Customer AS C ON F.Customer_Id = C.Customer_Id
LEFT JOIN Dimensions.Organization AS O ON C.CRU_Id = O.CRU_Id
WHERE
F.Year_Month_Id = 201706
AND Lvl1_Business_Area_Cd = 6008000
GROUP BY
O.Lvl1_Business_Area_Cd
,O.Lvl1_Business_Area_Nm
,O.Lvl2_Division_Cd
,O.Lvl2_Division_Nm
This query takes about 4 seconds:
SELECT
O.Lvl1_Business_Area_Cd
,O.Lvl1_Business_Area_Nm
,O.Lvl2_Division_Cd
,O.Lvl2_Division_Nm
,SUM(F.Economic_Capital) AS Economic_Capital
FROM
Facts.Financials AS F
LEFT JOIN Dimensions.Customer AS C ON F.Customer_Id = C.Customer_Id
LEFT JOIN Dimensions.Organization AS O ON C.CRU_Id = O.CRU_Id
LEFT JOIN Dimensions.Nace AS N ON C.NACE_Id = N.NACE_Id
WHERE
F.Year_Month_Id = 201706
AND Lvl1_Business_Area_Cd = 6008000
GROUP BY
O.Lvl1_Business_Area_Cd
,O.Lvl1_Business_Area_Nm
,O.Lvl2_Division_Cd
,O.Lvl2_Division_Nm
The only difference between the two queries is LEFT JOIN Dimensions.Nace AS N ON C.NACE_Id = N.NACE_Id. However, nothing from this table is used in the select statement.
Facts.Financials is ~60 million rows, Dimensions.Customer ~17 million rows, Dimensions.Organization ~25,000 rows, and Dimensions.Nace ~1,000 rows.
Customer_Id = bigint
CRU_Id = bigint
Nace_id = varchar(4)
I have the following index definitions on the tables:
Facts.Financials: Clustered Index (YearMonth, Customer_Id), Non-Clustered (Customer_Id), Non-clustered columnstore index (Economic_Capital)
Dimensions.Customer: Clustered (Customer_Id, CRU_Id), Non-clustered (CRU_Id), Non-clustered (Nace_Id)
Dimensions.Organization: Clustered (CRU_Id), Non-clustered (Lvl1_Cd, Lvl2_Cd, Lvl3_Cd) Include (Lvl1_Nm, Lvl2_Nm, Lvl3_Nm, Lvl4_Cd, Lvl4_Nm, CRU_Id, CRU_Name)
Dimensions.Nace: Clustered (Nace_Id)
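In DDL form that is roughly the following (the index names here are placeholders, and I have written the shorthand column names as the ones used in the queries):
-- Facts.Financials (~60 million rows)
CREATE CLUSTERED INDEX CIX_Financials ON Facts.Financials (Year_Month_Id, Customer_Id);
CREATE NONCLUSTERED INDEX IX_Financials_Customer ON Facts.Financials (Customer_Id);
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Financials ON Facts.Financials (Economic_Capital);
-- Dimensions.Customer (~17 million rows)
CREATE CLUSTERED INDEX CIX_Customer ON Dimensions.Customer (Customer_Id, CRU_Id);
CREATE NONCLUSTERED INDEX IX_Customer_CRU ON Dimensions.Customer (CRU_Id);
CREATE NONCLUSTERED INDEX IX_Customer_Nace ON Dimensions.Customer (Nace_Id);
-- Dimensions.Organization (~25,000 rows)
CREATE CLUSTERED INDEX CIX_Organization ON Dimensions.Organization (CRU_Id);
CREATE NONCLUSTERED INDEX IX_Organization_Lvl ON Dimensions.Organization (Lvl1_Business_Area_Cd, Lvl2_Division_Cd, Lvl3_Cd)
INCLUDE (Lvl1_Business_Area_Nm, Lvl2_Division_Nm, Lvl3_Nm, Lvl4_Cd, Lvl4_Nm, CRU_Id, CRU_Name);
-- Dimensions.Nace (~1,000 rows)
CREATE CLUSTERED INDEX CIX_Nace ON Dimensions.Nace (Nace_Id);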
This is the execution plan for the slow query (15 seconds):
Slow XML execution plan: Slow Execution Plan
This is the execution plan for the fast query (4 seconds):
Fast XML execution plan: Fast Execution Plan
Can anyone point me in the right direction as to what I am not seeing? Do I have the wrong indexes, or how can this happen?
I am running SQL Server 2014
However, nothing from this table is used in the select statement.
But you do use it in a join, so the SQL Server optimizer will choose a different plan. Some of the effects of joins are explained by Paul White here: Join 100 tables
So, although you don't use it in the SELECT list, the join can have side effects:
It can add additional columns (from the joined table)
It can add additional rows (the joined table can match a source row more than once)
It can remove rows (the joined table may have no matching row)
It can introduce NULLs (for a RIGHT or FULL JOIN)
So, if your join doesn't add any of the above side effects, you might end up with a plan similar to the other one. For example:
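Here is a minimal sketch (the unique constraint and its name are my own illustration, assuming NACE_Id is not already declared unique): once the optimizer can prove the extra LEFT JOIN can never add, remove, or duplicate rows, and no column from it is referenced, it can eliminate the join from the plan entirely.
-- Hypothetical constraint: declare NACE_Id unique so the LEFT JOIN is provably row-preserving
ALTER TABLE Dimensions.Nace
ADD CONSTRAINT UQ_Nace_NaceId UNIQUE (NACE_Id);

-- No N.* column is referenced and each C row matches at most one N row,
-- so the optimizer is free to remove the join and read only Customer
SELECT C.Customer_Id, C.CRU_Id
FROM Dimensions.Customer AS C
LEFT JOIN Dimensions.Nace AS N ON C.NACE_Id = N.NACE_Id;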
I'm going to look at this from the reverse perspective. You are looking for organizations first, and their totals for the respective year/month. Since the Lvl1_Business_Area_Cd column comes from your Organization table and is in the WHERE clause, it effectively turns your LEFT JOIN into an INNER JOIN. Likewise, the Customer table thus becomes a mandatory INNER JOIN to Financials.
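A quick way to see this (a sketch against your tables): the WHERE predicate throws away the NULL-extended rows a LEFT JOIN would preserve, so the outer joins behave exactly like inner joins.
-- Both counts are the same: O.Lvl1_Business_Area_Cd = 6008000 can never be true
-- for an unmatched (NULL-extended) row, so the LEFT JOINs degenerate to INNER JOINs
SELECT COUNT(*)
FROM Facts.Financials AS F
LEFT JOIN Dimensions.Customer AS C ON F.Customer_Id = C.Customer_Id
LEFT JOIN Dimensions.Organization AS O ON C.CRU_Id = O.CRU_Id
WHERE F.Year_Month_Id = 201706
AND O.Lvl1_Business_Area_Cd = 6008000;

SELECT COUNT(*)
FROM Facts.Financials AS F
INNER JOIN Dimensions.Customer AS C ON F.Customer_Id = C.Customer_Id
INNER JOIN Dimensions.Organization AS O ON C.CRU_Id = O.CRU_Id
WHERE F.Year_Month_Id = 201706
AND O.Lvl1_Business_Area_Cd = 6008000;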
Now, I would also add an index on that column AND CRU_Id, since it is the basis for joining to Customer, so:
create index Lvl1CruID on Dimensions.Organization ( Lvl1_Business_Area_Cd, CRU_Id )
Likewise, for a fast join from Customer to Financials:
create index CruID_CustomerID on Dimensions.Customer ( CRU_Id, Customer_Id )
This way, the engine does not have to go to the raw data pages to resolve the join criteria from Organization, through each record in the Customer table, down to Financials.
Finally, your existing index on the Financials table for year/month and customer should be good. I moved that criterion into the JOIN instead of the WHERE clause.
SELECT
O.Lvl1_Business_Area_Cd,
O.Lvl1_Business_Area_Nm,
O.Lvl2_Division_Cd,
O.Lvl2_Division_Nm,
SUM(F.Economic_Capital) Economic_Capital
FROM
Dimensions.Organization O
JOIN Dimensions.Customer C
ON O.CRU_Id = C.CRU_Id
JOIN Facts.Financials F
ON F.Year_Month_Id = 201706
AND C.Customer_Id = F.Customer_Id
WHERE
O.Lvl1_Business_Area_Cd = 6008000
GROUP BY
O.Lvl1_Business_Area_Cd,
O.Lvl1_Business_Area_Nm,
O.Lvl2_Division_Cd,
O.Lvl2_Division_Nm
Good indexes, and a better understanding of how they CAN work, are critical, especially when moving across multiple tables to get at your underlying data.
Now, if you want all organizations, even if they do not have any financial activity, you can simply change the join to Financials to a LEFT JOIN.
You could also try the pre-aggregation approach below; it might be even better.
SELECT
O.Lvl1_Business_Area_Cd,
O.Lvl1_Business_Area_Nm,
O.Lvl2_Division_Cd,
O.Lvl2_Division_Nm,
SUM(PreQuery.Economic_Capital) Economic_Capital
FROM
Dimensions.Organization O
JOIN Dimensions.Customer C
ON O.CRU_Id = C.CRU_Id
JOIN
( select F.Customer_ID,
SUM(F.Economic_Capital) Economic_Capital
from
Facts.Financials F
where
F.Year_Month_Id = 201706
group by
F.Customer_ID ) PreQuery
ON C.Customer_Id = PreQuery.Customer_Id
WHERE
O.Lvl1_Business_Area_Cd = 6008000
GROUP BY
O.Lvl1_Business_Area_Cd,
O.Lvl1_Business_Area_Nm,
O.Lvl2_Division_Cd,
O.Lvl2_Division_Nm