Query slows down when a join is removed
I have a problem that I cannot figure out how to solve. I have a query that runs in about 15 seconds; when I add a join (which is not used in the SELECT), the query speeds up to about 4 seconds, even though nothing is selected from the additional joined table. I suppose SQL Server chooses a different, faster execution plan - but how do I get it to choose the fastest plan every time?
This query takes about 15 seconds:
SELECT
O.Lvl1_Business_Area_Cd
,O.Lvl1_Business_Area_Nm
,O.Lvl2_Division_Cd
,O.Lvl2_Division_Nm
,SUM(F.Economic_Capital) AS Economic_Capital
FROM
Facts.Financials AS F
LEFT JOIN Dimensions.Customer AS C ON F.Customer_Id = C.Customer_Id
LEFT JOIN Dimensions.Organization AS O ON C.CRU_Id = O.CRU_Id
WHERE
F.Year_Month_Id = 201706
AND Lvl1_Business_Area_Cd = 6008000
GROUP BY
O.Lvl1_Business_Area_Cd
,O.Lvl1_Business_Area_Nm
,O.Lvl2_Division_Cd
,O.Lvl2_Division_Nm
This query takes about 4 seconds:
SELECT
O.Lvl1_Business_Area_Cd
,O.Lvl1_Business_Area_Nm
,O.Lvl2_Division_Cd
,O.Lvl2_Division_Nm
,SUM(F.Economic_Capital) AS Economic_Capital
FROM
Facts.Financials AS F
LEFT JOIN Dimensions.Customer AS C ON F.Customer_Id = C.Customer_Id
LEFT JOIN Dimensions.Organization AS O ON C.CRU_Id = O.CRU_Id
LEFT JOIN Dimensions.Nace AS N ON C.NACE_Id = N.NACE_Id
WHERE
F.Year_Month_Id = 201706
AND Lvl1_Business_Area_Cd = 6008000
GROUP BY
O.Lvl1_Business_Area_Cd
,O.Lvl1_Business_Area_Nm
,O.Lvl2_Division_Cd
,O.Lvl2_Division_Nm
The only difference between the two queries is LEFT JOIN Dimensions.Nace AS N ON C.NACE_Id = N.NACE_Id. However, nothing from this table is used in the select statement.
Facts.Financials is ~60 million rows, Dimensions.Customer ~17 million rows, Dimensions.Organization ~25,000 rows, and Dimensions.Nace ~1,000 rows.
Customer_Id = bigint
CRU_Id = bigint
Nace_id = varchar(4)
I have the following index definitions on the tables:
Facts.Financials: Clustered Index (YearMonth, Customer_Id), Non-Clustered (Customer_Id), Non-clustered columnstore index (Economic_Capital)
Dimensions.Customer: Clustered (Customer_Id, CRU_Id), Non-clustered (CRU_Id), Non-clustered (Nace_Id)
Dimensions.Organization: Clustered (CRU_Id), Non-clustered (Lvl1_Cd, Lvl2_Cd, Lvl3_Cd) Include (Lvl1_Nm, Lvl2_Nm, Lvl3_Nm, Lvl4_Cd, Lvl4_Nm, CRU_Id, CRU_Name)
Dimensions.Nace: Clustered (Nace_Id)
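In DDL form that is roughly the following (the index names here are placeholders, and I have written the shorthand column names as the ones used in the queries):
-- Facts.Financials (~60 million rows)
CREATE CLUSTERED INDEX CIX_Financials ON Facts.Financials (Year_Month_Id, Customer_Id);
CREATE NONCLUSTERED INDEX IX_Financials_Customer ON Facts.Financials (Customer_Id);
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Financials ON Facts.Financials (Economic_Capital);
-- Dimensions.Customer (~17 million rows)
CREATE CLUSTERED INDEX CIX_Customer ON Dimensions.Customer (Customer_Id, CRU_Id);
CREATE NONCLUSTERED INDEX IX_Customer_CRU ON Dimensions.Customer (CRU_Id);
CREATE NONCLUSTERED INDEX IX_Customer_Nace ON Dimensions.Customer (Nace_Id);
-- Dimensions.Organization (~25,000 rows)
CREATE CLUSTERED INDEX CIX_Organization ON Dimensions.Organization (CRU_Id);
CREATE NONCLUSTERED INDEX IX_Organization_Lvl ON Dimensions.Organization (Lvl1_Business_Area_Cd, Lvl2_Division_Cd, Lvl3_Cd)
INCLUDE (Lvl1_Business_Area_Nm, Lvl2_Division_Nm, Lvl3_Nm, Lvl4_Cd, Lvl4_Nm, CRU_Id, CRU_Name);
-- Dimensions.Nace (~1,000 rows)
CREATE CLUSTERED INDEX CIX_Nace ON Dimensions.Nace (Nace_Id);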
This is the execution plan for the slow query (15 seconds):
Slow XML execution plan: Slow Execution Plan
This is the execution plan for the fast query (4 seconds):
Fast XML execution plan: Fast Execution Plan
Can anyone point me in the right direction as to what I am not seeing? Do I have the wrong indexes, or how can this happen?
I am running SQL Server 2014
However, nothing from this table is used in the select statement.
But you do use it in a join, so the SQL Server optimizer will choose a different plan. Some of the effects of joins are explained by Paul White here: Join 100 tables
So, although you don't use it in the SELECT list, the join can have side effects:
It can add additional columns (from the joined table)
It can add additional rows (the joined table can match a source row more than once)
It can remove rows (the joined table may have no matching row)
It can introduce NULLs (for a RIGHT or FULL JOIN)
So, if your join doesn't add any of the above side effects, you might end up with a plan similar to the other one. For example:
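Here is a minimal sketch (the unique constraint and its name are my own illustration, assuming NACE_Id is not already declared unique): once the optimizer can prove the extra LEFT JOIN can never add, remove, or duplicate rows, and no column from it is referenced, it can eliminate the join from the plan entirely.
-- Hypothetical constraint: declare NACE_Id unique so the LEFT JOIN is provably row-preserving
ALTER TABLE Dimensions.Nace
ADD CONSTRAINT UQ_Nace_NaceId UNIQUE (NACE_Id);

-- No N.* column is referenced and each C row matches at most one N row,
-- so the optimizer is free to remove the join and read only Customer
SELECT C.Customer_Id, C.CRU_Id
FROM Dimensions.Customer AS C
LEFT JOIN Dimensions.Nace AS N ON C.NACE_Id = N.NACE_Id;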
I'm going to look at this from the reverse perspective. You are looking for organizations first, and their totals for the respective year/month. Since the Lvl1_Business_Area_Cd column comes from your Organization table and is in the WHERE clause, it effectively turns your LEFT JOIN into an INNER JOIN. Likewise, the Customer table thus becomes a mandatory INNER JOIN to Financials.
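A quick way to see this (a sketch against your tables): the WHERE predicate throws away the NULL-extended rows a LEFT JOIN would preserve, so the outer joins behave exactly like inner joins.
-- Both counts are the same: O.Lvl1_Business_Area_Cd = 6008000 can never be true
-- for an unmatched (NULL-extended) row, so the LEFT JOINs degenerate to INNER JOINs
SELECT COUNT(*)
FROM Facts.Financials AS F
LEFT JOIN Dimensions.Customer AS C ON F.Customer_Id = C.Customer_Id
LEFT JOIN Dimensions.Organization AS O ON C.CRU_Id = O.CRU_Id
WHERE F.Year_Month_Id = 201706
AND O.Lvl1_Business_Area_Cd = 6008000;

SELECT COUNT(*)
FROM Facts.Financials AS F
INNER JOIN Dimensions.Customer AS C ON F.Customer_Id = C.Customer_Id
INNER JOIN Dimensions.Organization AS O ON C.CRU_Id = O.CRU_Id
WHERE F.Year_Month_Id = 201706
AND O.Lvl1_Business_Area_Cd = 6008000;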
Now, I would also add an index on that column AND CRU_Id, since it is the basis for joining to Customer, so:
create index Lvl1CruID on Dimensions.Organization ( Lvl1_Business_Area_Cd, CRU_Id )
Likewise, for a fast join from Customer to Financials:
create index CruID_CustomerID on Dimensions.Customer ( CRU_Id, Customer_Id )
This way, the engine does not have to go to the raw data pages to resolve the join criteria from Organization, through each record in the Customer table, down to Financials.
Finally, your existing index on the Financials table for year/month and customer should be good. I moved that criterion into the JOIN instead of the WHERE clause.
SELECT
O.Lvl1_Business_Area_Cd,
O.Lvl1_Business_Area_Nm,
O.Lvl2_Division_Cd,
O.Lvl2_Division_Nm,
SUM(F.Economic_Capital) Economic_Capital
FROM
Dimensions.Organization O
JOIN Dimensions.Customer C
ON O.CRU_Id = C.CRU_Id
JOIN Facts.Financials F
ON F.Year_Month_Id = 201706
AND C.Customer_Id = F.Customer_Id
WHERE
O.Lvl1_Business_Area_Cd = 6008000
GROUP BY
O.Lvl1_Business_Area_Cd,
O.Lvl1_Business_Area_Nm,
O.Lvl2_Division_Cd,
O.Lvl2_Division_Nm
Good indexes, and a better understanding of how they CAN work, are critical, especially when moving across multiple tables to get at your underlying data.
Now, if you want all organizations, even if they do not have any financial activity, you can simply change the join to Financials to a LEFT JOIN.
You could also try the pre-aggregation approach below; it might be even better.
SELECT
O.Lvl1_Business_Area_Cd,
O.Lvl1_Business_Area_Nm,
O.Lvl2_Division_Cd,
O.Lvl2_Division_Nm,
SUM(PreQuery.Economic_Capital) Economic_Capital
FROM
Dimensions.Organization O
JOIN Dimensions.Customer C
ON O.CRU_Id = C.CRU_Id
JOIN
( select F.Customer_ID,
SUM(F.Economic_Capital) Economic_Capital
from
Facts.Financials F
where
F.Year_Month_Id = 201706
group by
F.Customer_ID ) PreQuery
ON C.Customer_Id = PreQuery.Customer_Id
WHERE
O.Lvl1_Business_Area_Cd = 6008000
GROUP BY
O.Lvl1_Business_Area_Cd,
O.Lvl1_Business_Area_Nm,
O.Lvl2_Division_Cd,
O.Lvl2_Division_Nm