How does SQL Server estimate the cost of an execution plan that contains a user-defined function?

I have a stored procedure that filters based on the result of the DATEADD function.

I understand this is similar to using UDFs, in that SQL Server cannot maintain statistics on the output of the function, so it has trouble estimating the cost of the execution plan.

The query looks something like this:

SELECT /* Columns */
FROM TableA
JOIN TableB
  ON TableA.id = TableB.join_id
WHERE DATEADD(hour, TableB.HoursDifferent, TableA.StartDate) <= @Now


(So it is not possible to pre-compute the result of the DATEADD call.)

What I see is a horrible-looking execution plan, which I think is due to SQL Server incorrectly estimating the number of rows returned from one part of the tree as 1, when in fact it's ~65,000. However, I've seen the same stored procedure run in a fraction of the time when different (but not necessarily dissimilar) data is present in the database.

My question is: in cases like this, how does the query optimizer estimate the result of a function?

UPDATE: FYI, I'm more interested in understanding why I sometimes get a good execution plan and not the rest of the time - I already have a pretty good idea of how I will fix this in the long run.

+2




3 answers


It would help to see the function, but one thing I've seen is that burying functions like that in queries can lead to poor performance. If you can evaluate some of them beforehand, you may be in better shape. For example, instead of

WHERE MyDate < GETDATE()


Try



DECLARE @Today DATETIME;
SET @Today = GETDATE();
...
WHERE MyDate < @Today


That seems to work better.

+1




It's not the plan costing that is the problem here. The function on the columns prevents SQL Server from seeking on an index; you will get an index scan or a table scan instead.

What I suggest is to see if one of the columns can be extracted from the function - basically, see if you can move the function to the other side of the comparison. This is not ideal, but it does mean that at least one column can be used for an index seek.

Something like this (rough idea, not tested), with an index on TableB.HoursDifferent, plus an index on the join column in TableA:



DATEDIFF(hour, TableA.StartDate, @Now) >= TableB.HoursDifferent
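Alternatively (again a rough, untested sketch), the predicate can be rearranged to leave TableA.StartDate bare instead, so that an index on StartDate could be sought per row of TableB in a nested loops join:

```sql
SELECT /* Columns */
FROM TableB
JOIN TableA
  ON TableA.id = TableB.join_id
-- DATEADD(hour, h, s) <= @Now  is equivalent to  s <= DATEADD(hour, -h, @Now),
-- and here the function no longer wraps the TableA column.
WHERE TableA.StartDate <= DATEADD(hour, -TableB.HoursDifferent, @Now)
```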


On the estimation side, I suspect the optimizer will fall back on its 30% "thumb-suck" guess, because it cannot use statistics to get an accurate estimate and because this is an inequality. That means it will assume 30% of the table is returned by this predicate.

It is very difficult to say anything for sure without seeing the execution plans. You mentioned an estimate of 1 row and an actual 65,000. In some cases this is not a problem at all: http://sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/

+3




@Kragen,

Short answer: if you're running queries with ten tables, get used to it. You need to learn all about query hints and plenty of other tricks.

Long answer:

SQL Server usually generates excellent query plans for only three to five tables. Once you go beyond that, you basically have to write the query plan yourself using hints. (Also, scalar functions seem to be costed at zero, which is just insane.)

The reason is that beyond that point it is just too hard. The query optimizer must decide what to do algorithmically, and there are too many possible combinations for even the brightest geniuses on the SQL Server team to create an algorithm that works truly universally.

They say the optimizer is smarter than you. That may be true. But you have one advantage: if a plan doesn't work, you can throw it away and try again! By about the sixth try you should have something acceptable, even for a ten-table join, if you know the data. The query optimizer cannot do this; it has to come up with a plan instantly, and it does not get second chances.

My favorite trick is to force the evaluation order of the WHERE clause by converting it to a CASE expression. Instead of:

WHERE
predicate1
AND predicate2
AND....


Use this:

WHERE
  CASE
    WHEN NOT predicate1 THEN 0
    WHEN NOT predicate2 THEN 0
    WHEN NOT .... THEN 0
    ELSE 1
  END = 1


Order your predicates from cheapest to most expensive, and you get a result that is logically the same but that SQL Server cannot rearrange - it has to evaluate them in the order you wrote them.
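For example, applied to the query in the question, it might look like this (a sketch only - the cheap range check on HoursDifferent is made up for illustration; only the DATEADD predicate comes from the question):

```sql
WHERE
  CASE
    -- cheap, sargable-ish check first (hypothetical predicate)
    WHEN NOT (TableB.HoursDifferent < 24) THEN 0
    -- expensive computed comparison last
    WHEN NOT (DATEADD(hour, TableB.HoursDifferent, TableA.StartDate) <= @Now) THEN 0
    ELSE 1
  END = 1
```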

+1








