SQL Server 2005 Scalar UDF Performance

I have a table that stores Lat/Long coordinates, and I want to query for all records that are within a given distance of a certain point.

The table contains about 10 million records and has an index on the Lat/Long fields.

It doesn't have to be exact. Among other things, I am assuming that 1 degree Long == 1 degree Lat, which I know is wrong, but the ellipse I get is good enough for this purpose.

In my examples below, let's say the point is [40, 140] and my radius is 2 degrees.

I tried this in two ways:


1) I created a UDF to calculate the square of the distance between two points, and I run this UDF in a query.

SELECT Lat, Long FROM Table   
WHERE (Lat BETWEEN 38 AND 42)   
  AND (Long BETWEEN 138 AND 142)  
  AND dbo.SquareDistance(Lat, Long, 40, 140) < 4

      

I filter by a bounding square first to speed up the query and let SQL Server use the index, and then refine the result with my UDF so that only the records that fall inside the circle match.


2) Run a query to get the square (same as before but without the last line), load ALL of those records into my ASP.Net code, and compute the circle on the ASP.Net side (same idea: calculate the square of the distance to save the call to Sqrt, and compare it with the square of my radius).


To my surprise, computing the circle on the .Net side is about 10x faster than using the UDF, which makes me think I am doing something horribly wrong with this UDF...

This is the code I am using:

CREATE FUNCTION [dbo].[SquareDistance] 
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
    -- Declare the return variable here
    DECLARE @Result float
    DECLARE @LatDiff float, @LongDiff float

    SELECT @LatDiff = @Lat1 - @Lat2
    SELECT @LongDiff = @Long1 - @Long2

    SELECT @Result = (@LatDiff * @LatDiff) + (@LongDiff * @LongDiff)

    -- Return the result of the function
    RETURN @Result

END

      

Am I missing something?
Shouldn't using a UDF inside SQL Server be much faster than sending about 25% more records than needed to .Net, with the overhead of the DataReader, inter-process communication, and whatnot?

Am I doing something horribly wrong in this UDF that makes it run slowly?
Is there a way to improve it?

Many thanks!

+1


4 answers


You can improve the performance of this UDF by NOT declaring variables and by doing the calculation inline in a single expression. This will likely improve performance a little (but probably not by much).

CREATE FUNCTION [dbo].[SquareDistance] 
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
    Return ( SELECT ((@Lat1 - @Lat2) * (@Lat1 - @Lat2)) + ((@Long1 - @Long2) * (@Long1 - @Long2)))
END

      

Better yet would be to remove the function and put the computation in the original query.



SELECT Lat, Long FROM Table   
WHERE (Lat BETWEEN 38 AND 42)   
  AND (Long BETWEEN 138 AND 142)  
  AND ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140))  < 4

      

There is a bit of overhead every time a user-defined function is called. By removing the function, you will probably gain a little performance.

Also, I recommend that you check your execution plan to make sure you are getting index seeks as you expect.
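
One way to check this, sketched here under assumptions rather than taken from the original answer, is to turn on the session statistics before running the query. `SET STATISTICS IO` and `SET STATISTICS TIME` are standard SQL Server session options; `[Table]` is only the placeholder table name from the question, bracketed because TABLE is a reserved word.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- [Table] is the placeholder name used in the question.
SELECT Lat, Long FROM [Table]
WHERE (Lat BETWEEN 38 AND 42)
  AND (Long BETWEEN 138 AND 142)
  AND ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140)) < 4;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The output appears on the Messages tab in Management Studio. Enabling "Include Actual Execution Plan" for the same run shows whether the bounding-box predicates use an Index Seek or fall back to a scan.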

+3


There is a lot of overhead in using UDFs.

Even coding it inline might not be good enough, because an index cannot be used for the distance expression, although here the BETWEEN clauses should reduce the data that needs to be crunched.

To expand on G Mastros' idea, separate the selecting part from the squaring part. This can help the optimizer.

SELECT
    Lat, Long
FROM
    (
    SELECT
        Lat, Long
    FROM 
        Table   
    WHERE
        (Lat BETWEEN 38 AND 42)   
        AND
        (Long BETWEEN 138 AND 142)
    ) foo
WHERE
    ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140))  < 4

      



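Edit: you can also reduce the actual calculations. The next idea cuts the number of arithmetic operations per row from 7 (four subtractions, two multiplications, one addition) to 5 (two subtractions, two multiplications, one addition):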

    ...
    SELECT
        Lat, Long,
        Lat - 40 AS LatDiff, Long - 140 AS LongDiff
    FROM 
    ...
    (LatDiff * LatDiff) + (LongDiff * LongDiff)  < 4
    ...

      

Basically, try the three suggested solutions and see which one works best. The optimizer might ignore the derived table, it might use it, or it might create an even worse plan.
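
For reference, the abbreviated fragment above can be written out in full by combining it with the derived-table query. This is only a sketch of how the pieces fit together; `[Table]` is the placeholder table name from the question (bracketed because TABLE is a reserved word) and `foo` is the same derived-table alias as before.

```sql
SELECT
    Lat, Long
FROM
    (
    -- Inner query: bounding-box filter plus the two differences, computed once per row.
    SELECT
        Lat, Long,
        Lat - 40 AS LatDiff, Long - 140 AS LongDiff
    FROM
        [Table]
    WHERE
        (Lat BETWEEN 38 AND 42)
        AND
        (Long BETWEEN 138 AND 142)
    ) foo
WHERE
    -- Outer query: circle test on the precomputed differences.
    (LatDiff * LatDiff) + (LongDiff * LongDiff) < 4
```

Whether this actually runs with fewer operations depends on the plan the optimizer picks, so it is worth timing against the other variants.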

+3


Check out this article, which describes why UDFs in SQL Server are generally a bad idea. Unless you are sure that the table you call the UDF on will not grow much, be aware that UDFs are always called on ALL rows of your table, and not (as one might mistakenly assume) only on the result set. This can give you a big performance hit as your database grows.

The linked article is very good and detailed, and it also contains some ways to work around the problem, but the real issue is that the SQL Server dialect of SQL lacks a way to declare a function as scalar or deterministic (as Oracle does).
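
One workaround that is commonly used in SQL Server 2005, and which may or may not be among those the article describes, is to rewrite the scalar UDF as an inline table-valued function and call it with CROSS APPLY. Inline table-valued functions are expanded into the calling query much like a view, so the per-row function call goes away. The sketch below is an assumption-laden illustration: the function name `SquareDistanceInline` is hypothetical, `[Table]` is the placeholder table from the question, and no timing has been measured for it.

```sql
-- Hypothetical inline table-valued version of dbo.SquareDistance.
CREATE FUNCTION dbo.SquareDistanceInline
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS TABLE
AS
RETURN
(
    SELECT ((@Lat1 - @Lat2) * (@Lat1 - @Lat2))
         + ((@Long1 - @Long2) * (@Long1 - @Long2)) AS SquareDistance
);
GO  -- batch separator for Management Studio / sqlcmd

-- Usage: bounding-box filter as before, circle test via CROSS APPLY.
SELECT t.Lat, t.Long
FROM [Table] AS t
CROSS APPLY dbo.SquareDistanceInline(t.Lat, t.Long, 40, 140) AS d
WHERE (t.Lat BETWEEN 38 AND 42)
  AND (t.Long BETWEEN 138 AND 142)
  AND d.SquareDistance < 4;
```

This keeps the distance formula in one place, which avoids duplicating the expression in every query.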

+1


Update:

G Mastros: You were absolutely right. Doing the math in the query itself is far faster than using the UDF. I am using the SQUARE() function to do the multiplication, which makes the query more concise, but the performance is the same.
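
The exact query is not shown in this update, so the following is only a sketch of what the SQUARE() variant presumably looks like, based on the inline query from the answer above. SQUARE() is a built-in T-SQL function that returns the square of a float expression; `[Table]` is the placeholder table name from the question.

```sql
-- Same bounding-box filter, with SQUARE() replacing the explicit multiplications.
SELECT Lat, Long FROM [Table]
WHERE (Lat BETWEEN 38 AND 42)
  AND (Long BETWEEN 138 AND 142)
  AND SQUARE(Lat - 40) + SQUARE(Long - 140) < 4
```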

However, doing it this way is still about twice as slow as doing the math in .Net.
I can't figure out why, but I came up with a compromise that works for my particular situation (which sucks because I need to duplicate code, but that's the best option if we can't find a way to make the circle computation in SQL faster).

Thanks!

0

