SELECT TOP 1 without ORDER BY Problem ignores WHERE clause

We have a table containing unique codes. To generate new unique code, we use the approach found in the below SQL statement and the uncovered cases where the statement NOT EXISTS

appears to allow strings to exist through them.

No problem with concurrency as it has been proven in a sandbox using a single query running against SQL Server 2016. If we put a statement ORDER BY

, it unexpectedly works as expected. It seems that without ORDER BY

this request conditionally ignores the offer WHERE

. In the event of a collision of all codes, I would expect to @code

be either NULL or remain the original state of 0.

DECLARE @code int = 0;

    select  @code = Code from (
        SELECT top 1 randoms.Code
        FROM (
            VALUES 
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
        ) randoms (Code)
        WHERE NOT EXISTS (SELECT 1 FROM TEST_Codes uc WHERE uc.Code = randoms.Code)
    ) c;


    SELECT 
        c.code,
        ud.*
    FROM (VALUES (@code)) as c(Code)
    LEFT OUTER JOIN TEST_Codes ud
        ON ud.Code = c.Code

      

This statement allows you to return duplicates, which is confusing because of the operator WHERE NOT EXISTS

.

If we change the definition of a species to c

like ) c ORDER BY c.Code

, it suddenly starts working. Why is this?

+3


source to share


2 answers


Sql server does not guarantee how many times it will execute computational scalars and similar expressions. Perhaps the reference is where a different value is used than the selected one, but when you add an order, it materializes it and calculates only once per line.

If you are on 2014 or higher, you can use the extended events session on query_trace_column_values

to see this.

DECLARE @TestCodes TABLE(Code int)
dbcc traceon(2486);
set statistics xml on;

    select  Code from (
        SELECT randoms.Code
        FROM (
            VALUES 
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
        ) randoms (Code)
        WHERE NOT EXISTS (SELECT 1 FROM @TestCodes uc WHERE uc.Code = randoms.Code)
    ) c
     option(recompile);


set statistics xml off;
dbcc traceoff(2486);

      

enter image description here

enter image description here

The column Union1005

is derived from continuous scanning in the upper right corner. It is also mentioned again in the join predicate. At this point, it is reevaluated and returns a different number.

You might be able to hack the request and only get it once, but nothing is guaranteed. The only 100% safe way is to materialize the random numbers in front (for example, in a temp table) before you check, so you can be sure they won't be recalculated or changed below you.




Below is an example of a hack with SQL for not guaranteed results. I wouldn't use this as it has drawbacks that it doesn't guarantee anything anyway, and even if it works, if you select the top 1, your "random" numbers won't be allocated anymore. It introduces an offset for lower numbers.

select  Code from (
    SELECT TOP 5 randoms.Code
    FROM (
        VALUES 
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
    ) randoms (Code)
    order by Code
    ) T
    WHERE NOT EXISTS (SELECT 1 FROM @TestCodes uc WHERE uc.Code = T.Code)

      

enter image description here

This materializes it, and the value deduced from the sort is the same as in the nested loops predicate.

enter image description here

+4


source


Hypothesis

You need a function that generates 5 random values ​​that you want to provide, not from the table containing your code values.

Suppose your TEST_CODE also contains no duplicates, and along with the TOP 1 statement, the developer may well assume that its numbers will always be random.

PROBLEM

@ Martin.Smith makes some good points on how this hack ... and yes, it is a hack method. The functions do not allow the RAND () function to be used, and what you are looking for is not so much a solution as a workaround. Note the reason for rejecting NEWID ()

Invalid use of a side or time-dependent operator in 'newid ()' inside a function.

What will Microsoft say? They clearly did not intend to allow Non-deterministic functions, so you cannot be supported here at all.

Instead, consider this as the Insert operation that is. You are left with only one random value, in which the only requirement is that it does not match any of the existing codes, active or passive.

ADDED - Also, you want to make your code portable, perhaps even to work. Use a stored procedure instead.

Testing methods:

CREATE TABLE Test_Codes (Code INT, ud NVARCHAR(50) )
GO

DECLARE @COUNT INT = 1;
WHILE (@Count < 100000)
    BEGIN
    INSERT INTO Test_Codes (Code, ud)
        VALUES (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)
                 , 'PoliceNumber[' + CAST(@Count AS NVARCHAR(7) ) + ']')
    SET @COUNT = @COUNT + 1 
    END
GO

      



Alternative solution

CREATE PROCEDURE dbo.USP_Code_INS (@Name NVARCHAR(100) )
AS
BEGIN
        INSERT INTO Test_Codes (Code, ud)
        SELECT TOP 1 B.Code, @Name
        FROM Test_Codes A
        RIGHT OUTER JOIN (  SELECT randoms.Code
                            FROM (
                                VALUES 
                                (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
                                (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
                                (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
                                (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
                                (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
                            ) randoms (Code) ) B ON B.Code = A.Code
        WHERE A.Code IS NULL
END
GO

      

Note the PREDICATE

simple comparison here NULL

. You are guaranteed to get a unique value with a solution that is tabular. If you are worried about the case where no rows will be inserted, add a security check (inserts will be expensive). In the meantime, there is no need to know about it. ”

SELECT TOP 1 B.Code, @Name
        FROM Test_Codes A
        RIGHT OUTER JOIN (table of NEWID() rows) B
WHERE A.Code IS NULL

      

You join tables to any matched rows based on the smallest table (B) and run a predicate that deletes any result when Test_Codes has a match. The predicate uses NULL or NULL boolean comparisons, which the optimizer is very efficient at guessing and sorting.

This is a set-based solution, and basically this is how relational languages ​​work.

/* Dynamic Method. Allows to create a job that is simple as a flag to optimize */
CREATE PROCEDURE dbo.USP_Code_INS2 (@BatchNumber INT, @Name NVARCHAR(100) )
AS
BEGIN
     DECLARE @BatchNewID NVARCHAR(MAX) = N'(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))'
           , @Count INT = 1
           , @SQLStatement NVARCHAR(MAX)
           , @ParmDefinitions NVARCHAR(100) = N'@Name NVARCHAR(100)'
     /*  Dynamically create newid() table to flag-size. */
     WHILE @Count < @BatchNumber AND @BatchNumber > 1
        BEGIN
            SET @BatchNewID = @BatchNewID + N'
                                , (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))'
            SET @Count = @Count + 1
        END
    /* Now insert into SQL statement */
    SET @SQLStatement = N'INSERT INTO Test_Codes (Code, ud)
        SELECT TOP 1 B.Code, @Name
        FROM Test_Codes A
        RIGHT OUTER JOIN (  SELECT randoms.Code
                            FROM (
                                VALUES ' + @BatchNewID
                        +   N') randoms (Code) ) B ON B.Code = A.Code
        WHERE A.Code IS NULL'
    --PRINT @SQLStatement
    EXEC sp_executesql @SQLStatement, @ParmDefinitions
            , @Name = @Name
END

      

Conclusion

  • Avoid functions as it complicates your decision.
  • Stored procedures offer a clear , simple and therefore manageable solution. Microsoft provides many solutions, so use what they are designed for.

The benefits of stored procedures

β€’ Precompiled execution : SQL Server compiles each stored procedure once and then reuses the execution plan. This leads to a huge improvement in storage performance. Procedures are called repeatedly.

β€’ Reduced client / server traffic . If: Network bandwidth is an issue in your environment, then you will be happy to know that Stored Procedures can shorten long SQL queries to a single line that is sent over the wire.

β€’ Efficient code reuse and programming abstractions : Stored procedures can be used by multiple users and client programs. If you use them as planned, you will find that the development cycle takes less time.

β€’ Advanced security features . You can grant users permission to execute the stored procedure independently of the underlying table

MSDN - Difference Between Stored Procedure and Function

+1


source







All Articles