Getting a random value for each row

I have a table containing several names, one per row. For each row, I want to pick a random name. I wrote the following query:

BEGIN transaction t1

Create table TestingName
(NameID int,
 FirstName varchar(100),
 LastName varchar(100)
)

INSERT INTO TestingName
SELECT 0,'SpongeBob','SquarePants' 
UNION 
SELECT 1, 'Bugs', 'Bunny' 
UNION 
SELECT 2, 'Homer', 'Simpson' 
UNION 
SELECT 3, 'Mickey', 'Mouse' 
UNION 
SELECT 4, 'Fred', 'Flintstone'

SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5

ROLLBACK Transaction t1

      

The problem is that the query with the predicate ABS(CHECKSUM(NEWID())) % 5 sometimes returns more than 1 row and sometimes returns 0 rows. I must be missing something, but I don't see it.

If I change the query to

DECLARE @n int
set @n= ABS(CHECKSUM(NEWID())) % 5

SELECT FirstName from TestingName
WHERE NameID = @n

      

Then everything works and I get exactly one row back each time.

If you paste the first query into SQL Server Management Studio and run it several times, you can see what I am trying to describe.

The final update statement will look like this:

Update TableWithABunchOfNames
set [FName] = (SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5) 

      

This doesn't work because the subquery sometimes returns more than 1 row and sometimes returns no rows.

What am I missing?


3 answers


The problem is that you are getting a different random value for each row. The query is probably doing a full table scan, so the WHERE clause is evaluated for each row, and a new random number is generated every time.

So, you can end up with a sequence of random numbers where none of the IDs match. Or a sequence where more than one matches. On average, you will have one match, but you don't want "average", you want a guarantee.
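To see the per-row evaluation for yourself, a minimal sketch like the one below can help. It assumes the TestingName table from the question already exists (outside the rolled-back transaction); the column alias DrawForThisRow is just an illustrative name.

```sql
-- Each row gets its own NEWID(), so each row shows its own draw in 0..4.
-- A row would pass the question's WHERE clause only when DrawForThisRow
-- happens to equal that row's NameID.
SELECT
    NameID,
    FirstName,
    ABS(CHECKSUM(NEWID())) % 5 AS DrawForThisRow
FROM TestingName;
```

Running it a few times should show zero, one, or several rows where the two columns agree, which is the behaviour described in the question.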

This is when you want rand(), which produces only one random number per query:



SELECT FirstName
from TestingName
WHERE NameID = floor(rand() * 5);

      

This should give you exactly one value.
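A quick way to check this is to select the expression next to every row. This is only a sketch, assuming the TestingName table from the question exists; the alias SameForEveryRow is an illustrative name.

```sql
-- RAND() without a seed is evaluated once per query in SQL Server,
-- so every row in the result shows the same value.
SELECT
    NameID,
    FirstName,
    FLOOR(RAND() * 5) AS SameForEveryRow
FROM TestingName;
```

One consequence worth keeping in mind: if the same expression is used inside the subquery of the final UPDATE, every row of TableWithABunchOfNames would probably receive the same name, because the value is still drawn only once for the whole statement.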



Why not use top 1?



Select top 1 firstName
From testingName
Order by newId()

      



This worked for me:

WITH
CTE
AS
(
    SELECT
        ID
        ,FName
        ,CAST(5 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) AS int) AS rr
    FROM
        dbo.TableWithABunchOfNames
)
,CTE_ForUpdate
AS
(
    SELECT
        CTE.ID
        , CTE.FName
        , dbo.TestingName.FirstName AS RandomName
    FROM
        CTE
        LEFT JOIN dbo.TestingName ON dbo.TestingName.NameID = CTE.rr
)
UPDATE CTE_ForUpdate
SET FName = RandomName
;

      

This solution depends on how smart the optimizer is.

For example, if I use INNER JOIN instead of LEFT JOIN (which is the correct choice for this query), the optimizer moves the calculation of the random numbers outside of the join loop, and the end result is not what we expect.

I created a table TestingName with 5 rows, as in the question, and a table TableWithABunchOfNames with 100 rows.

Here is the execution plan with LEFT JOIN. You can see that the Compute Scalar operator, which calculates the random numbers, runs before the join loop, and that 100 rows were updated:

[Image: execution plan with LEFT JOIN]

Here is the execution plan with INNER JOIN. You can see that the Compute Scalar operator, which calculates the random numbers, runs after the join loop, together with an extra Filter. This query may fail to update some rows in TableWithABunchOfNames, and some rows may be matched multiple times. In the plan, Filter left 102 rows and Stream Aggregate left only 69 rows. This means that only 69 rows were updated, and that some rows had multiple matches (102 - 69 = 33).

[Image: execution plan with INNER JOIN]


To ensure that the result is what you expect, you must generate a random number for each row in TableWithABunchOfNames and remember the result explicitly, i.e. materialize the CTE shown above. Then use this temporary result to join to the table TestingName.

You can add a column to TableWithABunchOfNames to store the generated random numbers, or save the CTE to a temporary table or table variable.
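A minimal sketch of the temporary-table variant might look like the following. It assumes that TableWithABunchOfNames has a unique ID column (as used in the CTE above) and that TestingName holds NameID values 0 to 4; the temp table name #RandomPick and the column rr are illustrative names, not anything from the original schema.

```sql
-- Step 1: draw one random number per row and store it, so the value
-- can no longer be re-evaluated by the optimizer during the join.
SELECT
    ID,
    ABS(CHECKSUM(NEWID())) % 5 AS rr
INTO #RandomPick
FROM dbo.TableWithABunchOfNames;

-- Step 2: join on the stored numbers; each ID has exactly one rr,
-- and each rr matches exactly one NameID, so every row is updated once.
UPDATE t
SET t.FName = n.FirstName
FROM dbo.TableWithABunchOfNames AS t
INNER JOIN #RandomPick AS r
    ON r.ID = t.ID
INNER JOIN dbo.TestingName AS n
    ON n.NameID = r.rr;

-- Step 3: clean up.
DROP TABLE #RandomPick;
```

Because the random numbers are written to #RandomPick before the UPDATE runs, the result should no longer depend on where the optimizer places the Compute Scalar operator.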
